1
Herr K, Lu P, Diamreyan K, Xu H, Mendonca E, Weaver KN, Chen J. Estimating prevalence of rare genetic disease diagnoses using electronic health records in a children's hospital. HGG ADVANCES 2024; 5:100341. [PMID: 39148290 PMCID: PMC11401171 DOI: 10.1016/j.xhgg.2024.100341] [Received: 02/20/2024] [Revised: 08/09/2024] [Accepted: 08/09/2024] [Indexed: 08/17/2024] Open
Abstract
Rare genetic diseases (RGDs) affect a significant number of individuals, particularly in pediatric populations. This study investigates the efficacy of identifying RGD diagnoses through electronic health records (EHRs) and natural language processing (NLP) tools, and analyzes the prevalence of identified RGDs for potential underdiagnosis at Cincinnati Children's Hospital Medical Center (CCHMC). EHR data from 659,139 pediatric patients at CCHMC were utilized. Diagnoses corresponding to RGDs in Orphanet were identified using rule-based and machine learning-based NLP methods. Manual evaluation assessed the precision of the NLP strategies, with 100 diagnosis descriptions reviewed for each method. The rule-based method achieved a precision of 97.5% (95% CI: 91.5%, 99.4%), while the machine-learning-based method had a precision of 73.5% (95% CI: 63.6%, 81.6%). A manual chart review of 70 randomly selected patients with RGD diagnoses confirmed the diagnoses in 90.3% (95% CI: 82.0%, 95.2%) of cases. A total of 37,326 pediatric patients were identified with 977 RGD diagnoses based on the rule-based method, resulting in a prevalence of 5.66% in this population. While a majority of the disorders showed a higher prevalence at CCHMC compared with Orphanet, some diseases, such as 1p36 deletion syndrome, indicated potential underdiagnosis. Analyses further uncovered disparities in RGD prevalence and age of diagnosis across gender and racial groups. This study demonstrates the utility of employing EHR data with NLP tools to systematically investigate RGD diagnoses in large cohorts. The identified disparities underscore the need for enhanced approaches to guarantee timely and accurate diagnosis and management of pediatric RGDs.
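The headline prevalence in the abstract above follows directly from the reported counts. A minimal Python sketch of that arithmetic is given here. The Wilson score interval helper is an assumption for illustration only, since the abstract does not state which interval method the authors used, and the 90-of-100 input is a made-up example rather than a figure from the study.

```python
import math


def prevalence(cases: int, population: int) -> float:
    """Return the proportion of the population counted as cases."""
    return cases / population


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts reported in the abstract.
rgd_patients = 37_326
cohort_size = 659_139
print(f"Prevalence: {prevalence(rgd_patients, cohort_size):.2%}")  # about 5.66%

# Hypothetical review result, used only to show the interval helper.
low, high = wilson_interval(90, 100)
print(f"Illustrative 95% interval: ({low:.3f}, {high:.3f})")
```

The interval helper is kept separate from the prevalence calculation so that a different interval method can be swapped in without touching the count-based arithmetic.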
Affiliation(s)
- Kate Herr
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Peixin Lu
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Kessi Diamreyan
  - University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Huan Xu
  - Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Eneida Mendonca
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- K Nicole Weaver
  - University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Heart Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Jing Chen
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA.
2
Li F, Phadte A, Bhatia M, Barndt S, Monte Carlo AR, Hou CFD, Yang R, Strock S, Pluciennik A. Structural and molecular basis of FAN1 defects in promoting Huntington's disease. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.07.617005. [PMID: 39416186 PMCID: PMC11482860 DOI: 10.1101/2024.10.07.617005] [Indexed: 10/19/2024]
Abstract
FAN1 is a DNA-dependent nuclease whose proper function is essential for maintaining human health. For example, a genetic variant in FAN1, Arg507 to His, hastens the onset of Huntington's disease, a repeat expansion disorder for which there is no cure. How the Arg507His mutation affects FAN1 structure and enzymatic function is unknown. Using cryo-EM and biochemistry, we have discovered that FAN1 arginine 507 is critical for its interaction with PCNA, and that mutation of Arg507 to His attenuates assembly of the FAN1-PCNA complex on disease-relevant extrahelical DNA extrusions formed within DNA repeats. This mutation concomitantly abolishes PCNA-FAN1-dependent cleavage of such extrusions, underscoring the importance of PCNA to the genome-stabilizing function of FAN1. These results unravel the molecular basis for a specific mutation in FAN1 that dramatically hastens the onset of Huntington's disease.
3
van Karnebeek CDM, O'Donnell-Luria A, Baynam G, Baudot A, Groza T, Jans JJM, Lassmann T, Letinturier MCV, Montgomery SB, Robinson PN, Sansen S, Mehrian-Shai R, Steward C, Kosaki K, Durao P, Sadikovic B. Leaving no patient behind! Expert recommendation in the use of innovative technologies for diagnosing rare diseases. Orphanet J Rare Dis 2024; 19:357. [PMID: 39334316 PMCID: PMC11438178 DOI: 10.1186/s13023-024-03361-0] [Received: 03/26/2024] [Accepted: 09/11/2024] [Indexed: 09/30/2024] Open
Abstract
Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed. In response, IRDiRC proposes the establishment of "a globally coordinated diagnostic and research pipeline". To help facilitate this, IRDiRC formed the Task Force on Integrating New Technologies for Rare Disease Diagnosis. This multi-stakeholder Task Force aims to provide an overview of the current state of innovative diagnostic technologies for clinicians and researchers, focusing on the patient's diagnostic journey. Herein, we provide an overview of a broad spectrum of emerging diagnostic technologies involving genomics, epigenomics and multi-omics, functional testing and model systems, data sharing, bioinformatics, and Artificial Intelligence (AI), highlighting their advantages, limitations, and the current state of clinical adaption. We provide expert recommendations outlining the stepwise application of these innovative technologies in the diagnostic pathways while considering global differences in accessibility. The importance of FAIR (Findability, Accessibility, Interoperability, and Reusability) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics) data management is emphasized, along with the need for enhanced and continuing education in medical genomics. We provide a perspective on future technological developments in genome diagnostics and their integration into clinical practice. 
Lastly, we summarize the challenges related to genomic diversity and accessibility, highlighting the significance of innovative diagnostic technologies, global collaboration, and equitable access to diagnosis and treatment for people living with rare disease.
Affiliation(s)
- Clara D M van Karnebeek
  - Departments of Pediatrics and Human Genetics, Emma Center for Personalized Medicine, Amsterdam Gastro-Enterology Endocrinology Metabolism, Amsterdam University Medical Centers, Amsterdam, The Netherlands.
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, USA
- Gareth Baynam
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Anaïs Baudot
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Tudor Groza
  - Rare Care Centre, Perth Children's Hospital and Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Perth, Australia
  - European Molecular Biology Laboratory (EMBL-EBI), European Bioinformatics Institute, Hinxton, UK
- Judith J M Jans
  - Department of Genetics, Section Metabolic Diagnostics, University Medical Center Utrecht, Utrecht, The Netherlands
- Ruty Mehrian-Shai
  - Pediatric Brain Cancer Molecular Lab, Sheba Medical Center, Ramat Gan, Israel
- Patricia Durao
  - The Cure and Action for Tay-Sachs (CATS) Foundation, Altrincham, UK
- Bekim Sadikovic
  - Verspeeten Clinical Genome Centre, London Health Sciences, London, Canada
  - Department of Pathology and Laboratory Medicine, Western University, London, Canada
4
Sun C, Cheng X, Xu J, Chen H, Tao J, Dong Y, Wei S, Chen R, Meng X, Ma Y, Tian H, Guo X, Bi S, Zhang C, Kang J, Zhang M, Lv H, Shang Z, Lv W, Zhang R, Jiang Y. A review of disease risk prediction methods and applications in the omics era. Proteomics 2024; 24:e2300359. [PMID: 38522029 DOI: 10.1002/pmic.202300359] [Received: 09/15/2023] [Revised: 03/08/2024] [Accepted: 03/12/2024] [Indexed: 03/25/2024]
Abstract
Risk prediction and disease prevention are the innovative care challenges of the 21st century. Apart from freeing the individual from the pain of disease, they can lower medical costs for society. Risk assessment has recently entered a new era with the emergence of omics technologies, including genomics, transcriptomics, epigenomics, and proteomics, among others, which can potentially improve the ability of biomarkers to aid prediction models. While risk prediction has achieved great success, some challenges and limitations remain. We reviewed the general process of omics-based disease risk model construction and its applications in four typical diseases. We also highlighted the problems in current studies and explored the potential opportunities and challenges for future clinical practice.
Affiliation(s)
- Chen Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Xiangshu Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Jing Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Haiyan Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Junxian Tao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Yu Dong
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Siyu Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Rui Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xin Meng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yingnan Ma
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Hongsheng Tian
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xuying Guo
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shuo Bi
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chen Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jingxuan Kang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Mingming Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hongchao Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zhenwei Shang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Wenhua Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Ruijie Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yongshuai Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
5
Raper AC, Weathers BL, Drivas TG, Ellis CA, Kripke CM, Oyer RA, Owens AT, Verma A, Wileyto PE, Wollack CC, Zhou W, Ritchie MD, Schnoll RA, Nathanson KL. Protocol for a type 3 hybrid implementation cluster randomized clinical trial to evaluate the effect of patient and clinician nudges to advance the use of genomic medicine across a diverse health system. Implement Sci 2024; 19:61. [PMID: 39160614 PMCID: PMC11331805 DOI: 10.1186/s13012-024-01385-5] [Received: 05/20/2024] [Accepted: 07/14/2024] [Indexed: 08/21/2024] Open
Abstract
BACKGROUND: Germline genetic testing is recommended for an increasing number of conditions with underlying genetic etiologies, the results of which impact medical management. However, genetic testing is underutilized in clinics due to system, clinician, and patient level barriers. Behavioral economics provides a framework to create implementation strategies, such as nudges, to address these multi-level barriers and increase the uptake of genetic testing for conditions where the results impact medical management.
METHODS: Patients meeting eligibility for germline genetic testing for a group of conditions will be identified using electronic phenotyping algorithms. A pragmatic, type 3 hybrid cluster randomization study will test nudges to patients and/or clinicians, or neither. Clinicians who receive nudges will be prompted to either refer their patient to genetics or order genetic testing themselves. We will use rapid cycle approaches informed by clinician and patient experiences, health equity, and behavioral economics to optimize these nudges before trial initiation. The primary implementation outcome is uptake of germline genetic testing for the pre-selected health conditions. Patient data collected through the electronic health record (e.g. demographics, geocoded address) will be examined as moderators of the effect of nudges.
DISCUSSION: This study will be one of the first randomized trials to examine the effects of patient- and clinician-directed nudges informed by behavioral economics on uptake of genetic testing. The pragmatic design will facilitate a large and diverse patient sample, allow for the assessment of genetic testing uptake, and provide comparison of the effect of different nudge combinations. This trial also involves optimization of patient identification, test selection, ordering, and result reporting in an electronic health record-based infrastructure to further address clinician-level barriers to utilizing genomic medicine. The findings may help determine the impact of low-cost, sustainable implementation strategies that can be integrated into health care systems to improve the use of genomic medicine.
TRIAL REGISTRATION: ClinicalTrials.gov. NCT06377033. Registered on March 31, 2024. https://clinicaltrials.gov/study/NCT06377033?term=NCT06377033&rank=1.
Affiliation(s)
- Anna C Raper
  - Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Benita L Weathers
  - Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Theodore G Drivas
  - Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Colin A Ellis
  - Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Colleen Morse Kripke
  - Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Randall A Oyer
  - Abramson Cancer Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anjali T Owens
  - Division of Cardiology, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anurag Verma
  - Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Paul E Wileyto
  - Division of Biostatistics, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Colin C Wollack
  - Information Services Applications, Penn Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Wenting Zhou
  - Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Marylyn D Ritchie
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Abramson Cancer Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Robert A Schnoll
  - Abramson Cancer Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Center for Interdisciplinary Research on Nicotine Addiction, University of Pennsylvania, Philadelphia, PA, USA
- Katherine L Nathanson
  - Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA.
  - Abramson Cancer Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
6
Adams DR, van Karnebeek CDM, Agulló SB, Faùndes V, Jamuar SS, Lynch SA, Pintos-Morell G, Puri RD, Shai R, Steward CA, Tumiene B, Verloes A. Addressing diagnostic gaps and priorities of the global rare diseases community: Recommendations from the IRDiRC diagnostics scientific committee. Eur J Med Genet 2024; 70:104951. [PMID: 38848991 DOI: 10.1016/j.ejmg.2024.104951] [Received: 03/08/2024] [Accepted: 06/05/2024] [Indexed: 06/09/2024]
Abstract
The International Rare Diseases Research Consortium (IRDiRC) Diagnostic Scientific Committee (DSC) is charged with discussion and contribution to progress on diagnostic aspects of the IRDiRC core mission. Specifically, IRDiRC goals include timely diagnosis, use of globally coordinated diagnostic pipelines, and assessing the impact of rare diseases on affected individuals. As part of this mission, the DSC endeavored to create a list of research priorities to achieve these goals. We present a discussion of those priorities along with aspects of current, global rare disease needs and opportunities that support our prioritization. In support of this discussion, we also provide clinical vignettes illustrating real-world examples of diagnostic challenges.
Affiliation(s)
- David R Adams
  - National Human Genome Research Institute, National Institutes of Health, USA.
- Clara D M van Karnebeek
  - Departments of Pediatrics and Human Genetics, Emma Center for Personalized Medicine, Amsterdam Gastro-enterology Endocrinology Metabolism, Amsterdam University Medical Centers, the Netherlands
- Sergi Beltran Agulló
  - Centre Nacional d'Anàlisi Genòmica (CNAG), Spain; Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona (UB), Spain
- Víctor Faùndes
  - Laboratorio de Genética y Enfermedades Metabólicas, Instituto de Nutrición y Tecnología de los Alimentos, Universidad de Chile, Chile
- Saumya Shekhar Jamuar
  - Genetics Service, KK Women's and Children's Hospital and Paediatrics ACP, Duke-NUS Medical School, Singapore; Singhealth Duke-NUS Institute of Precision Medicine, Singapore
- Guillem Pintos-Morell
  - Vall d'Hebron Research Institute (VHIR), Vall d'Hebron Barcelona Hospital, Spain; MPS-Spain Patient Advocacy Organization, Spain
- Ratna Dua Puri
  - Institute of Medical Genetics and Genomics, Sir Ganga Ram Hospital, India
- Ruty Shai
  - Pediatric Cancer Molecular Lab, Sheba Medical Center, Israel
- Biruté Tumiene
  - Vilnius University, Faculty of Medicine, Institute of Biomedical Sciences, Lithuania
- Alain Verloes
  - Département de Génétique, CHU Paris - Hôpital Robert Debré, France
7
Miller-Fleming TW, Allos A, Gantz E, Yu D, Isaacs DA, Mathews CA, Scharf JM, Davis LK. Developing a phenotype risk score for tic disorders in a large, clinical biobank. Transl Psychiatry 2024; 14:311. [PMID: 39069519 DOI: 10.1038/s41398-024-03011-w] [Received: 03/17/2023] [Revised: 06/28/2024] [Accepted: 07/04/2024] [Indexed: 07/30/2024] Open
Abstract
Tics are a common feature of early-onset neurodevelopmental disorders, characterized by involuntary and repetitive movements or sounds. Despite affecting up to 2% of children and having a genetic contribution, the underlying causes remain poorly understood. In this study, we leverage dense phenotype information to identify features (i.e., symptoms and comorbid diagnoses) of tic disorders within the context of a clinical biobank. Using de-identified electronic health records (EHRs), we identified individuals with tic disorder diagnosis codes. We performed a phenome-wide association study (PheWAS) to identify the EHR features enriched in tic cases versus controls (n = 1406 and 7030, respectively) and found highly comorbid neuropsychiatric phenotypes, including obsessive-compulsive disorder, attention-deficit/hyperactivity disorder, autism spectrum disorder, and anxiety (p < 7.396 × 10⁻⁵). These features (among others) were then used to generate a phenotype risk score (PheRS) for tic disorder, which was applied across an independent set of 90,051 individuals. A gold standard set of tic disorder cases identified by an EHR algorithm and confirmed by clinician chart review was then used to validate the tic disorder PheRS; the tic disorder PheRS was significantly higher among clinician-validated tic cases versus non-cases (p = 4.787 × 10⁻¹⁵¹; β = 1.68; SE = 0.06). Our findings provide support for the use of large-scale medical databases to better understand phenotypically complex and underdiagnosed conditions, such as tic disorders.
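The abstract above describes building a phenotype risk score from PheWAS-enriched features. One common way to form such a score is a weighted sum over the features present in a patient's record, and the sketch below assumes that form. The feature names echo the comorbidities listed in the abstract, but the weights and the `phenotype_risk_score` helper are hypothetical, and the abstract does not specify the authors' weighting scheme.

```python
from typing import Dict, Set


def phenotype_risk_score(patient_features: Set[str], weights: Dict[str, float]) -> float:
    """Sum the weights of the scored features that appear in the patient's record."""
    return sum(weight for feature, weight in weights.items() if feature in patient_features)


# Hypothetical weights; real weights would come from the PheWAS results.
weights = {"ocd": 1.5, "adhd": 1.0, "autism": 0.5, "anxiety": 0.25}

# Hypothetical patient record; "asthma" carries no weight and is ignored.
patient = {"adhd", "anxiety", "asthma"}

score = phenotype_risk_score(patient, weights)
print(score)
```

Features outside the weight table contribute nothing, so the score depends only on the enriched phenotypes chosen for the model.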
Affiliation(s)
- Tyne W Miller-Fleming
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
- Annmarie Allos
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Cognitive Science, Dartmouth College, Hanover, NH, USA
- Emily Gantz
  - Department of Pediatric Neurology, Children's Hospital of Alabama, Birmingham, AL, USA
  - Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pediatrics, Monroe Carell Jr. Children's Hospital at Vanderbilt, Nashville, TN, USA
- Dongmei Yu
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David A Isaacs
  - Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pediatrics, Monroe Carell Jr. Children's Hospital at Vanderbilt, Nashville, TN, USA
- Carol A Mathews
  - Department of Psychiatry, Genetics Institute, Center for OCD, Anxiety and Related Disorders, University of Florida, Gainesville, FL, USA
- Jeremiah M Scharf
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lea K Davis
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
  - Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA.
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA.
8
Cirulli ET, Schiabor Barrett KM, Bolze A, Judge DP, Pawloski PA, Grzymski JJ, Lee W, Washington NL. A power-based sliding window approach to evaluate the clinical impact of rare genetic variants in the nucleotide sequence or the spatial position of the folded protein. HGG ADVANCES 2024; 5:100284. [PMID: 38509709 PMCID: PMC11004801 DOI: 10.1016/j.xhgg.2024.100284] [Received: 11/16/2023] [Revised: 03/14/2024] [Accepted: 03/15/2024] [Indexed: 03/22/2024] Open
Abstract
Systematic determination of novel variant pathogenicity remains a major challenge, even when there is an established association between a gene and phenotype. Here we present Power Window (PW), a sliding window technique that identifies the impactful regions of a gene using population-scale clinico-genomic datasets. By sizing analysis windows on the number of variant carriers, rather than the number of variants or nucleotides, statistical power is held constant, enabling the localization of clinical phenotypes and removal of unassociated gene regions. The windows can be built by sliding across either the nucleotide sequence of the gene (through 1D space) or the positions of the amino acids in the folded protein (through 3D space). Using a training set of 350k exomes from the UK Biobank (UKB), we developed PW models for well-established gene-disease associations and tested their accuracy in two independent cohorts (117k UKB exomes and 65k exomes sequenced at Helix in the Healthy Nevada Project, myGenetics, or In Our DNA SC studies). The significant models retained a median of 49% of the qualifying variant carriers in each gene (range 2%-98%), with quantitative traits showing a median effect size improvement of 66% compared with aggregating variants across the entire gene, and binary traits' odds ratios improving by a median of 2.2-fold. PW showcases that electronic health record-based statistical analyses can accurately distinguish between novel coding variants in established genes that will have high phenotypic penetrance and those that will not, unlocking new potential for human genomics research, drug development, variant interpretation, and precision medicine.
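The central idea in the abstract above, sizing each window by the number of variant carriers rather than by nucleotides, can be illustrated for the 1D case. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `carrier_windows` function, the input layout of (position, carrier count) pairs, and the example numbers are all hypothetical.

```python
from typing import List, Tuple


def carrier_windows(
    variants: List[Tuple[int, int]], target_carriers: int
) -> List[Tuple[int, int, int]]:
    """Build one window per start variant, extending right until the
    cumulative carrier count reaches the target.

    variants: (position, carrier_count) pairs along the gene.
    Returns (start_position, end_position, carriers_in_window) tuples.
    """
    ordered = sorted(variants)
    windows = []
    for i in range(len(ordered)):
        total = 0
        for j in range(i, len(ordered)):
            total += ordered[j][1]
            if total >= target_carriers:
                windows.append((ordered[i][0], ordered[j][0], total))
                break
        # A start with too few carriers to its right yields no window.
    return windows


# Hypothetical variants: (nucleotide position, number of carriers).
example = [(10, 3), (20, 2), (35, 5), (50, 1), (80, 4)]
for window in carrier_windows(example, target_carriers=5):
    print(window)
```

Windows built this way vary in nucleotide span but hold at least the target number of carriers each, which is one way to keep the sample size per window roughly constant as the window slides across the gene.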
Affiliation(s)
- Alexandre Bolze
  - Helix, 101 S Ellsworth Ave Suite 350, San Mateo, CA 94401, USA
- Daniel P Judge
  - Division of Cardiology, Medical University of South Carolina, 30 Courtenay Drive, MSC 592, Charleston, SC 29425, USA
- Joseph J Grzymski
  - University of Nevada, 2215 Raggio Pkwy, Reno, NV 89512, USA; Renown Institute for Health Innovation, Reno, NV 89512, USA
- William Lee
  - Helix, 101 S Ellsworth Ave Suite 350, San Mateo, CA 94401, USA
9
Baynam G, Hartman AL, Letinturier MCV, Bolz-Johnson M, Carrion P, Grady AC, Dong X, Dooms M, Dreyer L, Graessner H, Granados A, Groza T, Houwink E, Jamuar SS, Vasquez-Loarte T, Tumiene B, Wiafe SA, Bjornson-Pennell H, Groft S. Global health for rare diseases through primary care. Lancet Glob Health 2024; 12:e1192-e1199. [PMID: 38876765 DOI: 10.1016/s2214-109x(24)00134-7] [Received: 09/01/2023] [Revised: 03/19/2024] [Accepted: 03/19/2024] [Indexed: 06/16/2024]
Abstract
Rare diseases affect over 300 million people worldwide and are gaining recognition as a global health priority. Their inclusion in the UN Sustainable Development Goals, the UN Resolution on Addressing the Challenges of Persons Living with a Rare Disease, and the anticipated WHO Global Network for Rare Diseases and WHO Resolution on Rare Diseases, which is yet to be announced, emphasise their significance. People with rare diseases often face unmet health needs, including access to screening, diagnosis, therapy, and comprehensive health care. These challenges highlight the need for awareness and targeted interventions, including comprehensive education, especially in primary care. The majority of rare disease research, clinical services, and health systems are addressed with specialist care. WHO Member States have committed to focusing on primary health care in both universal health coverage and health-related Sustainable Development Goals. Recognising this opportunity, the International Rare Diseases Research Consortium (IRDiRC) assembled a global, multistakeholder task force to identify key barriers and opportunities for empowering primary health-care providers in addressing rare disease challenges.
Collapse
Affiliation(s)
- Gareth Baynam
- Rare Care Centre, Perth Children's Hospital and Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Perth, WA, Australia.
| | - Adam L Hartman
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | | | - Matt Bolz-Johnson
- EURORDIS-Rare Diseases Europe, Fondation Universitaire, Brussels, Belgium
| | | | - Alice Chen Grady
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, United States
| | - Xinran Dong
- Children's Hospital of Fudan University, Shanghai, China
| | - Marc Dooms
- University Hospitals Leuven, Leuven, Belgium
| | - Lauren Dreyer
- Genetic Services Western Australia, King Edward Memorial Hospital, Perth, WA, Australia
| | - Holm Graessner
- Centre for Rare Diseases, Institute for Medical Genetics and Applied Genomics, University Hospital Tübingen, Tübingen, Germany
- Alicia Granados
- Global Medical Affairs Rare Diseases, Sanofi, Barcelona, Spain
- Tudor Groza
- Rare Care Centre, Perth Children's Hospital and Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Perth, WA, Australia; European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, UK
- Elisa Houwink
- Department of Family Medicine, Mayo Clinic, Rochester, MN, USA
- Saumya Shekhar Jamuar
- KK Women's and Children's Hospital, SingHealth Duke-NUS Institute of Precision Medicine, Singapore
- Tania Vasquez-Loarte
- Rare Disease G2MC, Department of Pediatrics, Wyckoff Heights Medical Center, New York, NY, USA
- Biruté Tumiene
- Vilnius University Faculty of Medicine, Institute of Biomedical Sciences, Vilnius University Hospital Santaros Klinikos, Vilnius, Lithuania
- Stephen Groft
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA
10
Johnson R, Stephens AV, Mester R, Knyazev S, Kohn LA, Freund MK, Bondhus L, Hill BL, Schwarz T, Zaitlen N, Arboleda VA, Bastarache LA, Pasaniuc B, Butte MJ. Electronic health record signatures identify undiagnosed patients with common variable immunodeficiency disease. Sci Transl Med 2024; 16:eade4510. [PMID: 38691621 PMCID: PMC11402387 DOI: 10.1126/scitranslmed.ade4510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 04/10/2024] [Indexed: 05/03/2024]
Abstract
Human inborn errors of immunity include rare disorders entailing functional and quantitative antibody deficiencies due to impaired B cells called the common variable immunodeficiency (CVID) phenotype. Patients with CVID face delayed diagnoses and treatments for 5 to 15 years after symptom onset because the disorders are rare (prevalence of ~1/25,000), and there is extensive heterogeneity in CVID phenotypes, ranging from infections to autoimmunity to inflammatory conditions, overlapping with other more common disorders. The prolonged diagnostic odyssey drives excessive system-wide costs before diagnosis. Because there is no single causal mechanism, there are no genetic tests to definitively diagnose CVID. Here, we present PheNet, a machine learning algorithm that identifies patients with CVID from their electronic health records (EHRs). PheNet learns phenotypic patterns from verified CVID cases and uses this knowledge to rank patients by likelihood of having CVID. PheNet could have diagnosed more than half of our patients with CVID 1 or more years earlier than they had been diagnosed. When applied to a large EHR dataset, followed by blinded chart review of the top 100 patients ranked by PheNet, we found that 74% were highly probable to have CVID. We externally validated PheNet using >6 million records from disparate medical systems in California and Tennessee. As artificial intelligence and machine learning make their way into health care, we show that algorithms such as PheNet can offer clinical benefits by expediting the diagnosis of rare diseases.
Affiliation(s)
- Ruth Johnson
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Alexis V. Stephens
- Department of Pediatrics, Division of Immunology, Allergy and Rheumatology, University of California Los Angeles, Los Angeles, CA 90095, USA
- Rachel Mester
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA
- Sergey Knyazev
- Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Lisa A. Kohn
- Department of Pediatrics, Division of Immunology, Allergy and Rheumatology, University of California Los Angeles, Los Angeles, CA 90095, USA
- Malika K. Freund
- Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Leroy Bondhus
- Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Brian L. Hill
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA
- Tommer Schwarz
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA 90095, USA
- Noah Zaitlen
- Department of Neurology, University of California Los Angeles, Los Angeles, CA 90095, USA
- Valerie A. Arboleda
- Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Lisa A. Bastarache
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37203, USA
- Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Manish J. Butte
- Department of Pediatrics, Division of Immunology, Allergy and Rheumatology, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
11
Nieto-Romero V, García-Torralba A, Molinos-Vicente A, Moya FJ, Rodríguez-Perales S, García-Escudero R, Salido E, Segovia JC, García-Bravo M. Restored glyoxylate metabolism after AGXT gene correction and direct reprogramming of primary hyperoxaluria type 1 fibroblasts. iScience 2024; 27:109530. [PMID: 38577102 PMCID: PMC10993186 DOI: 10.1016/j.isci.2024.109530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 01/18/2024] [Accepted: 03/16/2024] [Indexed: 04/06/2024] Open
Abstract
Primary hyperoxaluria type 1 (PH1) is a rare inherited metabolic disorder characterized by oxalate overproduction in the liver, resulting in renal damage. It is caused by mutations in the AGXT gene. Combined liver and kidney transplantation is currently the only permanent curative treatment. We combined locus-specific gene correction and hepatic direct cell reprogramming to generate autologous healthy induced hepatocytes (iHeps) from PH1 patient-derived fibroblasts. First, site-specific AGXT corrected cells were obtained by homology directed repair (HDR) assisted by CRISPR-Cas9, following two different strategies: accurate point mutation (c.731T>C) correction or knockin of an enhanced version of AGXT cDNA. Then, iHeps were generated, by overexpression of hepatic transcription factors. Generated AGXT-corrected iHeps showed hepatic gene expression profile and exhibited in vitro reversion of oxalate accumulation compared to non-edited PH1-derived iHeps. This strategy set up a potential alternative cellular source for liver cell replacement therapy and a personalized PH1 in vitro disease model.
Affiliation(s)
- Virginia Nieto-Romero
- Cell Technology Division, Biomedical Innovation Unit, CIEMAT (Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER)-ISCIII, Instituto de Investigación Sanitaria Fundación Jiménez Díaz (IIS-FJD, UAM), 28040 Madrid, Spain
- Aida García-Torralba
- Cell Technology Division, Biomedical Innovation Unit, CIEMAT (Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER)-ISCIII, Instituto de Investigación Sanitaria Fundación Jiménez Díaz (IIS-FJD, UAM), 28040 Madrid, Spain
- Andrea Molinos-Vicente
- Cell Technology Division, Biomedical Innovation Unit, CIEMAT (Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER)-ISCIII, Instituto de Investigación Sanitaria Fundación Jiménez Díaz (IIS-FJD, UAM), 28040 Madrid, Spain
- Francisco José Moya
- Molecular Cytogenetics and Genome Editing Unit, Human Cancer Genetics Program, Centro Nacional de Investigaciones Oncológicas (CNIO), 28029 Madrid, Spain
- Sandra Rodríguez-Perales
- Molecular Cytogenetics and Genome Editing Unit, Human Cancer Genetics Program, Centro Nacional de Investigaciones Oncológicas (CNIO), 28029 Madrid, Spain
- Ramón García-Escudero
- Molecular Oncology Unit, CIEMAT (Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas), Centro de Investigación Biomédica en Red de Cáncer (CIBERONC)-ISCIII, Research Institute Hospital 12 de Octubre (imas12)-University Hospital 12 de Octubre, 28040 Madrid, Spain
- Eduardo Salido
- Pathology Department, Hospital Universitario de Canarias, Universidad La Laguna, Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER)-ISCIII, 38320 Tenerife, Spain
- José-Carlos Segovia
- Cell Technology Division, Biomedical Innovation Unit, CIEMAT (Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER)-ISCIII, Instituto de Investigación Sanitaria Fundación Jiménez Díaz (IIS-FJD, UAM), 28040 Madrid, Spain
- María García-Bravo
- Cell Technology Division, Biomedical Innovation Unit, CIEMAT (Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER)-ISCIII, Instituto de Investigación Sanitaria Fundación Jiménez Díaz (IIS-FJD, UAM), 28040 Madrid, Spain
12
Qu HQ, Glessner JT, Qu J, Liu Y, Watson D, Chang X, Saeidian AH, Qiu H, Mentch FD, Connolly JJ, Hakonarson H. High Comorbidity of Pediatric Cancers in Patients with Birth Defects: Insights from Whole Genome Sequencing Analysis of Copy Number Variations. Transl Res 2024; 266:49-56. [PMID: 37989391 DOI: 10.1016/j.trsl.2023.11.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 10/01/2023] [Accepted: 11/17/2023] [Indexed: 11/23/2023]
Abstract
BACKGROUND Patients with birth defects (BD) exhibit an elevated risk of cancer. We aimed to investigate the potential link between pediatric cancers and BDs, exploring the hypothesis of shared genetic defects contributing to the coexistence of these conditions. METHODS This study included 1454 probands with BDs (704 females and 750 males), including 619 (42.3%) with and 845 (57.7%) without co-occurrence of pediatric onset cancers. Whole genome sequencing (WGS) was done at 30X coverage through the Kids First/Gabriella Miller X01 Program. RESULTS 8211 CNV loci were called from the 1454 unrelated individuals. 191 CNV loci classified as pathogenic/likely pathogenic (P/LP) were identified in 309 (21.3%) patients, with 124 (40.1%) of these patients having pediatric onset cancers. The most common group of CNVs are pathogenic deletions covering the region ChrX:52,863,011-55,652,521, seen in 162 patients including 17 males. Large recurrent P/LP duplications >5MB were detected in 33 patients. CONCLUSIONS This study revealed that P/LP CNVs were common in a large cohort of BD patients with high rate of pediatric cancers. We present a comprehensive spectrum of P/LP CNVs in patients with BDs and various cancers. Notably, deletions involving E2F target genes and genes implicated in mitotic spindle assembly and G2/M checkpoint were identified, potentially disrupting cell-cycle progression and providing mechanistic insights into the concurrent occurrence of BDs and cancers.
Affiliation(s)
- Hui-Qi Qu
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Joseph T Glessner
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA; Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, 19104, USA; Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Jingchun Qu
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Yichuan Liu
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Deborah Watson
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Xiao Chang
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Amir Hossein Saeidian
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Haijun Qiu
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Frank D Mentch
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- John J Connolly
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Hakon Hakonarson
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA; Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, 19104, USA; Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA; Division of Pulmonary Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA; Faculty of Medicine, University of Iceland, Reykjavik, Iceland.
13
Rivière JG, Soler Palacín P, Butte MJ. Proceedings from the inaugural Artificial Intelligence in Primary Immune Deficiencies (AIPID) conference. J Allergy Clin Immunol 2024; 153:637-642. [PMID: 38224784 PMCID: PMC11402388 DOI: 10.1016/j.jaci.2024.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 01/09/2024] [Accepted: 01/11/2024] [Indexed: 01/17/2024]
Abstract
Here, we summarize the proceedings of the inaugural Artificial Intelligence in Primary Immune Deficiencies conference, during which experts and advocates gathered to advance research into the applications of artificial intelligence (AI), machine learning, and other computational tools in the diagnosis and management of inborn errors of immunity (IEIs). The conference focused on the key themes of expediting IEI diagnoses, challenges in data collection, roles of natural language processing and large language models in interpreting electronic health records, and ethical considerations in implementation. Innovative AI-based tools trained on electronic health records and claims databases have discovered new patterns of warning signs for IEIs, facilitating faster diagnoses and enhancing patient outcomes. Challenges in training AIs persist on account of data limitations, especially in cases of rare diseases, overlapping phenotypes, and biases inherent in current data sets. Furthermore, experts highlighted the significance of ethical considerations, data protection, and the necessity for open science principles. The conference delved into regulatory frameworks, equity in access, and the imperative for collaborative efforts to overcome these obstacles and harness the transformative potential of AI. Concerted efforts to successfully integrate AI into daily clinical immunology practice are still needed.
Affiliation(s)
- Jacques G Rivière
- Infection and Immunity in Pediatric Patients Research Group, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Pediatric Infectious Diseases and Immunodeficiencies Unit, Hospital Infantil i de la Dona, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain; Universitat Autònoma de Barcelona, Barcelona, Spain
- Pere Soler Palacín
- Infection and Immunity in Pediatric Patients Research Group, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Pediatric Infectious Diseases and Immunodeficiencies Unit, Hospital Infantil i de la Dona, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain; Universitat Autònoma de Barcelona, Barcelona, Spain
- Manish J Butte
- Division of Immunology, Allergy, and Rheumatology, Department of Pediatrics, University of California Los Angeles, Los Angeles, Calif; Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, Calif; Department of Human Genetics, University of California Los Angeles, Los Angeles, Calif.
14
Moynihan D, Monaco S, Ting TW, Narasimhalu K, Hsieh J, Kam S, Lim JY, Lim WK, Davila S, Bylstra Y, Balakrishnan ID, Heng M, Chia E, Yeo KK, Goh BK, Gupta R, Tan T, Baynam G, Jamuar SS. Cluster analysis and visualisation of electronic health records data to identify undiagnosed patients with rare genetic diseases. Sci Rep 2024; 14:5056. [PMID: 38424111 PMCID: PMC10904843 DOI: 10.1038/s41598-024-55424-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 02/23/2024] [Indexed: 03/02/2024] Open
Abstract
Rare genetic diseases affect 5-8% of the population but are often undiagnosed or misdiagnosed. Electronic health records (EHR) contain large amounts of data, which provide opportunities for analysing and mining. Data mining, in the form of cluster analysis and visualisation, was performed on a database containing deidentified health records of 1.28 million patients across 3 major hospitals in Singapore, in a bid to improve the diagnostic process for patients who are living with an undiagnosed rare disease, specifically focusing on Fabry Disease and Familial Hypercholesterolaemia (FH). On a baseline of 4 patients, we identified 2 additional patients with potential diagnosis of Fabry disease, suggesting a potential 50% increase in diagnosis. Similarly, we identified > 12,000 individuals who fulfil the clinical and laboratory criteria for FH but had not been diagnosed previously. This proof-of-concept study showed that it is possible to perform mining on EHR data albeit with some challenges and limitations.
Affiliation(s)
- Teck Wah Ting
- Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, 100 Bukit Timah Road, Singapore, 229899, Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- Kaavya Narasimhalu
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- Department of Neurology, National Neuroscience Institute (Singapore General Hospital), Singapore, Singapore
- Jenny Hsieh
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- Department of Internal Medicine, Singapore General Hospital, Singapore, Singapore
- Sylvia Kam
- Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, 100 Bukit Timah Road, Singapore, 229899, Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- Jiin Ying Lim
- Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, 100 Bukit Timah Road, Singapore, 229899, Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- Weng Khong Lim
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Singapore
- Cancer & Stem Cell Biology Program, Duke-NUS Medical School, Singapore, Singapore
- Laboratory of Genome Variation Analytics, Genome Institute of Singapore, Singapore, Singapore
- Sonia Davila
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Singapore
- Yasmin Bylstra
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Singapore
- Iswaree Devi Balakrishnan
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- National Heart Centre Singapore, Singapore, Singapore
- Mark Heng
- SingHealth Office of Insights and Analytics, Singapore, Singapore
- Elian Chia
- SingHealth Office of Insights and Analytics, Singapore, Singapore
- Bee Keow Goh
- Data Analytics Office, KK Women's and Children's Hospital, Singapore, Singapore
- Tele Tan
- Curtin University, Perth, Australia
- Gareth Baynam
- Rare Care Centre, Perth Children's Hospital, Perth, WA, Australia
- Western Australian Register of Developmental Anomalies, Perth, WA, Australia
- Saumya Shekhar Jamuar
- Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, 100 Bukit Timah Road, Singapore, 229899, Singapore.
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore.
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Singapore.
15
Cassini T, Bastarache L, Zeng C, Han ST, Wang J, He J, Denny JC. A test of automated use of electronic health records to aid in diagnosis of genetic disease. Genet Med 2023; 25:100966. [PMID: 37622442 PMCID: PMC10840718 DOI: 10.1016/j.gim.2023.100966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Revised: 08/16/2023] [Accepted: 08/17/2023] [Indexed: 08/26/2023] Open
Abstract
PURPOSE Automated use of electronic health records may aid in decreasing the diagnostic delay for rare diseases. The phenotype risk score (PheRS) is a weighted aggregate of syndromically related phenotypes that measures the similarity between an individual's conditions and features of a disease. For some diseases, there are individuals without a diagnosis of that disease who have scores similar to diagnosed patients. These individuals may have that disease but not yet be diagnosed. METHODS We calculated the PheRS for cystic fibrosis (CF) for 965,626 subjects in the Vanderbilt University Medical Center electronic health record. RESULTS Of the 400 subjects with the highest PheRS for CF, 248 (62%) had been diagnosed with CF. Twenty-six of the remaining participants, those who were alive and had DNA available in the linked DNA biobank, underwent clinical review and sequencing analysis of CFTR and SERPINA1. This uncovered a potential diagnosis for 2 subjects, 1 with CF and 1 with alpha-1-antitrypsin deficiency. An additional 7 subjects had pathogenic or likely pathogenic variants, 2 in CFTR and 5 in SERPINA1. CONCLUSION These findings may be clinically actionable for the providers caring for these patients. Importantly, this study highlights feasibility and challenges for future implications of this approach.
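As a rough illustration of the phenotype risk score idea described in the abstract above, the following Python sketch sums per-phenotype weights over the disease features found in each patient's record and ranks patients by the total. The weighting by negative log prevalence, the toy cohort, the feature list, and the names `phenotype_weights` and `phers` are assumptions made for illustration; the abstract does not specify them.

```python
import math

def phenotype_weights(cohort, features):
    """Weight each disease feature by log(N / count), i.e. -log(prevalence).

    This weighting is an assumption for the sketch, not taken from the abstract.
    """
    n = len(cohort)
    weights = {}
    for feature in features:
        count = sum(1 for phenos in cohort.values() if feature in phenos)
        if count:  # features never observed in the cohort get no weight
            weights[feature] = math.log(n / count)
    return weights

def phers(patient_phenotypes, weights):
    """Sum the weights of the disease features present in one patient."""
    return sum(w for f, w in weights.items() if f in patient_phenotypes)

# Hypothetical toy cohort: patient id -> set of recorded phenotypes.
cohort = {
    "p1": {"bronchiectasis", "pancreatic insufficiency", "nasal polyps"},
    "p2": {"bronchiectasis"},
    "p3": {"asthma"},
    "p4": set(),
}
# Hypothetical feature list for the target disease.
cf_features = {"bronchiectasis", "pancreatic insufficiency", "nasal polyps"}

weights = phenotype_weights(cohort, cf_features)
scores = {pid: phers(phenos, weights) for pid, phenos in cohort.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
```

In this toy setup, rarer features contribute more to the score, so a patient with several uncommon disease features ranks above one with a single common feature. High-scoring patients without a recorded diagnosis would be the candidates for review.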
Affiliation(s)
- Thomas Cassini
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD; Department of Pediatrics, Vanderbilt University Medical Center, Nashville TN.
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Chenjie Zeng
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- Sangwoo T Han
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- Janey Wang
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Jing He
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Joshua C Denny
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
16
Chen F, Ahimaz P, Wang K, Chung WK, Ta C, Weng C, Liu C. Phenotype-Driven Molecular Genetic Test Recommendation for Diagnosing Pediatric Rare Disorders. RESEARCH SQUARE 2023:rs.3.rs-3593490. [PMID: 38045411 PMCID: PMC10690317 DOI: 10.21203/rs.3.rs-3593490/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Rare disease patients often endure prolonged diagnostic odysseys and may still remain undiagnosed for years. Selecting the appropriate genetic tests is crucial to lead to timely diagnosis. Phenotypic features offer great potential for aiding genomic diagnosis in rare disease cases. We see great promise in effective integration of phenotypic information into genetic test selection workflow. In this study, we present a phenotype-driven molecular genetic test recommendation (Phen2Test) for pediatric rare disease diagnosis. Phen2Test was constructed using frequency matrix of phecodes and demographic data from the EHR before ordering genetic tests, with the objective to streamline the selection of molecular genetic tests (whole-exome / whole-genome sequencing, or gene panels) for clinicians with minimum genetic training expertise. We developed and evaluated binary classifiers based on 1,005 individuals referred to genetic counselors for potential genetic evaluation. In the evaluation using the gold standard cohort, the model achieved strong performance with an AUROC of 0.82 and an AUPRC of 0.92. Furthermore, we tested the model on another silver standard cohort (n=6,458), achieving an overall AUROC of 0.72 and an AUPRC of 0.671. Phen2Test was adjusted to align with current clinical guidelines, showing superior performance with more recent data, demonstrating its potential for use within a learning healthcare system as a genomic medicine intervention that adapts to guideline updates. This study showcases the practical utility of phenotypic features in recommending molecular genetic tests with performance comparable to clinical geneticists. Phen2Test could assist clinicians with limited genetic training and knowledge to order appropriate genetic tests.
Affiliation(s)
- Fangyi Chen
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Priyanka Ahimaz
- Department of Pediatrics, Columbia University, New York, NY, USA
- Institute of Genomic Medicine, Columbia University, New York, NY, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Wendy K. Chung
- Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Casey Ta
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
17
Tinker RJ, Peterson J, Bastarache L. Phenotypic presentation of Mendelian disease across the diagnostic trajectory in electronic health records. Genet Med 2023; 25:100921. [PMID: 37337966 PMCID: PMC11092403 DOI: 10.1016/j.gim.2023.100921] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/14/2023] [Revised: 06/12/2023] [Accepted: 06/13/2023] [Indexed: 06/21/2023] Open
Abstract
PURPOSE To investigate the phenotypic presentation of Mendelian disease across the diagnostic trajectory in the electronic health record (EHR). METHODS We applied a conceptual model to delineate the diagnostic trajectory of Mendelian disease to the EHRs of patients affected by 1 of 9 Mendelian diseases. We assessed data availability and phenotype ascertainment across the diagnostic trajectory using phenotype risk scores and validated our findings via chart review of patients with hereditary connective tissue disorders. RESULTS We identified 896 individuals with genetically confirmed diagnoses, 216 (24%) of whom had fully ascertained diagnostic trajectories. Phenotype risk scores increased following clinical suspicion and diagnosis (P < 1 × 10⁻⁴, Wilcoxon rank sum test). We found that of all International Classification of Disease-based phenotypes in the EHR, 66% were recorded after clinical suspicion, and manual chart review yielded consistent results. CONCLUSION Using a novel conceptual model to study the diagnostic trajectory of genetic disease in the EHR, we demonstrated that phenotype ascertainment is, in large part, driven by the clinical examinations and studies prompted by clinical suspicion of a genetic disease, a process we term diagnostic convergence. Algorithms designed to detect undiagnosed genetic disease should consider censoring EHR data at the first date of clinical suspicion to avoid data leakage.
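The censoring recommendation in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record layout (date, phenotype label), the function name `censor_at_suspicion`, the example data, and the choice to drop records dated on the suspicion date itself are all hypothetical and not drawn from the study.

```python
from datetime import date

def censor_at_suspicion(records, suspicion_date):
    """Keep only records dated strictly before the first clinical suspicion.

    records: list of (date, phenotype_label) tuples for one patient.
    suspicion_date: first date a genetic disease was suspected, or None.
    Dropping same-day records is an assumption made for this sketch.
    """
    if suspicion_date is None:
        return list(records)  # nothing to censor for this patient
    return [(d, label) for d, label in records if d < suspicion_date]

# Hypothetical patient timeline.
records = [
    (date(2019, 3, 1), "joint hypermobility"),
    (date(2020, 6, 15), "genetics referral"),
    (date(2020, 9, 2), "aortic root dilation"),
]

kept = censor_at_suspicion(records, date(2020, 6, 15))
```

Features built only from `kept` exclude phenotypes that were recorded because a clinician already suspected the disease, which is the leakage the abstract warns about.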
Affiliation(s)
- Rory J Tinker
- Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, TN
- Josh Peterson
- Vanderbilt University Medical Center, Department of Medicine, Nashville, TN; Vanderbilt University Medical Center, Department of Biomedical Informatics, Nashville, TN
- Lisa Bastarache
- Vanderbilt University Medical Center, Department of Biomedical Informatics, Nashville, TN.
18
Barnado A, Wheless L, Camai A, Green S, Han B, Katta A, Denny JC, Sawalha AH. Phenotype Risk Score but Not Genetic Risk Score Aids in Identifying Individuals With Systemic Lupus Erythematosus in the Electronic Health Record. Arthritis Rheumatol 2023; 75:1532-1541. [PMID: 37096581 PMCID: PMC10501317 DOI: 10.1002/art.42544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 03/23/2023] [Accepted: 04/17/2023] [Indexed: 04/26/2023]
Abstract
OBJECTIVE Systemic lupus erythematosus (SLE) poses diagnostic challenges. We undertook this study to evaluate the utility of a phenotype risk score (PheRS) and a genetic risk score (GRS) to identify SLE individuals in a real-world setting. METHODS Using a de-identified electronic health record (EHR) database with an associated DNA biobank, we identified 789 SLE cases and 2,261 controls with available MEGAEX genotyping. A PheRS for SLE was developed using billing codes that captured American College of Rheumatology SLE criteria. We developed a GRS with 58 SLE risk single-nucleotide polymorphisms (SNPs). RESULTS SLE cases had a significantly higher PheRS (mean ± SD 7.7 ± 8.0 versus 0.8 ± 2.0 in controls; P < 0.001) and GRS (mean ± SD 12.2 ± 2.3 versus 11.0 ± 2.0 in controls; P < 0.001). Black individuals with SLE had a higher PheRS compared to White individuals (mean ± SD 10.0 ± 10.1 versus 7.1 ± 7.2, respectively; P = 0.002) but a lower GRS (mean ± SD 9.0 ± 1.4 versus 12.3 ± 1.7, respectively; P < 0.001). Models predicting SLE that used only the PheRS had an area under the curve (AUC) of 0.87. Adding the GRS to the PheRS resulted in a minimal difference with an AUC of 0.89. On chart review, controls with the highest PheRS and GRS had undiagnosed SLE. CONCLUSION We developed a SLE PheRS to identify established and undiagnosed SLE individuals. A SLE GRS using known risk SNPs did not add value beyond the PheRS and was of limited utility in Black individuals with SLE. More work is needed to understand the genetic risks of SLE in diverse populations.
Affiliation(s)
- April Barnado
  - Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Lee Wheless
  - Department of Dermatology, Division of Epidemiology, Vanderbilt University Medical Center, Nashville, TN
- Alex Camai
  - Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Sarah Green
  - Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Bryan Han
  - Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Anish Katta
  - Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Joshua C. Denny
  - All of Us Research Program, National Institutes of Health, Bethesda, MD
- Amr H. Sawalha
  - Departments of Pediatrics, Medicine, and Immunology & Lupus Center of Excellence, University of Pittsburgh School of Medicine, Pittsburgh, PA
|
19
|
Schuler BA, Bastarache L, Wang J, He J, Van Driest SL, Denny JC. Population genetic testing and SERPINA1 sequencing identifies unidentified alpha-1 antitrypsin deficiency alleles and gene-environment interaction with hepatitis C infection. PLoS One 2023; 18:e0286469. [PMID: 37651384 PMCID: PMC10470904 DOI: 10.1371/journal.pone.0286469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 05/16/2023] [Indexed: 09/02/2023] Open
Abstract
Alpha-1 antitrypsin deficiency (AATD), a relatively common autosomal recessive genetic disorder, is underdiagnosed in symptomatic individuals. We sought to compare the risk of liver transplantation associated with hepatitis C infection with AATD heterozygotes and homozygotes and determine if SERPINA1 sequencing would identify undiagnosed AATD. We performed a retrospective cohort study in a deidentified Electronic Health Record (EHR)-linked DNA biobank with 72,027 individuals genotyped for the M, Z, and S alleles in SERPINA1. We investigated liver transplantation frequency by genotype group and compared with hepatitis C infection. We performed SERPINA1 sequencing in carriers of pathogenic AATD alleles who underwent liver transplantation. Liver transplantation was associated with the Z allele (ZZ: odds ratio [OR] = 1.31, p<2e-16; MZ: OR = 1.02, p = 1.2e-13) and with hepatitis C (OR = 1.20, p<2e-16). For liver transplantation, there was a significant interaction between genotype and hepatitis C (ZZ: interaction OR = 1.23, p = 4.7e-4; MZ: interaction OR = 1.11, p = 6.9e-13). Sequencing uncovered a second, rare, pathogenic SERPINA1 variant in six of 133 individuals with liver transplants and without hepatitis C. Liver transplantation was more common in individuals with AATD risk alleles (including heterozygotes), and AATD and hepatitis C demonstrated evidence of a gene-environment interaction in relation to liver transplantation. The current AATD screening strategy may miss diagnoses whereas SERPINA1 sequencing may increase diagnostic yield for AATD, stratify risk for liver disease, and inform clinical management for individuals with AATD risk alleles and liver disease risk factors.
Affiliation(s)
- Bryce A. Schuler
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Janey Wang
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Jing He
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Sara L. Van Driest
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Joshua C. Denny
  - All of Us Research Program, National Institutes of Health, Bethesda, Maryland, United States of America
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
|
20
|
Sharo AG, Zou Y, Adhikari AN, Brenner SE. ClinVar and HGMD genomic variant classification accuracy has improved over time, as measured by implied disease burden. Genome Med 2023; 15:51. [PMID: 37443081 PMCID: PMC10347827 DOI: 10.1186/s13073-023-01199-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 11/12/2022] [Accepted: 05/31/2023] [Indexed: 07/15/2023] Open
Abstract
BACKGROUND Curated databases of genetic variants assist clinicians and researchers in interpreting genetic variation. Yet, these databases contain some misclassified variants. It is unclear whether variant misclassification is abating as these databases rapidly grow and implement new guidelines. METHODS Using archives of ClinVar and HGMD, we investigated how variant misclassification has changed over 6 years, across different ancestry groups. We considered inborn errors of metabolism (IEMs) screened in newborns as a model system because these disorders are often highly penetrant with neonatal phenotypes. We used samples from the 1000 Genomes Project (1KGP) to identify individuals with genotypes that were classified by the databases as pathogenic. Due to the rarity of IEMs, nearly all such classified pathogenic genotypes indicate likely variant misclassification in ClinVar or HGMD. RESULTS While the false-positive rates of both ClinVar and HGMD have improved over time, HGMD variants currently imply two orders of magnitude more affected individuals in 1KGP than ClinVar variants. We observed that African ancestry individuals have a significantly increased chance of being incorrectly indicated to be affected by a screened IEM when HGMD variants are used. However, this bias affecting genomes of African ancestry was no longer significant once common variants were removed in accordance with recent variant classification guidelines. We discovered that ClinVar variants classified as Pathogenic or Likely Pathogenic are reclassified sixfold more often than DM or DM? variants in HGMD, which has likely resulted in ClinVar's lower false-positive rate. CONCLUSIONS Considering misclassified variants that have since been reclassified reveals our increasing understanding of rare genetic variation. We found that variant classification guidelines and allele frequency databases comprising genetically diverse samples are important factors in reclassification. 
We also discovered that ClinVar variants common in European and South Asian individuals were more likely to be reclassified to a lower confidence category, perhaps due to an increased chance of these variants being classified by multiple submitters. We discuss features for variant classification databases that would support their continued improvement.
Affiliation(s)
- Andrew G. Sharo
  - Biophysics Graduate Group, University of California, Berkeley, CA 94720 USA
  - Center for Computational Biology, University of California, Berkeley, CA 94720 USA
  - Department of Ecology and Evolutionary Biology, University of California, 124 Biomed Building, 1156 High St., Santa Cruz, CA 95064 USA
- Yangyun Zou
  - Center for Computational Biology, University of California, Berkeley, CA 94720 USA
  - Department of Plant and Microbial Biology, University of California, 461 Koshland Hall, Berkeley, CA 94720 USA
  - Currently at: Department of Clinical Research, Yikon Genomics Company, Ltd., Shanghai, China
- Aashish N. Adhikari
  - Center for Computational Biology, University of California, Berkeley, CA 94720 USA
  - Department of Plant and Microbial Biology, University of California, 461 Koshland Hall, Berkeley, CA 94720 USA
  - Currently at: Illumina, Foster City, CA 94404 USA
- Steven E. Brenner
  - Biophysics Graduate Group, University of California, Berkeley, CA 94720 USA
  - Center for Computational Biology, University of California, Berkeley, CA 94720 USA
  - Department of Plant and Microbial Biology, University of California, 461 Koshland Hall, Berkeley, CA 94720 USA
|
21
|
Henry OJ, Stödberg T, Båtelson S, Rasi C, Stranneheim H, Wedell A. Individualised human phenotype ontology gene panels improve clinical whole exome and genome sequencing analytical efficacy in a cohort of developmental and epileptic encephalopathies. Mol Genet Genomic Med 2023; 11:e2167. [PMID: 36967109 PMCID: PMC10337286 DOI: 10.1002/mgg3.2167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 02/21/2023] [Accepted: 03/01/2023] [Indexed: 07/20/2023] Open
Abstract
BACKGROUND The majority of genetic epilepsies remain unsolved in terms of specific genotype. Phenotype-based genomic analyses have shown potential to strengthen genomic analysis in various ways, including improving analytical efficacy. METHODS We have tested a standardised phenotyping method termed 'Phenomodels' for integrating deep-phenotyping information with our in-house developed clinical whole exome/genome sequencing analytical pipeline. Phenomodels includes a user-friendly epilepsy phenotyping template and an objective measure for selecting which template terms to include in individualised Human Phenotype Ontology (HPO) gene panels. In a pilot study of 38 previously solved cases of developmental and epileptic encephalopathies, we compared the sensitivity and specificity of the individualised HPO gene panels with the clinical epilepsy gene panel. RESULTS The Phenomodels template showed high sensitivity for capturing relevant phenotypic information, where 37/38 individuals' HPO gene panels included the causative gene. The HPO gene panels also had far fewer variants to assess than the epilepsy gene panel. CONCLUSION We have demonstrated a viable approach for incorporating standardised phenotype information into clinical genomic analyses, which may enable more efficient analysis.
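The sensitivity figure implied by the 37/38 result above can be worked out directly. This minimal sketch assumes that all 38 previously solved cases count as true positives to be captured, and that the single case whose panel missed the causative gene counts as a false negative.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of known-positive cases that the panel captured."""
    return true_positives / (true_positives + false_negatives)


# 37 of 38 individualised HPO gene panels included the causative gene.
panel_sensitivity = sensitivity(true_positives=37, false_negatives=1)
print(f"{panel_sensitivity:.1%}")  # prints 97.4%
```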
Affiliation(s)
- Olivia J. Henry
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Tommy Stödberg
  - Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
  - Department of Pediatric Neurology, Karolinska University Hospital, Stockholm, Sweden
- Sofia Båtelson
  - Department of Pediatric Neurology, Karolinska University Hospital, Stockholm, Sweden
- Chiara Rasi
  - Science for Life Laboratory, Department of Microbiology, Tumour and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Henrik Stranneheim
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Department of Microbiology, Tumour and Cell Biology, Karolinska Institutet, Stockholm, Sweden
  - Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Anna Wedell
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
|
22
|
Callahan TJ, Stefanski AL, Wyrwa JM, Zeng C, Ostropolets A, Banda JM, Baumgartner WA, Boyce RD, Casiraghi E, Coleman BD, Collins JH, Deakyne Davies SJ, Feinstein JA, Lin AY, Martin B, Matentzoglu NA, Meeker D, Reese J, Sinclair J, Taneja SB, Trinkley KE, Vasilevsky NA, Williams AE, Zhang XA, Denny JC, Ryan PB, Hripcsak G, Bennett TD, Haendel MA, Robinson PN, Hunter LE, Kahn MG. Ontologizing health systems data at scale: making translational discovery a reality. NPJ Digit Med 2023; 6:89. [PMID: 37208468 PMCID: PMC10196319 DOI: 10.1038/s41746-023-00830-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 04/28/2023] [Indexed: 05/21/2023] Open
Abstract
Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies our algorithm presents new opportunities to advance EHR-based deep phenotyping.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Adrianne L Stefanski
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Jordan M Wyrwa
  - Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Chenjie Zeng
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Anna Ostropolets
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30303, USA
- William A Baumgartner
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15260, USA
- Elena Casiraghi
  - Computer Science, Università degli Studi di Milano, Milan, Italy
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Ben D Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Janine H Collins
  - Department of Haematology, University of Cambridge, Cambridge, UK
- Sara J Deakyne Davies
  - Department of Research Informatics & Data Science, Analytics Resource Center, Children's Hospital Colorado, Aurora, CO, 80045, USA
- James A Feinstein
  - Adult and Child Center for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
- Asiyah Y Lin
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Blake Martin
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Katy E Trinkley
  - Department of Family Medicine, University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
  - Translational and Integrative Sciences Lab, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Andrew E Williams
  - Tufts Institute for Clinical Research and Health Policy Studies, Tufts University, Boston, MA, 02155, USA
- Xingmin A Zhang
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Joshua C Denny
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Patrick B Ryan
  - Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Tellen D Bennett
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Melissa A Haendel
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael G Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
|
23
|
Xiao T, Dong X, Lu Y, Zhou W. High-Resolution and Multidimensional Phenotypes Can Complement Genomics Data to Diagnose Diseases in the Neonatal Population. Phenomics (Cham, Switzerland) 2023; 3:204-215. [PMID: 37197647 PMCID: PMC10110825 DOI: 10.1007/s43657-022-00071-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Revised: 08/03/2022] [Accepted: 08/08/2022] [Indexed: 05/19/2023]
Abstract
Advances in genomic medicine have greatly improved our understanding of human diseases. However, the phenome is not well understood. High-resolution and multidimensional phenotypes have shed light on the mechanisms underlying neonatal diseases in greater detail and have the potential to optimize clinical strategies. In this review, we first highlight the value of analyzing traditional phenotypes using a data science approach in the neonatal population. We then discuss recent research on high-resolution, multidimensional, and structured phenotypes in neonatal critical diseases. Finally, we briefly introduce current technologies available for the analysis of multidimensional data and the value that can be provided by integrating these data into clinical practice. In summary, a time series of the multidimensional phenome can improve our understanding of disease mechanisms and diagnostic decision-making, stratify patients, and provide clinicians with optimized strategies for therapeutic intervention; however, the available technologies for collecting multidimensional data and the best platform for connecting multiple modalities should be considered.
Affiliation(s)
- Tiantian Xiao
  - Division of Neonatology, Children’s Hospital of Fudan University, National Children’s Medical Center, 399 Wanyuan Road, Shanghai, 201102 China
  - Department of Neonatology, Chengdu Women’s and Children’s Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, 610000 China
- Xinran Dong
  - Center for Molecular Medicine, Pediatric Research Institute, Children’s Hospital of Fudan University, National Children’s Medical Center, Shanghai, 201102 China
- Yulan Lu
  - Center for Molecular Medicine, Pediatric Research Institute, Children’s Hospital of Fudan University, National Children’s Medical Center, Shanghai, 201102 China
- Wenhao Zhou
  - Division of Neonatology, Children’s Hospital of Fudan University, National Children’s Medical Center, 399 Wanyuan Road, Shanghai, 201102 China
  - Center for Molecular Medicine, Pediatric Research Institute, Children’s Hospital of Fudan University, National Children’s Medical Center, Shanghai, 201102 China
|
24
|
Solomon BD, Adam MP, Fong CT, Girisha KM, Hall JG, Hurst AC, Krawitz PM, Moosa S, Phadke SR, Tekendo-Ngongang C, Wenger TL. Perspectives on the future of dysmorphology. Am J Med Genet A 2023; 191:659-671. [PMID: 36484420 PMCID: PMC9928773 DOI: 10.1002/ajmg.a.63060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Revised: 08/30/2022] [Accepted: 11/12/2022] [Indexed: 12/13/2022]
Abstract
The field of clinical genetics and genomics continues to evolve. In the past few decades, milestones like the initial sequencing of the human genome, dramatic changes in sequencing technologies, and the introduction of artificial intelligence, have upended the field and offered fascinating new insights. Though difficult to predict the precise paths the field will follow, rapid change may continue to be inevitable. Within genetics, the practice of dysmorphology, as defined by pioneering geneticist David W. Smith in the 1960s as "the study of, or general subject of abnormal development of tissue form" has also been affected by technological advances as well as more general trends in biomedicine. To address possibilities, potential, and perils regarding the future of dysmorphology, a group of clinical geneticists, representing different career stages, areas of focus, and geographic regions, have contributed to this piece by providing insights about how the practice of dysmorphology will develop over the next several decades.
Affiliation(s)
- Benjamin D. Solomon
  - Medical Genetics Branch, National Human Genome Research Institute, Bethesda, Maryland, United States of America
- Margaret P. Adam
  - Department of Pediatrics, University of Washington, Seattle, Washington, United States of America
- Chin-To Fong
  - Department of Genetics, University of Rochester, Rochester, New York, United States of America
- Katta M. Girisha
  - Department of Medical Genetics, Kasturba Medical College, Manipal, Manipal Academy of Higher Education, Manipal, India
- Judith G. Hall
  - University of British Columbia and Children’s and Women’s Health Centre of British Columbia, Canada
  - Department of Pediatrics and Medical Genetics, British Columbia Children’s Hospital, Vancouver, British Columbia, Canada
- Anna C.E. Hurst
  - Department of Genetics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Peter M. Krawitz
  - Institute for Genomic Statistics and Bioinformatics, University of Bonn, Bonn, Germany
- Shahida Moosa
  - Division of Molecular Biology and Human Genetics, Stellenbosch University
  - Medical Genetics, Tygerberg Hospital, Tygerberg, South Africa
- Shubha R. Phadke
  - Department of Medical Genetics, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, India
- Cedrik Tekendo-Ngongang
  - Medical Genetics Branch, National Human Genome Research Institute, Bethesda, Maryland, United States of America
- Tara L. Wenger
  - Division of Genetic Medicine, University of Washington, Seattle, Washington, United States of America
|
25
|
Fu M, Yan Y, Olde Loohuis LM, Chang TS. Defining the distance between diseases using SNOMED CT embeddings. J Biomed Inform 2023; 139:104307. [PMID: 36738869 DOI: 10.1016/j.jbi.2023.104307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 12/10/2022] [Accepted: 01/29/2023] [Indexed: 02/05/2023]
Abstract
Characterizing disease relationships is essential to biomedical research to understand disease etiology and improve clinical decision-making. Measurements of distance between disease pairs enable valuable research tasks, such as subgrouping patients and identifying common time courses of disease onset. Distance metrics developed in prior work focused on smaller, targeted disease sets. Distance metrics covering all diseases have not yet been defined, which limits the applications to a broader disease spectrum. Our current study defines disease distances for all disease pairs within the International Classification of Diseases, version 10 (ICD-10), the diagnostic classification system universally used in electronic health records. Our proposed distance is computed based on a biomedical ontology, SNOMED CT (Systematized Nomenclature of Medicine, Clinical Terms), which can also be viewed as a structured knowledge graph. We compared the knowledge graph-based metric to three other distance metrics based on the hierarchical structure of ICD, clinical comorbidity, and genetic correlation, to evaluate how each may capture similar or unique aspects of disease relationships. We show that our knowledge graph-based distance metric captures known phenotypic, clinical, and molecular characteristics at a finer granularity than the other three. With the continued growth of using electronic health records data for research, we believe that our distance metric will play an important role in subgrouping patients for precision health, and enabling individualized disease prevention and treatments.
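An embedding-based disease distance of the kind described above can be sketched as a distance between two concept vectors. The abstract does not say which distance function the authors applied to their SNOMED CT embeddings; cosine distance is assumed here as a common choice, and the two-dimensional vectors and disease names are toy placeholders rather than real embeddings.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical embeddings keyed by placeholder disease names.
embeddings = {
    "disease_a": [1.0, 0.0],
    "disease_b": [0.0, 1.0],
    "disease_c": [2.0, 0.0],
}

d_ab = cosine_distance(embeddings["disease_a"], embeddings["disease_b"])
d_ac = cosine_distance(embeddings["disease_a"], embeddings["disease_c"])
print(d_ab, d_ac)
```

Cosine distance ignores vector length, so `disease_a` and `disease_c` are at distance zero despite different magnitudes, while the orthogonal pair is at distance one.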
Affiliation(s)
- Mingzhou Fu
  - Movement Disorders Program, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA; Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA, USA
- Yu Yan
  - Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA, USA
- Loes M Olde Loohuis
  - Center for Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Timothy S Chang
  - Movement Disorders Program, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
|
26
|
Miller-Fleming TW, Allos A, Gantz E, Yu D, Isaacs DA, Mathews CA, Scharf JM, Davis LK. Developing a Phenotype Risk Score for Tic Disorders in a Large, Clinical Biobank. medRxiv 2023:2023.02.21.23286253. [PMID: 36865201 PMCID: PMC9980249 DOI: 10.1101/2023.02.21.23286253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Abstract
Importance Tics are a common feature of early-onset neurodevelopmental disorders, characterized by involuntary and repetitive movements or sounds. Despite affecting up to 2% of young children and having a genetic contribution, the underlying causes remain poorly understood, likely due to the complex phenotypic and genetic heterogeneity among affected individuals. Objective In this study, we leverage dense phenotype information from electronic health records to identify the disease features associated with tic disorders within the context of a clinical biobank. These disease features are then used to generate a phenotype risk score for tic disorder. Design Using de-identified electronic health records from a tertiary care center, we extracted individuals with tic disorder diagnosis codes. We performed a phenome-wide association study to identify the features enriched in tic cases versus controls (N=1,406 and 7,030; respectively). These disease features were then used to generate a phenotype risk score for tic disorder, which was applied across an independent set of 90,051 individuals. A previously curated set of tic disorder cases from an electronic health record algorithm followed by clinician chart review was used to validate the tic disorder phenotype risk score. Main Outcomes and Measures Phenotypic patterns associated with a tic disorder diagnosis in the electronic health record. Results Our tic disorder phenome-wide association study revealed 69 significantly associated phenotypes, predominantly neuropsychiatric conditions, including obsessive compulsive disorder, attention-deficit hyperactivity disorder, autism, and anxiety. The phenotype risk score constructed from these 69 phenotypes in an independent population was significantly higher among clinician-validated tic cases versus non-cases. 
Conclusions and Relevance Our findings provide support for the use of large-scale medical databases to better understand phenotypically complex diseases, such as tic disorders. The tic disorder phenotype risk score provides a quantitative measure of disease risk that can be leveraged for the assignment of individuals in case-control studies or for additional downstream analyses.
Affiliation(s)
- Tyne W. Miller-Fleming
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, TN, USA
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Annmarie Allos
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, TN, USA
  - Department of Cognitive Science, Dartmouth College, Hanover, NH, USA
- Emily Gantz
  - Department of Pediatric Neurology, Children’s Hospital of Alabama, Birmingham, AL, USA
  - Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pediatrics, Monroe Carell Jr. Children’s Hospital at Vanderbilt, Nashville, TN, USA
- Dongmei Yu
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David A. Isaacs
  - Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pediatrics, Monroe Carell Jr. Children’s Hospital at Vanderbilt, Nashville, TN, USA
- Carol A. Mathews
  - Department of Psychiatry, Genetics Institute, Center for OCD, Anxiety and Related Disorders, University of Florida, Gainesville, FL, USA
- Jeremiah M. Scharf
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lea K. Davis
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, TN, USA
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA
  - Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, TN, USA
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, USA
|
27
|
Zamariolli M, Auwerx C, Sadler MC, van der Graaf A, Lepik K, Schoeler T, Moysés-Oliveira M, Dantas AG, Melaragno MI, Kutalik Z. The impact of 22q11.2 copy-number variants on human traits in the general population. Am J Hum Genet 2023; 110:300-313. [PMID: 36706759 PMCID: PMC9943723 DOI: 10.1016/j.ajhg.2023.01.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/05/2022] [Accepted: 01/03/2023] [Indexed: 01/27/2023] Open
Abstract
While extensively studied in clinical cohorts, the phenotypic consequences of 22q11.2 copy-number variants (CNVs) in the general population remain understudied. To address this gap, we performed a phenome-wide association scan in 405,324 unrelated UK Biobank (UKBB) participants by using CNV calls from genotyping array. We mapped 236 Human Phenotype Ontology terms linked to any of the 90 genes encompassed by the region to 170 UKBB traits and assessed the association between these traits and the copy-number state of 504 genotyping array probes in the region. We found significant associations for eight continuous and nine binary traits associated under different models (duplication-only, deletion-only, U-shape, and mirror models). The causal effect of the expression level of 22q11.2 genes on associated traits was assessed through transcriptome-wide Mendelian randomization (TWMR), revealing that increased expression of ARVCF increased BMI. Similarly, increased DGCR6 expression causally reduced mean platelet volume, in line with the corresponding CNV effect. Furthermore, cross-trait multivariable Mendelian randomization (MVMR) suggested a predominant role of genuine (horizontal) pleiotropy in the CNV region. Our findings show that within the general population, 22q11.2 CNVs are associated with traits previously linked to genes in the region, and duplications and deletions act upon traits in different fashions. We also showed that gain or loss of distinct segments within 22q11.2 may impact a trait under different association models. Our results have provided new insights to help further the understanding of the complex 22q11.2 region.
Affiliation(s)
- Malú Zamariolli
- Genetics Division, Universidade Federal de São Paulo, São Paulo, Brazil; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Chiara Auwerx
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; University Center for Primary Care and Public Health, University of Lausanne, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Marie C Sadler
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; University Center for Primary Care and Public Health, University of Lausanne, Lausanne, Switzerland
| | | | - Kaido Lepik
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Tabea Schoeler
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Department of Clinical, Educational and Health Psychology, University College London, London, UK
| | | | - Anelisa G Dantas
- Genetics Division, Universidade Federal de São Paulo, São Paulo, Brazil
| | | | - Zoltán Kutalik
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; University Center for Primary Care and Public Health, University of Lausanne, Lausanne, Switzerland
|
28
|
Abstract
Hundreds of different genetic causes of chronic kidney disease are now recognized, and while individually rare, taken together they are significant contributors to both adult and pediatric diseases. Traditional genetics approaches relied heavily on the identification of large families with multiple affected members and have been fundamental to the identification of genetic kidney diseases. With the increased utilization of massively parallel sequencing and improvements to genotype imputation, we can analyze rare variants in large cohorts of unrelated individuals, leading to personalized care for patients and significant research advancements. This review evaluates the contribution of rare disorders to patient care and the study of genetic kidney diseases and highlights key advancements that utilize new techniques to improve our ability to identify new gene-disease associations.
Affiliation(s)
- Mark D Elliott
- Division of Nephrology, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Center for Precision Medicine and Genomics, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Institute for Genomic Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
| | - Hila Milo Rasouly
- Division of Nephrology, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Center for Precision Medicine and Genomics, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
| | - Ali G Gharavi
- Division of Nephrology, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Center for Precision Medicine and Genomics, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Institute for Genomic Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
|
29
|
Tinker RJ, Peterson J, Bastarache L. Phenotypic convergence: a novel phenomenon in the diagnostic process of Mendelian genetic disorders. MEDRXIV: THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.01.17.23284691. [PMID: 36711865 PMCID: PMC9882467 DOI: 10.1101/2023.01.17.23284691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
Abstract
Introduction The study of Mendelian disease has yielded a large body of knowledge about the phenotypic presentation of disease. Less is known about the way the diseases are reflected in the electronic health record (EHR). Aim To develop an EHR-based model of the diagnostic trajectory and investigate data availability and the longitudinal distribution of signs and symptoms of a Mendelian disorder within EHRs. Methods We created a conceptual model to specify key time points of the diagnostic trajectory and applied it to individuals with genetically confirmed hereditary connective tissue diseases (HCTD). Using the model, we assessed EHR data availability within each time interval. We tested the performance of phenotype risk scores (PheRS), an algorithm that detects Mendelian disease patterns and assessed the phenotypic expression of HCTD over the diagnostic trajectory. Results We identified 251 individuals with HCTD; 79 (35%) of these patients had a fully ascertained diagnostic trajectory. There were few documented signs and symptoms prior to clinical suspicion that evoked an HCTD disorder (median PheRS 0.14); once suspicion was documented, median PheRS increased to 1.87 (SD). The majority (72%) of phenotypic features were identified post clinical suspicion. Discussion Using a novel conceptual model for the diagnostic trajectory of Mendelian disease, we demonstrated that phenotype ascertainment is, in part, driven by the diagnostic process and that many findings are only documented following clinical suspicion and diagnosis, a process we term phenotypic convergence. Therefore, algorithms that aim to detect undiagnosed Mendelian disease should censor EHR data to avoid data leakage.
|
30
|
Schiabor Barrett KM, Cirulli ET, Bolze A, Rowan C, Elhanan G, Grzymski JJ, Lee W, Washington NL. Cardiomyopathy prevalence exceeds 30% in individuals with TTN variants and early atrial fibrillation. Genet Med 2023; 25:100012. [PMID: 36637017 DOI: 10.1016/j.gim.2023.100012] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/25/2022] [Revised: 01/04/2023] [Accepted: 01/05/2023] [Indexed: 01/11/2023] Open
Abstract
PURPOSE TTN truncating variants (TTNtvs) represent the largest known genetic cause of dilated cardiomyopathies (DCMs); however, their penetrance for DCM in general populations is low. More broadly, patients with cardiomyopathies (CMs) often exhibit other cardiac conditions, such as atrial fibrillation (Afib), which has also been linked to TTNtvs. This retrospective analysis aims to characterize the relationship between different cardiac conditions in those with TTNtvs and identify individuals with the highest risk of DCM. METHODS In this work we leverage longitudinal electronic health record and exome sequencing data from approximately 450,000 individuals in 2 health systems to statistically confirm and pinpoint the genetic footprint of TTNtv-related diagnoses aside from CM, such as Afib, and determine whether vetting additional significantly associated phenotypes better stratifies CM risk across those with TTNtvs. We focused on TTNtvs in exons with a percentage spliced in >90% (hiPSI TTNtvs), a representation of constitutive cardiac expression. RESULTS When controlling for CM and Afib, other cardiac conditions retained only nominal association with TTNtvs. A sliding window analysis of TTNtvs across the locus confirms that the association is specific to hiPSI exons for both CM and Afib, with no meaningful associations in percent spliced in ≤90% exons (loPSI TTNtvs). The combination of hiPSI TTNtv status and early Afib diagnosis (before age 60) found a subset of TTNtv individuals at high risk for CM. The prevalence of CM in this subset was 33%, a rate that was 3.5-fold higher than that in individuals with hiPSI TTNtvs (9% prevalence), 5-fold higher than that in individuals without TTNtvs with early Afib (6% prevalence), and 80-fold higher than that in the general population.
CONCLUSION Our retrospective analyses revealed that those with hiPSI TTNtvs and early Afib (∼1/2900) have a high prevalence of CM (33%), far exceeding that in other individuals with TTNtvs and in those without TTNtvs with an early Afib diagnosis. These results show that combining phenotypic information along with genomic population screening can identify patients at higher risk for progressing to symptomatic heart failure.
Affiliation(s)
| | | | | | - Chris Rowan
- Renown Health, Reno, NV; University of Nevada, School of Medicine, Reno, NV
| | - Gai Elhanan
- Renown Health, Reno, NV; Center for Genomic Medicine, Desert Research Institute, Reno, NV
| | - Joseph J Grzymski
- Renown Health, Reno, NV; Center for Genomic Medicine, Desert Research Institute, Reno, NV
|
31
|
Huckins LM. Thoughtful Phenotype Definitions Empower Participants and Power Studies. Complex Psychiatry 2023; 8:57-62. [PMID: 37032718 PMCID: PMC10080191 DOI: 10.1159/000527022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 09/05/2022] [Indexed: 11/19/2022] Open
Affiliation(s)
- Laura M. Huckins
- Department of Psychiatry, Yale University, New Haven, Connecticut, USA
|
32
|
O’Neill MJ, Wada Y, Hall LD, Mitchell DW, Glazer AM, Roden DM. Functional Assays Reclassify Suspected Splice-Altering Variants of Uncertain Significance in Mendelian Channelopathies. CIRCULATION. GENOMIC AND PRECISION MEDICINE 2022; 15:e003782. [PMID: 36197721 PMCID: PMC9772980 DOI: 10.1161/circgen.122.003782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 07/12/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Rare protein-altering variants in SCN5A, KCNQ1, and KCNH2 are major causes of Brugada syndrome and the congenital long QT syndrome. While splice-altering variants lying outside 2-bp canonical splice sites can cause these diseases, their role remains poorly described. We implemented 2 functional assays to assess 12 recently reported putative splice-altering variants of uncertain significance and 1 likely pathogenic variant without functional data observed in Brugada syndrome and long QT syndrome probands. METHODS We deployed minigene assays to assess the splicing consequences of 10 variants. Three variants incompatible with the minigene approach were introduced into control induced pluripotent stem cells by CRISPR genome editing. We differentiated cells into induced pluripotent stem cell-derived cardiomyocytes and studied splicing outcomes by reverse transcription-polymerase chain reaction. We used the American College of Medical Genetics and Genomics functional assay criteria (PS3/BS3) to reclassify variants. RESULTS We identified aberrant splicing, with presumed disruption of protein sequence, in 8/10 variants studied using the minigene assay and 1/3 studied in induced pluripotent stem cell-derived cardiomyocytes. We reclassified 8 variants of uncertain significance to likely pathogenic, 1 variant of uncertain significance to likely benign, and 1 likely pathogenic variant to pathogenic. CONCLUSIONS Functional assays reclassified splice-altering variants outside canonical splice sites in Brugada Syndrome- and long QT syndrome-associated genes.
Affiliation(s)
- Matthew J. O’Neill
- Vanderbilt University School of Medicine, Medical Scientist Training Program, Vanderbilt University
| | - Yuko Wada
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine
| | - Lynn D. Hall
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine
| | - Devyn W. Mitchell
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine
| | - Andrew M. Glazer
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine
| | - Dan M. Roden
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Departments of Medicine, Pharmacology, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
|
33
|
Dong X, Xiao T, Chen B, Lu Y, Zhou W. Precision medicine via the integration of phenotype-genotype information in neonatal genome project. FUNDAMENTAL RESEARCH 2022; 2:873-884. [PMID: 38933389 PMCID: PMC11197532 DOI: 10.1016/j.fmre.2022.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 07/07/2022] [Accepted: 07/10/2022] [Indexed: 11/21/2022] Open
Abstract
The explosion of next-generation sequencing (NGS) has enabled the widespread use of genomic data in precision medicine. Currently, several neonatal genome projects have emerged to explore the advantages of NGS to diagnose or screen for rare genetic disorders. These projects have made remarkable achievements, but still the genome data could be further explored with the assistance of phenotype collection. In contrast, longitudinal birth cohorts are great examples to record and apply phenotypic information in clinical studies starting at the neonatal period, especially the trajectory analyses for health development or disease progression. It is obvious that efficient integration of genotype and phenotype benefits not only the clinical management of rare genetic disorders but also the risk assessment of complex diseases. Here, we first summarize the recent neonatal genome projects as well as some longitudinal birth cohorts. Then, we propose two simplified strategies by integrating genotypic and phenotypic information in precision medicine based on current studies. Finally, research collaborations, sociological issues, and future perspectives are discussed. How to maximize neonatal genomic information to benefit the pediatric population remains an area in need of more research and effort.
Affiliation(s)
- Xinran Dong
- Center for Molecular Medicine, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai 201102, China
| | - Tiantian Xiao
- Division of Neonatology, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai 201102, China
- Department of Neonatology, Chengdu Women's and Children's Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu 610066, China
| | - Bin Chen
- Center for Molecular Medicine, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai 201102, China
| | - Yulan Lu
- Center for Molecular Medicine, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai 201102, China
| | - Wenhao Zhou
- Center for Molecular Medicine, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai 201102, China
- Division of Neonatology, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai 201102, China
|
34
|
Huckins LM, Signer R, Johnson J, Wu YK, Mitchell KS, Bulik CM. What next for eating disorder genetics? Replacing myths with facts to sharpen our understanding. Mol Psychiatry 2022; 27:3929-3938. [PMID: 35595976 PMCID: PMC9718676 DOI: 10.1038/s41380-022-01601-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/31/2021] [Revised: 04/20/2022] [Accepted: 04/26/2022] [Indexed: 02/07/2023]
Abstract
Substantial progress has been made in the understanding of anorexia nervosa (AN) and eating disorder (ED) genetics through the efforts of large-scale collaborative consortia, yielding the first genome-wide significant loci, AN-associated genes, and insights into metabo-psychiatric underpinnings of the disorders. However, the translatability, generalizability, and reach of these insights are hampered by an overly narrow focus in our research. In particular, stereotypes, myths, assumptions and misconceptions have resulted in incomplete or incorrect understandings of ED presentations and trajectories, and exclusion of certain patient groups from our studies. In this review, we aim to counteract these historical imbalances. Taking as our starting point the Academy for Eating Disorders (AED) Truth #5 "Eating disorders affect people of all genders, ages, races, ethnicities, body shapes and weights, sexual orientations, and socioeconomic statuses", we discuss what we do and do not know about the genetic underpinnings of EDs among people in each of these groups, and suggest strategies to design more inclusive studies. In the second half of our review, we outline broad strategic goals whereby ED researchers can expand the diversity, insights, and clinical translatability of their studies.
Affiliation(s)
- Laura M Huckins
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Mental Illness Research, Education and Clinical Centers, James J. Peters Department of Veterans Affairs Medical Center, Bronx, NY, 14068, USA
| | - Rebecca Signer
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Jessica Johnson
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Ya-Ke Wu
- School of Nursing, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Karen S Mitchell
- National Center for PTSD at VA Boston Healthcare System, Boston, MA, USA
- Department of Psychiatry, Boston University School of Medicine, Boston, MA, USA
| | - Cynthia M Bulik
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
35
|
Aref L, Bastarache L, Hughey JJ. The phers R package: using phenotype risk scores based on electronic health records to study Mendelian disease and rare genetic variants. Bioinformatics 2022; 38:4972-4974. [PMID: 36083022 PMCID: PMC9620826 DOI: 10.1093/bioinformatics/btac619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 09/06/2022] [Accepted: 09/08/2022] [Indexed: 01/29/2023] Open
Abstract
SUMMARY Electronic health record (EHR) data linked to DNA biobanks are a valuable resource for understanding the phenotypic effects of human genetic variation. We previously developed the phenotype risk score (PheRS) as an approach to quantify the extent to which a patient's clinical features resemble a given Mendelian disease. Using PheRS, we have uncovered novel associations between Mendelian disease-like phenotypes and rare genetic variants, and identified patients who may have undiagnosed Mendelian disease. Although the PheRS approach is conceptually simple, it involves multiple mapping steps and was previously only available as custom scripts, limiting the approach's usability. Thus, we developed the phers R package, a complete and user-friendly set of functions and maps for performing a PheRS-based analysis on linked clinical and genetic data. The package includes up-to-date maps between EHR-based phenotypes (i.e. ICD codes and phecodes), human phenotype ontology terms and Mendelian diseases. Starting with occurrences of ICD codes, the package enables the user to calculate PheRSs, validate the scores using case-control analyses, and perform genetic association analyses. By increasing PheRS's transparency and usability, the phers R package will help improve our understanding of the relationships between rare genetic variants and clinically meaningful human phenotypes. AVAILABILITY AND IMPLEMENTATION The phers R package is free and open-source and available on CRAN and at https://phers.hugheylab.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Layla Aref
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Lisa Bastarache
- Program in Chemical and Physical Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
|
36
|
Johnson R, Ding Y, Venkateswaran V, Bhattacharya A, Boulier K, Chiu A, Knyazev S, Schwarz T, Freund M, Zhan L, Burch KS, Caggiano C, Hill B, Rakocz N, Balliu B, Denny CT, Sul JH, Zaitlen N, Arboleda VA, Halperin E, Sankararaman S, Butte MJ, Lajonchere C, Geschwind DH, Pasaniuc B. Leveraging genomic diversity for discovery in an electronic health record linked biobank: the UCLA ATLAS Community Health Initiative. Genome Med 2022; 14:104. [PMID: 36085083 PMCID: PMC9461263 DOI: 10.1186/s13073-022-01106-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 09/28/2021] [Accepted: 08/03/2022] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative, an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N=36,736). METHODS We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHR to perform ancestry-specific and multi-ancestry genome and phenome-wide scans across a broad set of disease phenotypes. RESULTS We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals' SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p-value=2.32×10⁻¹⁶, EAA p-value=6.73×10⁻¹¹). A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group. CONCLUSIONS Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.
Affiliation(s)
- Ruth Johnson
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Yi Ding
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Vidhya Venkateswaran
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Oral Biology, School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Kristin Boulier
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Medicine, Division of Cardiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Alec Chiu
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Sergey Knyazev
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Tommer Schwarz
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Malika Freund
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Genetics, Stanford School of Medicine, Stanford, CA, 94305, USA
| | - Lingyu Zhan
- Molecular Biology Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Kathryn S Burch
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Christa Caggiano
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Brian Hill
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Nadav Rakocz
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Brunilda Balliu
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Christopher T Denny
  - Division of Hematology/Oncology, Department of Pediatrics, Gwynne Hazen Cherry Memorial Laboratories, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Jae Hoon Sul
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Noah Zaitlen
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Valerie A Arboleda
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Eran Halperin
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Anesthesiology and Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Sriram Sankararaman
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Manish J Butte
  - Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Clara Lajonchere
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Daniel H Geschwind
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bogdan Pasaniuc
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
  - Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
37
Liu C, Ta CN, Havrilla JM, Nestor JG, Spotnitz ME, Geneslaw AS, Hu Y, Chung WK, Wang K, Weng C. OARD: Open annotations for rare diseases and their phenotypes based on real-world data. Am J Hum Genet 2022; 109:1591-1604. [PMID: 35998640 PMCID: PMC9502051 DOI: 10.1016/j.ajhg.2022.08.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Accepted: 08/01/2022] [Indexed: 11/23/2022] Open
Abstract
Diagnosis of rare genetic diseases often relies on phenotype-driven methods, which hinge on the accuracy and completeness of the rare disease phenotypes in the underlying annotation knowledgebase. Existing knowledgebases are often manually curated with additional annotations found in published case reports. Despite their potential, real-world data such as electronic health records (EHRs) have not been fully exploited to derive rare disease annotations. Here, we present open annotation for rare diseases (OARD), a real-world-data-derived resource with annotation for rare-disease-related phenotypes. This resource is derived from the EHRs of two academic health institutions containing more than 10 million individuals spanning wide age ranges and different disease subgroups. By leveraging ontology mapping and advanced natural-language-processing (NLP) methods, OARD automatically and efficiently extracts concepts for both rare diseases and their phenotypic traits from billing codes and lab tests as well as over 100 million clinical narratives. The rare disease prevalences derived by OARD are highly correlated with those annotated in the original rare disease knowledgebase. By performing association analysis, we identified more than 1 million novel disease-phenotype association pairs that were previously missed by human annotation, and >60% were confirmed as true associations via manual review of a list of sampled pairs. Compared with the manually curated annotation, OARD is 100% data driven and its pipeline can be shared across different institutions. By supporting privacy-preserving sharing of aggregated summary statistics, such as term frequencies and disease-phenotype associations, it fills an important gap to facilitate data-driven research in the rare disease community.
Affiliation(s)
- Cong Liu
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Casey N Ta
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Jim M Havrilla
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Jordan G Nestor
  - Division of Nephrology, Department of Medicine, Columbia University, New York, NY 10032, USA
- Matthew E Spotnitz
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Andrew S Geneslaw
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yu Hu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Wendy K Chung
  - Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
38
Thompson M, Hill BL, Rakocz N, Chiang JN, Geschwind D, Sankararaman S, Hofer I, Cannesson M, Zaitlen N, Halperin E. Methylation risk scores are associated with a collection of phenotypes within electronic health record systems. NPJ Genom Med 2022; 7:50. [PMID: 36008412 PMCID: PMC9411568 DOI: 10.1038/s41525-022-00320-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 09/16/2021] [Accepted: 07/18/2022] [Indexed: 12/20/2022] Open
Abstract
Inference of clinical phenotypes is a fundamental task in precision medicine, and has therefore been heavily investigated in recent years in the context of electronic health records (EHR) using a large arsenal of machine learning techniques, as well as in the context of genetics using polygenic risk scores (PRS). In this work, we considered the epigenetic analog of PRS, methylation risk scores (MRS), a linear combination of methylation states. We measured methylation across a large cohort (n = 831) of diverse samples in the UCLA Health biobank, for which both genetic and complete EHR data are available. We constructed MRS for 607 phenotypes spanning diagnoses, clinical lab tests, and medication prescriptions. When added to a baseline set of predictive features, MRS significantly improved the imputation of 139 outcomes, whereas the PRS improved only 22 (median improvement for methylation 10.74%, 141.52%, and 15.46% in medications, labs, and diagnosis codes, respectively, whereas genotypes only improved the labs at a median increase of 18.42%). We added significant MRS to state-of-the-art EHR imputation methods that leverage the entire set of medical records, and found that including MRS as a medical feature in the algorithm significantly improves EHR imputation in 37% of lab tests examined (median R² increase 47.6%). Finally, we replicated several MRS in multiple external studies of methylation (minimum p-value of 2.72 × 10⁻⁷) and replicated 22 of 30 tested MRS internally in two separate cohorts of different ethnicity. Our publicly available results and weights show promise for methylation risk scores as clinical and scientific tools.
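The abstract describes a methylation risk score as a linear combination of methylation states. A minimal sketch of that idea follows; the function name, the intercept term, and every number are hypothetical illustrations, not the weights or code published by the authors.

```python
def methylation_risk_score(beta_values, weights, intercept=0.0):
    """Weighted sum of per-site methylation levels (hypothetical helper).

    beta_values: methylation level at each CpG site, typically in [0, 1].
    weights:     one coefficient per site, e.g. from a fitted linear model.
    """
    if len(beta_values) != len(weights):
        raise ValueError("need exactly one weight per methylation site")
    return intercept + sum(w * b for w, b in zip(weights, beta_values))


# Made-up values for three CpG sites, chosen only to keep the arithmetic exact.
example_betas = [0.5, 0.25, 1.0]
example_weights = [2.0, -4.0, 1.0]
example_score = methylation_risk_score(example_betas, example_weights)
# 2.0*0.5 + (-4.0)*0.25 + 1.0*1.0 = 1.0
```

In practice the weights would be estimated on a training cohort and the score evaluated on held-out samples; the sketch only shows the scoring step.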
Affiliation(s)
- Mike Thompson
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA.
- Brian L Hill
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA.
- Nadav Rakocz
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
- Jeffrey N Chiang
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Daniel Geschwind
  - Institute of Precision Health, University of California Los Angeles, Los Angeles, CA, USA
- Sriram Sankararaman
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Ira Hofer
  - Department of Anesthesiology and Perioperative Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Maxime Cannesson
  - Department of Anesthesiology and Perioperative Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Noah Zaitlen
  - Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
- Eran Halperin
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA.
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, USA.
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA.
  - Department of Anesthesiology and Perioperative Medicine, University of California Los Angeles, Los Angeles, CA, USA.
39
Liu D, Fox K, Weber G, Miller T. Confederated learning in healthcare: training machine learning models using disconnected data separated by individual, data type and identity for Large-Scale Health System Intelligence. J Biomed Inform 2022; 134:104151. [PMID: 35872264 DOI: 10.1016/j.jbi.2022.104151] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/25/2021] [Revised: 05/21/2022] [Accepted: 07/19/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND: A patient's health information is generally fragmented across silos because it follows how care is delivered: multiple providers in multiple settings. Though it is technically feasible to reunite data for analysis in a manner that underpins a rapid learning healthcare system, privacy concerns and regulatory barriers limit data centralization for this purpose. OBJECTIVES: Machine learning can be conducted in a federated manner on patient datasets with the same set of variables but separated across storage. But federated learning cannot handle the situation where different data types for a given patient are separated vertically across different organizations and where patient ID matching across different institutions is difficult. We call methods that enable machine learning model training on data separated by two or more dimensions "confederated machine learning", which we aim to develop in this study. METHODS: We propose and evaluate confederated learning for training machine learning models to stratify the risk of several diseases among silos when data are horizontally separated by individual, vertically separated by data type, and separated by identity without patient ID matching. The confederated learning method can be intuitively understood as a distributed learning method with representation learning, generative model, imputation method and data augmentation elements. RESULTS: Our confederated learning method achieves AUCROC (Area Under The Curve Receiver Operating Characteristics) of 0.787 for diabetes prediction, 0.718 for psychological disorders prediction, and 0.698 for ischemic heart disease prediction using nationwide health insurance claims. CONCLUSION: Our proposed confederated learning method successfully trained machine learning models on health insurance data separated by two or more dimensions.
Affiliation(s)
- Dianbo Liu
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA; Department of Pediatrics, Harvard Medical School, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA; Computer Science & Artificial Intelligence Laboratory, MIT, Cambridge, MA.
- Kathe Fox
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA; Aetna, CVS Health, Boston, MA
- Griffin Weber
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Tim Miller
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA; Department of Pediatrics, Harvard Medical School, Boston, MA.
40
Spendlove SJ, Bondhus L, Lluri G, Sul JH, Arboleda VA. Polygenic risk scores of endo-phenotypes identify the effect of genetic background in congenital heart disease. HGG ADVANCES 2022; 3:100112. [PMID: 35599848 PMCID: PMC9118152 DOI: 10.1016/j.xhgg.2022.100112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2021] [Accepted: 04/19/2022] [Indexed: 01/28/2023] Open
Abstract
Congenital heart disease (CHD) is a rare structural defect that occurs in ∼1% of live births. Studies on CHD genetic architecture have identified pathogenic single-gene mutations in less than 30% of cases. Single-gene mutations often show incomplete penetrance and variable expressivity. Therefore, we hypothesize that genetic background may play a role in modulating disease expression. Polygenic risk scores (PRSs) aggregate effects of common genetic variants to investigate whether, cumulatively, these variants are associated with disease penetrance or severity. However, the major limitations in this field have been in generating sufficient sample sizes for these studies. Here we used CHD-phenotype matched genome-wide association study (GWAS) summary statistics from the UK Biobank (UKBB) as our base study and whole-genome sequencing data from the CHD cohort (n₁ = 711 trios, n₂ = 362 European trios) of the Gabriella Miller Kids First dataset as our target study to develop PRSs for CHD. PRSs estimated using a GWAS for heart valve problems and heart murmur explain 2.5% of the variance in case-control status of CHD (all SNVs, p = 7.90 × 10⁻³; fetal cardiac SNVs, p = 8.00 × 10⁻³) and 1.8% of the variance in severity of CHD (fetal cardiac SNVs, p = 6.20 × 10⁻³; all SNVs, p = 0.015). These results show that common variants captured in CHD phenotype-matched GWASs have a modest but significant contribution to phenotypic expression of CHD. Further exploration of the cumulative effect of common variants is necessary for understanding the complex genetic etiology of CHD and other rare diseases.
Affiliation(s)
- Sarah J. Spendlove
  - Interdepartmental Bioinformatics Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Leroy Bondhus
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Gentian Lluri
  - Ahmanson/UCLA Adult Congenital Heart Disease Center, Division of Cardiology, Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jae Hoon Sul
  - Interdepartmental Bioinformatics Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Valerie A. Arboleda
  - Interdepartmental Bioinformatics Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
41
Wu Y, Wong CW, Chiles EN, Mellinger AL, Bae H, Jung S, Peterson T, Wang J, Negrete M, Huang Q, Wang L, Jang C, Muddiman DC, Su X, Williamson I, Shen X. Glycerate from intestinal fructose metabolism induces islet cell damage and glucose intolerance. Cell Metab 2022; 34:1042-1053.e6. [PMID: 35688154 PMCID: PMC9897509 DOI: 10.1016/j.cmet.2022.05.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 05/20/2021] [Revised: 12/21/2021] [Accepted: 05/18/2022] [Indexed: 02/06/2023]
Abstract
Dietary fructose, especially in the context of a high-fat western diet, has been linked to type 2 diabetes. Although the effect of fructose on liver metabolism has been extensively studied, a significant portion of the fructose is first metabolized in the small intestine. Here, we report that dietary fat enhances intestinal fructose metabolism, which releases glycerate into the blood. Chronic high systemic glycerate levels induce glucose intolerance by slowly damaging pancreatic islet cells and reducing islet sizes. Our findings provide a link between dietary fructose and diabetes that is modulated by dietary fat.
Affiliation(s)
- Yanru Wu
  - Department of Prosthodontics, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China; Department of Biomedical Engineering, Pratt School of Engineering, Duke University, Durham, NC 27708, USA
- Chi Wut Wong
  - Department of Biomedical Engineering, Pratt School of Engineering, Duke University, Durham, NC 27708, USA; Department of Pharmacology & Cancer Biology, Duke University Medical Center, Durham, NC 27710, USA
- Eric N Chiles
  - Metabolomics Shared Resource, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08903, USA
- Allyson L Mellinger
  - FTMS Laboratory for Human Health Research, Department of Chemistry, North Carolina State University, Raleigh, NC 27695, USA
- Hosung Bae
  - Department of Biological Chemistry, University of California, Irvine, Irvine, CA 92697, USA
- Sunhee Jung
  - Department of Biological Chemistry, University of California, Irvine, Irvine, CA 92697, USA
- Ted Peterson
  - Department of Biomedical Engineering, Pratt School of Engineering, Duke University, Durham, NC 27708, USA
- Jamie Wang
  - Department of Biomedical Engineering, Pratt School of Engineering, Duke University, Durham, NC 27708, USA
- Marcos Negrete
  - Department of Biomedical Engineering, Pratt School of Engineering, Duke University, Durham, NC 27708, USA
- Qiang Huang
  - Department of Biomedical Engineering, Pratt School of Engineering, Duke University, Durham, NC 27708, USA; Department of Pediatric Surgery, Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shanxi 710004, China
- Lihua Wang
  - Department of Biomedical Engineering, Pratt School of Engineering, Duke University, Durham, NC 27708, USA
- Cholsoon Jang
  - Department of Biological Chemistry, University of California, Irvine, Irvine, CA 92697, USA
- David C Muddiman
  - FTMS Laboratory for Human Health Research, Department of Chemistry, North Carolina State University, Raleigh, NC 27695, USA; Molecular Education, Technology and Research Innovation Center, North Carolina State University, Raleigh, NC 27695, USA
- Xiaoyang Su
  - Metabolomics Shared Resource, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08903, USA; Department of Medicine, Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ 08901, USA
- Ian Williamson
  - Department of Biomedical Engineering, Pratt School of Engineering, Duke University, Durham, NC 27708, USA; Gastroenterology Division, Department of Medicine, Duke University, Durham, NC 27710, USA.
- Xiling Shen
  - Department of Biomedical Engineering, Pratt School of Engineering, Duke University, Durham, NC 27708, USA; Terasaki Institute for Biomedical Innovation, Los Angeles, CA 90024, USA.
42
Zhang R, Hao Y, Xu Y, Qin J, Wang Y, Kumar Dey S, Li C, Wang H, Banerjee S. Whole exome sequencing identified a homozygous novel mutation in SUOX gene causes extremely rare autosomal recessive isolated sulfite oxidase deficiency. Clin Chim Acta 2022; 532:115-122. [PMID: 35679912 DOI: 10.1016/j.cca.2022.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Revised: 05/29/2022] [Accepted: 06/03/2022] [Indexed: 11/03/2022]
Abstract
BACKGROUND: Isolated sulfite oxidase deficiency (ISOD) is a rare, life-threatening neurometabolic disorder characterized by neonatal intractable seizures and severe developmental delay, with an autosomal recessive mode of inheritance. Germline mutations in the SUOX gene cause ISOD. To date, only 32 mutations of the SUOX gene have been identified and reported to be associated with ISOD. METHODS: Here, we investigated a 5-day-old Chinese female child who presented with intermittent tremor or seizures of the limbs, neonatal encephalopathy, subarachnoid cyst and haemorrhage, dysplasia of the corpus callosum, neonatal convulsion, hyperlactatemia, severe metabolic acidosis, hyperglycemia, and hyperkalemia. RESULTS: Whole exome sequencing identified a novel homozygous transition (c.1227G>A) in exon 6 of the SUOX gene in the proband. This novel homozygous variant leads to the formation of a truncated sulfite oxidase (p.Trp409*) of 408 amino acids. The variant causes partial loss of the dimerization domain of sulfite oxidase and is therefore a loss-of-function variant. The proband's father and mother each carry this novel variant in a heterozygous state. The variant was not found in 200 ethnically matched healthy control individuals. CONCLUSIONS: Our study not only expands the mutational spectrum of the SUOX gene associated with ISOD but also strongly suggests the value of whole exome sequencing for identifying candidate genes and novel disease-causing variants.
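The variant nomenclature in this abstract can be checked with simple codon arithmetic. The sketch below is illustrative only: it assumes the affected codon is TGG (the sole tryptophan codon in the standard genetic code) and does not consult the actual SUOX reference sequence; the helper names are hypothetical.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_position(cdna_pos):
    """Map a 1-based coding-DNA position to (codon number, position in codon)."""
    codon_number = (cdna_pos - 1) // 3 + 1
    offset = (cdna_pos - 1) % 3 + 1
    return codon_number, offset


def apply_substitution(codon, offset, alt):
    """Replace the base at 1-based position `offset` within a codon."""
    return codon[: offset - 1] + alt + codon[offset:]


# c.1227G>A: nucleotide 1227 is the third base of codon 409.
codon_number, offset = codon_position(1227)

# Assumed reference codon TGG (Trp); G>A at the third base gives TGA, a stop.
mutant_codon = apply_substitution("TGG", offset, "A")
is_nonsense = mutant_codon in STOP_CODONS

# Translation ends before codon 409, leaving 408 residues.
truncated_length = codon_number - 1
```

Under these assumptions the arithmetic agrees with the reported p.Trp409* change and the 408-amino-acid truncated product.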
Affiliation(s)
- Rui Zhang
  - Division of Maternal-Fetal Medicine, Jinan University-affiliated Shenzhen Bao'an Women's and Children's Hospital, Shenzhen 518102, China
- Yajing Hao
  - Department of Radiology, Jinan University-affiliated Shenzhen Bao'an Women's and Children's Hospital, Jinan University, Shenzhen 518102, China
- Ying Xu
  - Department of Ultrasound, Jinan University-affiliated Shenzhen Bao'an Women's and Children's Hospital, Shenzhen 518102, China
- Jiale Qin
  - Department of Radiology, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Yanfang Wang
  - Department of Ultrasound, Jinan University-affiliated Shenzhen Bao'an Women's and Children's Hospital, Shenzhen 518102, China
- Subrata Kumar Dey
  - Department of Biotechnology, Centre for Genetic Studies, School of Biotechnology and Biological Sciences, Maulana Abul Kalam Azad University of Technology (Formerly West Bengal University of Technology), Salt Lake City, Kolkata, India
- Chen Li
  - Department of Cell Biology and Medical Genetics, School of Medicine, Zhejiang University, Hangzhou, China
- Huilin Wang
  - Division of Maternal-Fetal Medicine, Jinan University-affiliated Shenzhen Bao'an Women's and Children's Hospital, Shenzhen 518102, China.
- Santasree Banerjee
  - Department of Genetics, College of Basic Medical Sciences, Jilin University, Changchun, Jilin 130021, China.
43
Blair DR, Hoffmann TJ, Shieh JT. Common genetic variation associated with Mendelian disease severity revealed through cryptic phenotype analysis. Nat Commun 2022; 13:3675. [PMID: 35760791 PMCID: PMC9237040 DOI: 10.1038/s41467-022-31030-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/22/2021] [Accepted: 05/30/2022] [Indexed: 11/09/2022] Open
Abstract
Clinical heterogeneity is common in Mendelian disease, but small sample sizes make it difficult to identify specific contributing factors. However, if a disease represents the severely affected extreme of a spectrum of phenotypic variation, then modifier effects may be apparent within a larger subset of the population. Analyses that take advantage of this full spectrum could have substantially increased power. To test this, we developed cryptic phenotype analysis, a model-based approach that infers quantitative traits that capture disease-related phenotypic variability using qualitative symptom data. By applying this approach to 50 Mendelian diseases in two cohorts, we identify traits that reliably quantify disease severity. We then conduct genome-wide association analyses for five of the inferred cryptic phenotypes, uncovering common variation that is predictive of Mendelian disease-related diagnoses and outcomes. Overall, this study highlights the utility of computationally-derived phenotypes and biobank-scale cohorts for investigating the complex genetic architecture of Mendelian diseases.
Affiliation(s)
- David R Blair
  - Division of Medical Genetics, Department of Pediatrics, Benioff Children's Hospital, San Francisco, CA, USA.
- Thomas J Hoffmann
  - Institute for Human Genetics, San Francisco, CA, USA
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
- Joseph T Shieh
  - Division of Medical Genetics, Department of Pediatrics, Benioff Children's Hospital, San Francisco, CA, USA.
  - Institute for Human Genetics, San Francisco, CA, USA.
44
Zeng C, Bastarache LA, Tao R, Venner E, Hebbring S, Andujar JD, Bland ST, Crosslin DR, Pratap S, Cooley A, Pacheco JA, Christensen KD, Perez E, Zawatsky CLB, Witkowski L, Zouk H, Weng C, Leppig KA, Sleiman PMA, Hakonarson H, Williams MS, Luo Y, Jarvik GP, Green RC, Chung WK, Gharavi AG, Lennon NJ, Rehm HL, Gibbs RA, Peterson JF, Roden DM, Wiesner GL, Denny JC. Association of Pathogenic Variants in Hereditary Cancer Genes With Multiple Diseases. JAMA Oncol 2022; 8:835-844. [PMID: 35446370 PMCID: PMC9026237 DOI: 10.1001/jamaoncol.2022.0373] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 12/19/2022]
Abstract
Importance: Knowledge about the spectrum of diseases associated with hereditary cancer syndromes may improve disease diagnosis and management for patients and help to identify high-risk individuals. Objective: To identify phenotypes associated with hereditary cancer genes through a phenome-wide association study. Design, Setting, and Participants: This phenome-wide association study used health data from participants in 3 cohorts. The Electronic Medical Records and Genomics Sequencing (eMERGEseq) data set recruited predominantly healthy individuals from 10 US medical centers from July 16, 2016, through February 18, 2018, with a mean (SD) follow-up through electronic health records (EHRs) of 12.7 (7.4) years. The UK Biobank (UKB) cohort recruited participants from March 15, 2006, through August 1, 2010, with a mean (SD) follow-up of 12.4 (1.0) years. The Hereditary Cancer Registry (HCR) recruited patients undergoing clinical genetic testing at Vanderbilt University Medical Center from May 1, 2012, through December 31, 2019, with a mean (SD) follow-up through EHRs of 8.8 (6.5) years. Exposures: Germline variants in 23 hereditary cancer genes. Pathogenic and likely pathogenic variants for each gene were aggregated for association analyses. Main Outcomes and Measures: Phenotypes in the eMERGEseq and HCR cohorts were derived from the linked EHRs. Phenotypes in UKB were from multiple sources of health-related data. Results: A total of 214 020 participants were identified, including 23 544 in the eMERGEseq cohort (mean [SD] age, 47.8 [23.7] years; 12 611 women [53.6%]), 187 234 in the UKB cohort (mean [SD] age, 56.7 [8.1] years; 104 055 [55.6%] women), and 3242 in the HCR cohort (mean [SD] age, 52.5 [15.5] years; 2851 [87.9%] women). All 38 established gene-cancer associations were replicated, and 19 new associations were identified. These included the following 7 associations with neoplasms: CHEK2 with leukemia (odds ratio [OR], 3.81 [95% CI, 2.64-5.48]) and plasma cell neoplasms (OR, 3.12 [95% CI, 1.84-5.28]), ATM with gastric cancer (OR, 4.27 [95% CI, 2.35-7.44]) and pancreatic cancer (OR, 4.44 [95% CI, 2.66-7.40]), MUTYH (biallelic) with kidney cancer (OR, 32.28 [95% CI, 6.40-162.73]), MSH6 with bladder cancer (OR, 5.63 [95% CI, 2.75-11.49]), and APC with benign liver/intrahepatic bile duct tumors (OR, 52.01 [95% CI, 14.29-189.29]). The remaining 12 associations with nonneoplastic diseases included BRCA1/2 with ovarian cysts (OR, 3.15 [95% CI, 2.22-4.46] and 3.12 [95% CI, 2.36-4.12], respectively), MEN1 with acute pancreatitis (OR, 33.45 [95% CI, 9.25-121.02]), APC with gastritis and duodenitis (OR, 4.66 [95% CI, 2.61-8.33]), and PTEN with chronic gastritis (OR, 15.68 [95% CI, 6.01-40.92]). Conclusions and Relevance: The findings of this genetic association study analyzing the EHRs of 3 large cohorts suggest that these new phenotypes associated with hereditary cancer genes may facilitate early detection and better management of cancers. This study highlights the potential benefits of using EHR data in genomic medicine.
Affiliation(s)
- Chenjie Zeng
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
- Lisa A Bastarache
  - Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Ran Tao
  - Department of Biostatistics, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
- Eric Venner
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Scott Hebbring
  - Center for Human Genetics, Marshfield Clinic Research Institute, Marshfield, Wisconsin
- Justin D Andujar
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee; Clinical and Translational Hereditary Cancer Program, Division of Genetic Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, Tennessee
- Sarah T Bland
  - Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- David R Crosslin
  - Department of Biomedical Informatics and Medical Education, University of Washington School of Medicine, Seattle
- Siddharth Pratap
  - School of Graduate Studies and Research, Meharry Medical College, Nashville, Tennessee
- Ayorinde Cooley
  - Department of Microbiology, Immunology and Physiology, Meharry Medical College, Nashville, Tennessee
- Jennifer A Pacheco
  - Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- Kurt D Christensen
  - PRecisiOn Medicine Translational Research (PROMoTeR) Center, Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, Massachusetts; Department of Population Medicine, Harvard Medical School, Boston, Massachusetts
- Emma Perez
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts
- Carrie L Blout Zawatsky
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts
- Leora Witkowski
  - Centre Universitaire de Santé McGill, McGill University Health Centre, Montreal, Quebec, Canada
- Hana Zouk
  - Laboratory for Molecular Medicine, Partners Healthcare Personalized Medicine, Cambridge, Massachusetts; Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York
- Kathleen A Leppig
  - Genetic Services and Kaiser Permanente Washington Health Research Institute, Kaiser Permanente of Washington, Seattle
- Patrick M A Sleiman
  - Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania; Division of Human Genetics, Department of Pediatrics, The University of Pennsylvania Perelman School of Medicine, Philadelphia
- Hakon Hakonarson
  - Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania; Division of Human Genetics, Department of Pediatrics, The University of Pennsylvania Perelman School of Medicine, Philadelphia
- Marc S Williams
  - Genomic Medicine Institute, Geisinger, Danville, Pennsylvania
- Yuan Luo
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- Gail P Jarvik
  - Department of Medicine (Medical Genetics), University of Washington, Seattle; Department of Genome Sciences, University of Washington, Seattle
- Robert C Green
  - Brigham and Women's Hospital, Broad Institute, Ariadne Labs and Harvard Medical School, Boston, Massachusetts
- Wendy K Chung
  - Department of Pediatrics, Columbia University, New York, New York; Department of Medicine, Columbia University, New York, New York
- Ali G Gharavi
  - Division of Nephrology, Department of Medicine, Columbia University Irving Medical Center, New York, New York; Center for Precision Medicine and Genomics, Department of Medicine, Columbia University Irving Medical Center, New York, New York
- Niall J Lennon
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts
- Heidi L Rehm
  - Medical & Population Genetics Program and Genomics Platform, Broad Institute of MIT and Harvard Cambridge, Cambridge, Massachusetts; Center for Genomic Medicine, Massachusetts General Hospital, Boston; Department of Pathology, Harvard Medical School, Boston, Massachusetts
- Richard A Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Josh F Peterson
  - Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Dan M Roden
  - Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee; Divisions of Cardiovascular Medicine and Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee; Department of Pharmacology, Vanderbilt University, Nashville, Tennessee
- Georgia L Wiesner
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee; Clinical and Translational Hereditary Cancer Program, Division of Genetic Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, Tennessee
- Joshua C Denny
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
45
Mo H, Denny JC. The U.S. National Library of Medicine’s impact on precision and genomic medicine. Information Services & Use 2022; 42:71-80. [PMID: 35600119 PMCID: PMC9108560 DOI: 10.3233/isu-210144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Precision medicine offers the potential to improve health through deeper understandings of the lifestyle, biological, and environmental influences on health. Under Dr. Donald A. B. Lindberg’s leadership, the U.S. National Library of Medicine (NLM) has developed the central reference resources for biomedical research and molecular laboratory medicine that enable precision medicine. The hosting and curation of biomedical knowledge repositories and data by NLM enable quality information reachable for providers and researchers throughout the world. NLM has been supporting the innovation of electronic health record systems to implement computability and secondary use for biomedical research, producing the scale of linked health and molecular datasets necessary for precision medicine discovery.
Affiliation(s)
- Huan Mo
  - National Human Genome Research Institute, National Institutes of Health, USA
- Joshua C. Denny
  - National Human Genome Research Institute, National Institutes of Health, USA
46
Exome sequencing of individuals with Huntington's disease implicates FAN1 nuclease activity in slowing CAG expansion and disease onset. Nat Neurosci 2022; 25:446-457. [PMID: 35379994 PMCID: PMC8986535 DOI: 10.1038/s41593-022-01033-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 05/06/2021] [Accepted: 02/11/2022] [Indexed: 12/13/2022]
Abstract
The age at onset of motor symptoms in Huntington's disease (HD) is driven by HTT CAG repeat length but modified by other genes. In this study, we used exome sequencing of 683 patients with HD with extremes of onset or phenotype relative to CAG length to identify rare variants associated with clinical effect. We discovered damaging coding variants in candidate modifier genes identified in previous genome-wide association studies associated with altered HD onset or severity. Variants in FAN1 clustered in its DNA-binding and nuclease domains and were associated predominantly with earlier-onset HD. Nuclease activities of purified variants in vitro correlated with residual age at motor onset of HD. Mutating endogenous FAN1 to a nuclease-inactive form in an induced pluripotent stem cell model of HD led to rates of CAG expansion similar to those observed with complete FAN1 knockout. Together, these data implicate FAN1 nuclease activity in slowing somatic repeat expansion and hence onset of HD.
47
The Missing LNK: Evolution from Cytosis to Chronic Myelomonocytic Leukemia in a Patient with Multiple Sclerosis and Germline SH2B3 Mutation. Case Rep Genet 2022; 2022:6977041. [PMID: 35281324 PMCID: PMC8904908 DOI: 10.1155/2022/6977041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Accepted: 01/27/2022] [Indexed: 11/18/2022] Open
Abstract
Chronic myelomonocytic leukemia (CMML) is a rare but distinct hematological neoplasm with overlapping features of myelodysplastic syndrome (MDS) and myeloproliferative neoplasm (MPN). Individuals with CMML have persistent monocytosis and bone marrow dyspoiesis associated with various constitutional symptoms like fevers, unintentional weight loss, or night sweats. It is established that there is a strong association of CMML with preceding or coexisting autoimmune diseases and systemic inflammatory syndromes affecting around 20% of patients. Various molecular abnormalities like TET2, SRSF2, ASXL1, and RAS are reported in the pathogenesis of CMML, but no such mutations have been described to explain the strong association of autoimmune diseases and severe inflammatory phenotype seen in CMML. Germline mutation in SH2B adaptor protein 3 (SH2B3) had been reported before to affect a family with autoimmune disorders and acute lymphoblastic leukemia. In this report, we describe the first case of a female subject with many years of preceding history of multiple sclerosis before the diagnosis of CMML. We outline the evidence supporting the pathogenic role of SH2B3 p.E395K germline mutation, connecting the dots of association between autoimmune diseases and CMML genesis.
48
Weeks WB, Huynh G, Cao SY, Smith J, Bangur C, Weinstein JN. Examining the Prevalence of Previously Recorded Phenotypically Related Diagnoses Among Fee-for-Service Medicare Enrollees Newly Diagnosed with Mendelian Conditions. J Gen Intern Med 2022; 37:475-477. [PMID: 33479932 PMCID: PMC8811097 DOI: 10.1007/s11606-020-06469-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2020] [Accepted: 12/13/2020] [Indexed: 02/03/2023]
Affiliation(s)
- William B Weeks
  - Microsoft Corporation, Microsoft Research, Redmond, WA, USA
- Grace Huynh
  - Microsoft Corporation, Microsoft Research, Redmond, WA, USA
- Jeremy Smith
  - White River Junction VA Outcomes Group, WRJ, Hartford, VT, USA
- James N Weinstein
  - Microsoft Corporation, Microsoft Research, Redmond, WA, USA; The Dartmouth Institute, Lebanon, NH, USA; Kellogg School of Business, Evanston, IL, USA; Amos Tuck School of Business, Hanover, NH, USA
49
Gordon SM, O'Connell AE. Inborn Errors of Immunity in the Premature Infant: Challenges in Recognition and Diagnosis. Front Immunol 2022; 12:758373. [PMID: 35003071 PMCID: PMC8738084 DOI: 10.3389/fimmu.2021.758373] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/13/2021] [Accepted: 12/07/2021] [Indexed: 12/14/2022] Open
Abstract
Due to heightened awareness and advanced genetic tools, inborn errors of immunity (IEI) are increasingly recognized in children. However, diagnosing of IEI in premature infants is challenging and, subsequently, reports of IEI in premature infants remain rare. This review focuses on how common disorders of prematurity, such as sepsis, necrotizing enterocolitis, and bronchopulmonary dysplasia, can clinically overlap with presenting signs of IEI. We present four recent cases from a single neonatal intensive care unit that highlight diagnostic dilemmas facing neonatologists and clinical immunologists when considering IEI in preterm infants. Finally, we present a conceptual framework for when to consider IEI in premature infants and a guide to initial workup of premature infants suspected of having IEI.
Affiliation(s)
- Scott M Gordon
  - Division of Neonatology, Children's Hospital of Philadelphia, and Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Amy E O'Connell
  - Division of Newborn Medicine, Boston Children's Hospital, and Department of Pediatrics, Harvard Medical School, Boston, MA, United States
50
AIM in Genomic Basis of Medicine: Applications. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|