1. Morris DM, Wang C, Papanastasiou G, Gray CD, Xu W, Sjöström S, Badr S, Paccou J, Semple SIK, MacGillivray T, Cawthorn WP. A novel deep learning method for large-scale analysis of bone marrow adiposity using UK Biobank Dixon MRI data. Comput Struct Biotechnol J 2024; 24:89-104. [PMID: 38268780 PMCID: PMC10806280 DOI: 10.1016/j.csbj.2023.12.029] [Received: 09/20/2023] [Revised: 12/20/2023] [Accepted: 12/23/2023] [Indexed: 01/26/2024]
Abstract
Background: Bone marrow adipose tissue (BMAT) represents >10% of fat mass in healthy humans and can be measured by magnetic resonance imaging (MRI) as the bone marrow fat fraction (BMFF). Human MRI studies have identified several diseases associated with BMFF but have been relatively small in scale. Population-scale studies therefore have huge potential to reveal BMAT's true clinical relevance. The UK Biobank (UKBB) is undertaking MRI of 100,000 participants, providing an ideal opportunity for such advances.

Objective: To establish deep learning for high-throughput, multi-site BMFF analysis of UKBB MRI data.

Materials and methods: We studied males and females aged 60-69. Bone marrow (BM) segmentation was automated using a new lightweight attention-based 3D U-Net convolutional neural network that improved segmentation of small structures from large volumetric data. Using manual segmentations from 61-64 subjects, the models were trained to segment four BM regions of interest: the spine (thoracic and lumbar vertebrae), femoral head, total hip and femoral diaphysis. Models were tested using a further 10-12 datasets per region and validated using datasets from 729 UKBB participants. BMFF was then quantified and pathophysiological characteristics assessed, including site- and sex-dependent differences and the relationships with age, BMI, bone mineral density, peripheral adiposity, and osteoporosis.

Results: Model accuracy matched or exceeded that of conventional U-Nets, yielding Dice scores of 91.2% (spine), 94.5% (femoral head), 91.2% (total hip) and 86.6% (femoral diaphysis). One case of severe scoliosis prevented segmentation of the spine, and one case of non-Hodgkin lymphoma prevented segmentation of the spine, femoral head and total hip because of T2 signal depletion; no other pathophysiological variable disrupted segmentation. The resulting BMFF measurements confirmed expected relationships between BMFF and age, sex and bone density, and identified new site- and sex-specific characteristics.

Conclusions: We have established a new deep learning method for accurate segmentation of small structures from large volumetric data, allowing high-throughput multi-site BMFF measurement in the UKBB. Our findings reveal new pathophysiological insights, highlighting the potential of BMFF as a novel clinical biomarker. Applying our method across the full UKBB cohort will help to reveal the impact of BMAT on human health and disease.
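The Dice scores reported above quantify voxel-wise overlap between an automated segmentation A and a manual one B, as 2|A ∩ B| / (|A| + |B|). The abstract does not describe the evaluation code, so the following is only a minimal pure-Python sketch, assuming both masks have been flattened to equal-length 0/1 sequences; the masks are toy values, not UKBB data.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for flattened binary masks.

    Non-zero entries are treated as foreground voxels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks count as perfect agreement
    return 2.0 * intersection / total


# Toy masks (not study data): 3 shared voxels, 4 foreground voxels in each mask.
automated = [1, 1, 1, 1, 0, 0]
manual = [0, 1, 1, 1, 1, 0]
print(f"Dice = {dice_score(automated, manual):.3f}")  # Dice = 0.750
```

For 3D volumes the same function applies after flattening; a reported Dice of 91.2% corresponds to a return value of 0.912.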
Affiliation(s)
- David M. Morris
  - University/BHF Centre for Cardiovascular Science, University of Edinburgh, The Queen's Medical Research Institute, Edinburgh BioQuarter, 47 Little France Crescent, Edinburgh EH16 4TJ, UK
  - Edinburgh Imaging, University of Edinburgh, The Queen's Medical Research Institute, Edinburgh BioQuarter, 47 Little France Crescent, Edinburgh EH16 4TJ, UK
- Chengjia Wang
  - University/BHF Centre for Cardiovascular Science, University of Edinburgh, The Queen's Medical Research Institute, Edinburgh BioQuarter, 47 Little France Crescent, Edinburgh EH16 4TJ, UK
  - School of Mathematics and Computer Sciences, Heriot-Watt University, Edinburgh EH14 1AS, UK
- Giorgos Papanastasiou
  - Edinburgh Imaging, University of Edinburgh, The Queen's Medical Research Institute, Edinburgh BioQuarter, 47 Little France Crescent, Edinburgh EH16 4TJ, UK
  - School of Computer Science and Electronic Engineering, Wivenhoe Park, The University of Essex, Colchester CO4 3SQ, UK
- Calum D. Gray
  - Edinburgh Imaging, University of Edinburgh, The Queen's Medical Research Institute, Edinburgh BioQuarter, 47 Little France Crescent, Edinburgh EH16 4TJ, UK
- Wei Xu
  - Centre for Global Health, Usher Institute, University of Edinburgh, Edinburgh EH8 9AG, UK
- Samuel Sjöström
  - University/BHF Centre for Cardiovascular Science, University of Edinburgh, The Queen's Medical Research Institute, Edinburgh BioQuarter, 47 Little France Crescent, Edinburgh EH16 4TJ, UK
- Sammy Badr
  - University of Lille, Marrow Adiposity and Bone Laboratory (MABlab) ULR 4490, F-59000 Lille, France
  - CHU Lille, Department of Radiology and Musculoskeletal Imaging, F-59000 Lille, France
- Julien Paccou
  - University of Lille, Marrow Adiposity and Bone Laboratory (MABlab) ULR 4490, F-59000 Lille, France
  - CHU Lille, Department of Rheumatology, F-59000 Lille, France
- Scott IK Semple
  - University/BHF Centre for Cardiovascular Science, University of Edinburgh, The Queen's Medical Research Institute, Edinburgh BioQuarter, 47 Little France Crescent, Edinburgh EH16 4TJ, UK
  - Edinburgh Imaging, University of Edinburgh, The Queen's Medical Research Institute, Edinburgh BioQuarter, 47 Little France Crescent, Edinburgh EH16 4TJ, UK
- Tom MacGillivray
  - Centre for Clinical Brain Sciences, University of Edinburgh, The Queen's Medical Research Institute, Edinburgh BioQuarter, 47 Little France Crescent, Edinburgh EH16 4TJ, UK
- William P. Cawthorn
  - University/BHF Centre for Cardiovascular Science, University of Edinburgh, The Queen's Medical Research Institute, Edinburgh BioQuarter, 47 Little France Crescent, Edinburgh EH16 4TJ, UK
2. Løkhammer S, Koller D, Wendt FR, Choi KW, He J, Friligkou E, Overstreet C, Gelernter J, Hellard SL, Polimanti R. Distinguishing vulnerability and resilience to posttraumatic stress disorder evaluating traumatic experiences, genetic risk and electronic health records. Psychiatry Res 2024; 337:115950. [PMID: 38744179 PMCID: PMC11156529 DOI: 10.1016/j.psychres.2024.115950] [Received: 03/11/2024] [Revised: 04/29/2024] [Accepted: 05/04/2024] [Indexed: 05/16/2024]
Abstract
What distinguishes vulnerability and resilience to posttraumatic stress disorder (PTSD) remains unclear. Leveraging reported traumatic experiences, genetic data, and electronic health records (EHR), we investigated the clinical comorbidities (co-phenome) of PTSD vulnerability and resilience in the UK Biobank (UKB) and predicted them in the All of Us Research Program (AoU). In 60,354 trauma-exposed UKB participants, we defined PTSD vulnerability and resilience on the basis of PTSD symptoms, trauma burden, and polygenic risk scores. EHR-based phenome-wide association studies (PheWAS) were conducted to dissect the co-phenomes of PTSD vulnerability and resilience. Significant diagnostic endpoints were applied as weights to build a phenotypic risk score (PheRS), and PheWAS of the vulnerability and resilience PheRS were then conducted in up to 95,761 AoU participants. EHR-based PheWAS revealed three phenotypes significantly and positively associated with PTSD vulnerability (top association: "Sleep disorders") and five outcomes inversely associated with PTSD resilience (top association: "Irritable Bowel Syndrome"). In the AoU cohort, PheRS analysis showed a partial inverse relationship between vulnerability and resilience, with distinct comorbid associations: the vulnerability PheRS was associated with multiple phenotypes, whereas the resilience PheRS showed inverse relationships with eye conditions. Our study reveals phenotypic differences between PTSD vulnerability and resilience, highlighting that these concepts are not simply the absence and presence of PTSD.
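The abstract states that significant diagnostic endpoints from the UKB PheWAS were applied as weights to form a PheRS, but it does not give the weighting scheme. A common construction, sketched here under that assumption, sums the weights of the endpoints recorded in a participant's EHR; the endpoint names and weights (e.g. log odds ratios) are hypothetical.

```python
from typing import Mapping, Set


def phenotypic_risk_score(diagnoses: Set[str], weights: Mapping[str, float]) -> float:
    """Weighted sum of the weighted diagnostic endpoints present in a participant's EHR.

    Diagnoses without a weight (i.e. not significant in the discovery PheWAS) contribute nothing.
    """
    return sum(w for code, w in weights.items() if code in diagnoses)


# Hypothetical weights (e.g. log odds ratios from a discovery PheWAS); not values from the study.
weights = {"endpoint_A": 0.40, "endpoint_B": 0.25, "endpoint_C": -0.10}
participant = {"endpoint_A", "endpoint_C", "unrelated_code"}
print(round(phenotypic_risk_score(participant, weights), 2))  # 0.3
```

Each AoU participant then receives one score per PheRS (vulnerability, resilience), and those scores become the exposures tested in the second PheWAS.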
Affiliation(s)
- Solveig Løkhammer
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA
  - Department of Clinical Science, University of Bergen, Bergen, Norway
  - Dr. Einar Martens Research Group for Biological Psychiatry, Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
- Dora Koller
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA
  - Department of Genetics, Microbiology, and Statistics, Faculty of Biology, University of Barcelona, Catalonia, Spain
- Frank R. Wendt
  - Department of Anthropology, University of Toronto, Mississauga, Canada
  - Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Karmel W. Choi
  - Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  - Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Jun He
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA
  - Veterans Affairs Connecticut Healthcare Center, West Haven, Connecticut, USA
- Eleni Friligkou
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA
  - Veterans Affairs Connecticut Healthcare Center, West Haven, Connecticut, USA
- Cassie Overstreet
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA
  - Veterans Affairs Connecticut Healthcare Center, West Haven, Connecticut, USA
- Joel Gelernter
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA
  - Veterans Affairs Connecticut Healthcare Center, West Haven, Connecticut, USA
  - Department of Genetics, Yale School of Medicine, New Haven, Connecticut, USA
  - Department of Neuroscience, Yale School of Medicine, New Haven, Connecticut, USA
  - Wu Tsai Institute, Yale University, New Haven, Connecticut, USA
- Stéphanie Le Hellard
  - Department of Clinical Science, University of Bergen, Bergen, Norway
  - Dr. Einar Martens Research Group for Biological Psychiatry, Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
  - Bergen Center of Brain Plasticity, Haukeland University Hospital, Bergen, Norway
- Renato Polimanti
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA
  - Veterans Affairs Connecticut Healthcare Center, West Haven, Connecticut, USA
  - Wu Tsai Institute, Yale University, New Haven, Connecticut, USA
  - Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut, USA
3. Yang X, Sullivan PF, Li B, Fan Z, Ding D, Shu J, Guo Y, Paschou P, Bao J, Shen L, Ritchie MD, Nave G, Platt ML, Li T, Zhu H, Zhao B. Multi-organ imaging-derived polygenic indexes for brain and body health. medRxiv: The Preprint Server for Health Sciences 2024:2023.04.18.23288769. [PMID: 38883759 PMCID: PMC11177904 DOI: 10.1101/2023.04.18.23288769] [Indexed: 06/18/2024]
Abstract
The UK Biobank (UKB) imaging project is a crucial resource for biomedical research, but is limited to 100,000 participants due to cost and accessibility barriers. Here we used genetic data to predict heritable imaging-derived phenotypes (IDPs) for a larger cohort. We developed and evaluated 4,375 IDP genetic scores (IGS) derived from UKB brain and body images. When applied to UKB participants who were not imaged, IGS revealed links to numerous phenotypes and stratified participants at increased risk for both brain and somatic diseases. For example, IGS identified individuals at higher risk for Alzheimer's disease and multiple sclerosis, offering additional insights beyond traditional polygenic risk scores of these diseases. When applied to independent external cohorts, IGS also stratified those at high disease risk in the All of Us Research Program and the Alzheimer's Disease Neuroimaging Initiative study. Our results demonstrate that, while the UKB imaging cohort is largely healthy and may not be the most enriched for disease risk management, it holds immense potential for stratifying the risk of various brain and body diseases in broader external genetic cohorts.
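The abstract does not spell out how the IDP genetic scores (IGS) are built. Genetic scores of this kind are typically additive: each variant's effect-allele dosage is multiplied by a per-allele weight estimated in a GWAS of the trait (here, an IDP), and the products are summed. A minimal sketch of that generic construction, with hypothetical variant IDs and weights:

```python
from typing import Mapping


def genetic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Additive score: sum over variants of effect-allele dosage (0-2) times per-allele weight.

    Variants missing from `dosages` are skipped; real pipelines typically impute them instead.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical variant IDs and weights, for illustration only.
weights = {"variant_1": 0.12, "variant_2": -0.05, "variant_3": 0.08}
participant = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
print(round(genetic_score(participant, weights), 2))  # 0.19
```

Because the score needs only genotypes, it can be computed for non-imaged UKB participants and for external cohorts, which is what allows the stratification described above.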
4. McDermott GC, DiIorio M, Kawano Y, Jeffway M, MacVicar M, Dahal K, Moon SJ, Seyok T, Coblyn J, Massarotti E, Weinblatt ME, Weisenfeld D, Liao KP. Reasons for multiple biologic and targeted synthetic DMARD switching and characteristics of treatment refractory rheumatoid arthritis. Semin Arthritis Rheum 2024; 66:152421. [PMID: 38457949 PMCID: PMC11088978 DOI: 10.1016/j.semarthrit.2024.152421] [Received: 06/25/2023] [Revised: 12/26/2023] [Accepted: 02/20/2024] [Indexed: 03/10/2024]
Abstract
OBJECTIVE: Switching biologic and targeted synthetic DMARD (b/tsDMARD) medications is common in RA patients; however, data on the reasons for these changes are limited. The objective of the study was to identify and categorize reasons for b/tsDMARD switching and to investigate characteristics associated with treatment refractory RA.

METHODS: In a multi-hospital RA electronic health record (EHR) cohort, we identified RA patients prescribed ≥1 b/tsDMARD between 2001 and 2017. Consistent with the EULAR "difficult to treat" (D2T) RA definition, we further identified patients who discontinued ≥2 b/tsDMARDs with different mechanisms of action. We performed manual chart review to determine reasons for medication discontinuation. We defined "treatment refractory" RA as not achieving low disease activity (<3 tender or swollen joints on <7.5 mg of daily prednisone equivalent) despite treatment with two different b/tsDMARD mechanisms of action. We compared demographic, lifestyle, and clinical factors between patients with treatment refractory RA and b/tsDMARD initiators not meeting D2T criteria.

RESULTS: We identified 6040 RA patients prescribed ≥1 b/tsDMARD, including 404 meeting D2T criteria. The most common reasons for medication discontinuation were inadequate response (43.3%), loss of efficacy (25.8%), and non-allergic adverse events (13.7%). Of patients with D2T RA, 15% had treatment refractory RA. Treatment refractory RA patients were younger at b/tsDMARD initiation (mean 47.2 vs. 55.2 years, p < 0.001), more commonly female (91.8% vs. 76.1%, p = 0.006), and more commonly ever smokers (68.9% vs. 49.9%, p = 0.005). No clinical RA factors differentiated treatment refractory RA patients from b/tsDMARD initiators.

CONCLUSIONS: In a large EHR-based RA cohort, the most common reasons for b/tsDMARD switching were inadequate response, loss of efficacy, and non-allergic adverse events (e.g., infections, leukopenia, psoriasis). Clinical RA factors were insufficient for differentiating b/tsDMARD responders from nonresponders.
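The "treatment refractory" definition in the METHODS section is effectively a rule over a patient's treatment history. The sketch below encodes it with a hypothetical record structure. It simplifies timing (any recorded assessment counts) and reads "<3 tender or swollen joints" as fewer than three joints that are tender or swollen; both are assumptions, not the authors' chart-review procedure.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Assessment:
    """Hypothetical record of one clinical assessment while on a b/tsDMARD."""
    tender_or_swollen_joints: int  # joints that are tender or swollen
    prednisone_mg_per_day: float  # daily prednisone-equivalent dose


def low_disease_activity(a: Assessment) -> bool:
    # Threshold from the abstract: <3 tender or swollen joints on <7.5 mg/day prednisone equivalent.
    return a.tender_or_swollen_joints < 3 and a.prednisone_mg_per_day < 7.5


def treatment_refractory(mechanisms_tried: Set[str], assessments: List[Assessment]) -> bool:
    """True if >=2 distinct mechanisms of action were tried and low disease activity was never reached."""
    if len(mechanisms_tried) < 2:
        return False
    return not any(low_disease_activity(a) for a in assessments)


mechanisms = {"TNF inhibitor", "IL-6 receptor inhibitor"}
visits = [Assessment(5, 10.0), Assessment(4, 5.0), Assessment(2, 10.0)]
print(treatment_refractory(mechanisms, visits))  # True
```

The last visit shows why the steroid condition matters: two affected joints would otherwise qualify as low disease activity, but not at 10 mg/day of prednisone equivalent.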
Affiliation(s)
- Gregory C McDermott
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Michael DiIorio
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Yumeko Kawano
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Mary Jeffway
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
- Megan MacVicar
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
- Kumar Dahal
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
- Su-Jin Moon
  - Division of Rheumatology, Department of Internal Medicine, Yeouido St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Thany Seyok
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
- Jonathan Coblyn
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Elena Massarotti
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Michael E Weinblatt
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Dana Weisenfeld
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
- Katherine P Liao
  - Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
5. Johnson R, Stephens AV, Mester R, Knyazev S, Kohn LA, Freund MK, Bondhus L, Hill BL, Schwarz T, Zaitlen N, Arboleda VA, A Bastarache L, Pasaniuc B, Butte MJ. Electronic health record signatures identify undiagnosed patients with common variable immunodeficiency disease. Sci Transl Med 2024; 16:eade4510. [PMID: 38691621 DOI: 10.1126/scitranslmed.ade4510] [Received: 08/17/2022] [Accepted: 04/10/2024] [Indexed: 05/03/2024]
Abstract
Human inborn errors of immunity include rare disorders, grouped as the common variable immunodeficiency (CVID) phenotype, that entail functional and quantitative antibody deficiencies due to impaired B cells. Patients with CVID face diagnostic and treatment delays of 5 to 15 years after symptom onset because the disorders are rare (prevalence of ~1/25,000) and CVID phenotypes are highly heterogeneous, ranging from infections to autoimmunity to inflammatory conditions that overlap with more common disorders. The prolonged diagnostic odyssey drives excessive system-wide costs before diagnosis. Because there is no single causal mechanism, no genetic test can definitively diagnose CVID. Here, we present PheNet, a machine learning algorithm that identifies patients with CVID from their electronic health records (EHRs). PheNet learns phenotypic patterns from verified CVID cases and uses this knowledge to rank patients by their likelihood of having CVID. PheNet could have diagnosed more than half of our patients with CVID 1 or more years earlier than they had been diagnosed. When PheNet was applied to a large EHR dataset and the top 100 patients it ranked underwent blinded chart review, 74% were judged highly probable to have CVID. We externally validated PheNet using >6 million records from disparate medical systems in California and Tennessee. As artificial intelligence and machine learning make their way into health care, we show that algorithms such as PheNet can offer clinical benefit by expediting the diagnosis of rare diseases.
Affiliation(s)
- Ruth Johnson
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Alexis V Stephens
  - Department of Pediatrics, Division of Immunology, Allergy and Rheumatology, University of California Los Angeles, Los Angeles, CA 90095, USA
- Rachel Mester
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA
- Sergey Knyazev
  - Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Lisa A Kohn
  - Department of Pediatrics, Division of Immunology, Allergy and Rheumatology, University of California Los Angeles, Los Angeles, CA 90095, USA
- Malika K Freund
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Leroy Bondhus
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Brian L Hill
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA
- Tommer Schwarz
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA 90095, USA
- Noah Zaitlen
  - Department of Neurology, University of California Los Angeles, Los Angeles, CA 90095, USA
- Valerie A Arboleda
  - Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Lisa A Bastarache
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37203, USA
- Bogdan Pasaniuc
  - Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Manish J Butte
  - Department of Pediatrics, Division of Immunology, Allergy and Rheumatology, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
6. Vessels T, Strayer N, Lee H, Choi KW, Zhang S, Han L, Morley TJ, Smoller JW, Xu Y, Ruderfer DM. Integrating Electronic Health Records and Polygenic Risk to Identify Genetically Unrelated Comorbidities of Schizophrenia That May Be Modifiable. Biol Psychiatry Glob Open Sci 2024; 4:100297. [PMID: 38645405 PMCID: PMC11033077 DOI: 10.1016/j.bpsgos.2024.100297] [Received: 11/22/2023] [Revised: 02/07/2024] [Accepted: 02/11/2024] [Indexed: 04/23/2024]
Abstract
Background: Patients with schizophrenia have substantial comorbidity that contributes to a reduction in life expectancy of 10 to 20 years. Identifying modifiable comorbidities could reduce premature mortality. Conditions that frequently co-occur with schizophrenia but lack shared genetic risk with it are more likely to be products of treatment, behavior, or environmental factors, and are therefore enriched for potentially modifiable associations.

Methods: Phenome-wide comorbidity was calculated from electronic health records of 250,000 patients across 2 independent health care institutions (Vanderbilt University Medical Center and Mass General Brigham); associations with schizophrenia polygenic risk scores were calculated across the same phenotypes in linked biobanks.

Results: Schizophrenia comorbidity was significantly correlated across institutions (r = 0.85), and the 77 identified comorbidities were consistent with prior literature. Overall, comorbidity and polygenic risk score associations were significantly correlated (r = 0.55, p = 1.29 × 10^-118). However, directly testing for the absence of genetic effects identified 36 comorbidities that had significantly equivalent schizophrenia polygenic risk score distributions between cases and controls. This set included phenotypes known to be consequences of antipsychotic medications (e.g., movement disorders) or of the disease itself, such as reduced hygiene (e.g., diseases of the nail), thereby validating the approach. It also highlighted phenotypes with less clear causal relationships and minimal genetic effects, such as tobacco use disorder and diabetes.

Conclusions: This work demonstrates the consistency and robustness of electronic health record-based schizophrenia comorbidities across independent institutions and with the existing literature. It identifies known and novel comorbidities with an absence of shared genetic risk, indicating other causes that may be modifiable and where further study of causal pathways could improve outcomes for patients.
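The abstract reports "directly testing for the absence of genetic effects" through "significantly equivalent" polygenic risk score distributions but does not name the test. One standard way to establish equivalence is two one-sided tests (TOST). The sketch below is a large-sample TOST on mean scores in cases versus controls; the equivalence margin and the toy data are hypothetical, and this is not presented as the authors' procedure.

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import Sequence


def tost_p_value(cases: Sequence[float], controls: Sequence[float], margin: float) -> float:
    """Large-sample two one-sided tests (TOST) for equivalence of mean scores.

    H0: |mean(cases) - mean(controls)| >= margin. A small p-value supports
    equivalence, i.e. a mean difference lying inside (-margin, +margin).
    """
    diff = mean(cases) - mean(controls)
    se = sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    std_normal = NormalDist()
    p_lower = 1 - std_normal.cdf((diff + margin) / se)  # tests H0: diff <= -margin
    p_upper = std_normal.cdf((diff - margin) / se)  # tests H0: diff >= +margin
    return max(p_lower, p_upper)


# Toy standardized polygenic scores (not study data).
cases = [0.0, 0.1, -0.1] * 200
controls = [0.05, -0.05, 0.0] * 200
print(tost_p_value(cases, controls, margin=0.1) < 0.05)  # True: equivalent within ±0.1
```

Unlike a non-significant difference test, a significant TOST result is positive evidence that any difference is smaller than the margin, which is why the margin must be fixed in advance.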
Affiliation(s)
- Tess Vessels
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  - Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
- Nicholas Strayer
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Hyunjoon Lee
  - Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts
- Karmel W. Choi
  - Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts
- Siwei Zhang
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Lide Han
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  - Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
- Theodore J. Morley
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  - Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
- Jordan W. Smoller
  - Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts
- Yaomin Xu
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Douglas M. Ruderfer
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  - Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
  - Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee
7. Johnston KJA, Cote AC, Hicks E, Johnson J, Huckins LM. Genetically Regulated Gene Expression in the Brain Associated With Chronic Pain: Relationships With Clinical Traits and Potential for Drug Repurposing. Biol Psychiatry 2024; 95:745-761. [PMID: 37678542 PMCID: PMC10924073 DOI: 10.1016/j.biopsych.2023.08.023] [Received: 12/28/2022] [Revised: 07/20/2023] [Accepted: 08/28/2023] [Indexed: 09/09/2023]
Abstract
BACKGROUND: Chronic pain is a common, poorly understood condition. Genetic studies, including genome-wide association studies, have identified many relevant variants, which have yet to be translated into a full understanding of chronic pain. Transcriptome-wide association studies using transcriptomic imputation methods such as S-PrediXcan can help bridge this genotype-phenotype gap.

METHODS: We carried out transcriptomic imputation using S-PrediXcan to identify genetically regulated gene expression associated with multisite chronic pain in 13 brain tissues and whole blood. We then imputed genetically regulated gene expression for over 31,000 Mount Sinai BioMe participants and performed a phenome-wide association study to investigate the clinical correlates of chronic pain-associated gene expression changes.

RESULTS: We identified 95 experiment-wide significant gene-tissue associations (p < 7.97 × 10^-7), including 36 unique genes, and an additional 134 gene-tissue associations reaching within-tissue significance, including 53 additional unique genes. Of the 89 unique genes in total, 59 were novel for multisite chronic pain and 18 are established drug targets. Chronic pain genetically regulated gene expression for 10 unique genes was significantly associated with cardiac dysrhythmia, metabolic syndrome, disc disorders/dorsopathies, joint/ligament sprain, anemias, and neurologic disorder phecodes. Phenome-wide association study analyses adjusting for mean pain score showed that these associations were not driven by mean pain score.

CONCLUSIONS: We carried out the largest transcriptomic imputation study of any chronic pain trait to date. The results highlight potential causal genes in chronic pain development, together with the relevant tissue and direction of effect. Several of these genes are also drug targets. Phenome-wide association study results showed significant associations for phecodes including cardiac dysrhythmia and metabolic syndrome, indicating potential shared mechanisms.
Affiliation(s)
- Keira J A Johnston
  - Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut
- Alanna C Cote
  - Pamela Sklar Division of Psychiatric Genetics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
- Emily Hicks
  - Pamela Sklar Division of Psychiatric Genetics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
- Jessica Johnson
  - School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Laura M Huckins
  - Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut
8. Shyr C, Sulieman L, Harris PA. Illuminating the landscape of high-level clinical trial opportunities in the All of Us Research Program. J Am Med Inform Assoc 2024:ocae062. [PMID: 38622899 DOI: 10.1093/jamia/ocae062] [Received: 01/27/2024] [Revised: 03/02/2024] [Accepted: 03/07/2024] [Indexed: 04/17/2024]
Abstract
OBJECTIVE: With its size and diversity, the All of Us Research Program has the potential to power clinical trials and improve their representativeness through ancillary studies such as Nutrition for Precision Health. We sought to characterize high-level trial opportunities for the diverse participants and sponsors of future trial investment.

MATERIALS AND METHODS: We matched All of Us participants with available trials on ClinicalTrials.gov based on medical conditions, age, sex, and geographic location. Based on the number of matched trials, we (1) developed the Trial Opportunities Compass (TOC) to help sponsors assess trial investment portfolios, (2) characterized the landscape of trial opportunities in a phenome-wide association study (PheWAS), and (3) assessed the relationship between trial opportunities and social determinants of health (SDoH) to identify potential barriers to trial participation.

RESULTS: Our study included 181,529 All of Us participants and 18,634 trials. The TOC identified opportunities for portfolio investment and gaps in currently available trials across federal, industrial, and academic sponsors. PheWAS results revealed an emphasis on mental disorder-related trials, with anxiety disorder having the highest adjusted increase in the number of matched trials (59% [95% CI, 57-62]; P < 1 × 10^-300). Participants from certain communities underrepresented in biomedical research, including self-reported racial and ethnic minorities, had more matched trials after adjusting for other factors. Living in a nonmetropolitan area was associated with up to 13.1 times fewer matched trials.

DISCUSSION AND CONCLUSION: All of Us data are a valuable resource for identifying trial opportunities to inform trial portfolio planning. Characterizing these opportunities with consideration of SDoH can guide the prioritization of the most pressing barriers to trial participation.
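The matching step in MATERIALS AND METHODS (conditions, age, sex, geographic location) can be sketched as a per-trial eligibility check whose count per participant feeds the downstream analyses. Everything below is an assumption for illustration: the record fields, the "ALL"/"FEMALE"/"MALE" encoding, matching location at the state level, and the toy trials. The abstract does not describe how the study processed ClinicalTrials.gov records.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Trial:
    trial_id: str
    conditions: FrozenSet[str]
    min_age: Optional[int]  # None means no lower age limit
    max_age: Optional[int]  # None means no upper age limit
    sex: str  # "ALL", "FEMALE" or "MALE" (encoding assumed here)
    states: FrozenSet[str]  # locations of recruiting sites (granularity assumed here)


@dataclass(frozen=True)
class Participant:
    conditions: FrozenSet[str]
    age: int
    sex: str
    state: str


def is_match(p: Participant, t: Trial) -> bool:
    """A trial matches if it shares a condition and the age, sex and location criteria all hold."""
    if not p.conditions & t.conditions:
        return False
    if t.min_age is not None and p.age < t.min_age:
        return False
    if t.max_age is not None and p.age > t.max_age:
        return False
    if t.sex != "ALL" and t.sex != p.sex:
        return False
    return p.state in t.states


def matched_trial_count(p: Participant, trials: List[Trial]) -> int:
    return sum(is_match(p, t) for t in trials)


# Hypothetical trials and participant, for illustration only.
trials = [
    Trial("trial-1", frozenset({"anxiety disorder"}), 18, 65, "ALL", frozenset({"TN", "CA"})),
    Trial("trial-2", frozenset({"anxiety disorder"}), 18, 40, "ALL", frozenset({"TN"})),
    Trial("trial-3", frozenset({"asthma"}), None, None, "ALL", frozenset({"TN"})),
]
person = Participant(frozenset({"anxiety disorder"}), 45, "FEMALE", "TN")
print(matched_trial_count(person, trials))  # 1
```

The location criterion is what makes geography matter: the same participant placed in a state with no recruiting site would match no trials, the kind of effect summarized above for nonmetropolitan areas.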
Affiliation(s)
- Cathy Shyr
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Lina Sulieman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Paul A Harris
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37240, United States
9
Wang X, Liu M, Nogues IE, Chen T, Xiong X, Bonzel CL, Zhang H, Hong C, Xia Y, Dahal K, Costa L, Cui J, Gaziano JM, Kim SC, Ho YL, Cho K, Cai T, Liao KP. Heterogeneous associations between interleukin-6 receptor variants and phenotypes across ancestries and implications for therapy. Sci Rep 2024; 14:8021. [PMID: 38580710 PMCID: PMC10997791 DOI: 10.1038/s41598-024-54063-3] [Received: 01/13/2023] [Accepted: 02/08/2024] [Indexed: 04/07/2024]
Abstract
The Phenome-Wide Association Study (PheWAS) is increasingly used to broadly screen for potential treatment effects, e.g., an IL6R variant as a proxy for IL6R antagonists. This approach offers an opportunity to address the limited power in clinical trials to study differential treatment effects across patient subgroups. However, limited methods exist to efficiently test for differences across subgroups in the thousands of multiple comparisons generated as part of a PheWAS. In this study, we developed an approach that maximizes the power to test for heterogeneous genotype-phenotype associations and applied this approach to an IL6R PheWAS among individuals of African (AFR) and European (EUR) ancestries. We identified 29 traits with differences in IL6R variant-phenotype associations, including a lower risk of type 2 diabetes (T2D) in AFR (OR 0.96) vs EUR (OR 1.0, p-value for heterogeneity = 8.5 × 10⁻³), and higher white blood cell count (p-value for heterogeneity = 8.5 × 10⁻¹³¹). These data suggest a more salutary effect of IL6R blockade for T2D among individuals of AFR vs EUR ancestry and provide data to inform ongoing clinical trials targeting IL6 for an expanding number of conditions. Moreover, the method to test for heterogeneity of associations can be applied broadly to other large-scale genotype-phenotype screens in diverse populations.
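A common baseline for comparing one variant-phenotype association across two ancestry groups is a two-sample z-test on the log odds ratios, assuming the two estimates are independent. The sketch below shows only that standard test; it is not the power-maximizing method the authors developed, and the standard errors in the usage line are hypothetical values, not figures from the paper.

```python
import math


def heterogeneity_z_test(or_a: float, se_log_a: float,
                         or_b: float, se_log_b: float) -> tuple[float, float]:
    """Two-sided z-test for a difference between two independent log odds ratios.

    or_a, or_b: odds ratios in groups A and B.
    se_log_a, se_log_b: standard errors of log(OR) in each group.
    Returns (z, two-sided p-value).
    """
    diff = math.log(or_a) - math.log(or_b)
    se_diff = math.sqrt(se_log_a ** 2 + se_log_b ** 2)
    z = diff / se_diff
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical inputs: OR 0.96 (AFR) vs OR 1.0 (EUR) with made-up standard errors.
z, p = heterogeneity_z_test(0.96, 0.015, 1.00, 0.005)
print(f"z = {z:.2f}, p = {p:.3g}")
```

Running such a test for every phecode produces the "thousands of multiple comparisons" the abstract mentions, which is why a more powerful joint procedure is attractive.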
Affiliation(s)
- Xuan Wang
- Department of Population Health Sciences, University of Utah, Salt Lake City, UT, USA
- Molei Liu
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Tony Chen
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Xin Xiong
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Clara-Lea Bonzel
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Harrison Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, 60 Fenwood Road, Boston, MA, 02115, USA
- Chuan Hong
- Department of Biostatistics, Duke University, Durham, NC, USA
- Yin Xia
- Department of Statistics and Data Science, Fudan University, Shanghai, China
- Kumar Dahal
- Department of Biostatistics, Duke University, Durham, NC, USA
- Lauren Costa
- Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston, MA, USA
- Jing Cui
- Department of Biostatistics, Duke University, Durham, NC, USA
- J Michael Gaziano
- Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston, MA, USA
- Division of Aging, Brigham and Women's Hospital, Boston, MA, USA
- Seoyoung C Kim
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Boston, MA, USA
- Yuk-Lam Ho
- Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston, MA, USA
- Kelly Cho
- Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston, MA, USA
- Division of Aging, Brigham and Women's Hospital, Boston, MA, USA
- Tianxi Cai
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Katherine P Liao
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, 60 Fenwood Road, Boston, MA, 02115, USA
- Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston, MA, USA
- Rheumatology Section, VA Boston Healthcare System, Boston, USA
10
Zeng C, Schlueter DJ, Tran TC, Babbar A, Cassini T, Bastarache LA, Denny JC. Comparison of phenomic profiles in the All of Us Research Program against the US general population and the UK Biobank. J Am Med Inform Assoc 2024; 31:846-854. [PMID: 38263490 PMCID: PMC10990551 DOI: 10.1093/jamia/ocad260] [Received: 09/30/2023] [Revised: 12/05/2023] [Accepted: 01/08/2024] [Indexed: 01/25/2024]
Abstract
IMPORTANCE Knowledge gained from cohort studies has dramatically advanced both public and precision health. The All of Us Research Program seeks to enroll 1 million diverse participants who share multiple sources of data, providing unique opportunities for research. It is important to understand the phenomic profiles of its participants to conduct research in this cohort. OBJECTIVES More than 280 000 participants have shared their electronic health records (EHRs) in the All of Us Research Program. We aim to understand the phenomic profiles of this cohort through comparisons with those in the US general population and a well-established nation-wide cohort, UK Biobank, and to test whether association results of selected commonly studied diseases in the All of Us cohort were comparable to those in UK Biobank. MATERIALS AND METHODS We included participants with EHRs in All of Us and participants with health records from UK Biobank. The estimates of prevalence of diseases in the US general population were obtained from the Global Burden of Diseases (GBD) study. We conducted phenome-wide association studies (PheWAS) of 9 commonly studied diseases in both cohorts. RESULTS This study included 287 012 participants from the All of Us EHR cohort and 502 477 participants from the UK Biobank. A total of 314 diseases curated by the GBD were evaluated in All of Us, 80.9% (N = 254) of which were more common in All of Us than in the US general population [prevalence ratio (PR) >1.1, P < 2 × 10⁻⁵]. Among 2515 diseases and phenotypes evaluated in both All of Us and UK Biobank, 85.6% (N = 2152) were more common in All of Us (PR >1.1, P < 2 × 10⁻⁵).
The Pearson correlation coefficients of effect sizes from PheWAS between All of Us and UK Biobank were 0.61, 0.50, 0.60, 0.57, 0.40, 0.53, 0.46, 0.47, and 0.24 for ischemic heart diseases, lung cancer, chronic obstructive pulmonary disease, dementia, colorectal cancer, lower back pain, multiple sclerosis, lupus, and cystic fibrosis, respectively. DISCUSSION Despite the differences in prevalence of diseases in All of Us compared to the US general population or the UK Biobank, our study supports that All of Us can facilitate rapid investigation of a broad range of diseases. CONCLUSION Most diseases were more common in All of Us than in the general US population or the UK Biobank. Results of disease-disease association tests from All of Us are comparable to those estimated in another well-studied national cohort.
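The reported proportions can be checked directly from the counts in the abstract, and the screening rule reduces to a ratio of two prevalences. This is a minimal sketch: the `prevalence_ratio` and `more_common_in_a` helpers are hypothetical, and they mirror only the PR > 1.1 part of the criterion, omitting the significance test (P < 2 × 10⁻⁵) the authors also required.

```python
def prevalence_ratio(cases_a: int, n_a: int, cases_b: int, n_b: int) -> float:
    """Prevalence in cohort A divided by prevalence in cohort B."""
    return (cases_a / n_a) / (cases_b / n_b)


def more_common_in_a(pr: float, threshold: float = 1.1) -> bool:
    """Effect-size part of the screening rule only; no significance test here."""
    return pr > threshold


# Proportions reported in the abstract, recomputed from their counts.
share_vs_us = 254 / 314      # diseases more common in All of Us than the US population
share_vs_ukb = 2152 / 2515   # diseases more common in All of Us than UK Biobank
print(f"{share_vs_us:.1%}, {share_vs_ukb:.1%}")  # 80.9%, 85.6%
```

Both recomputed shares agree with the 80.9% and 85.6% given in the results.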
Affiliation(s)
- Chenjie Zeng
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States
- David J Schlueter
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States
- Department of Health and Society, University of Toronto, Scarborough, Toronto, ON, Canada
- Tam C Tran
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States
- Anav Babbar
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States
- Thomas Cassini
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, United States
- Lisa A Bastarache
- Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Josh C Denny
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States
11
Sanchez-Ruiz JA, Coombes BJ, Pazdernik VM, Melhuish Beaupre LM, Jenkins GD, Pendegraft RS, Batzler A, Ozerdem A, McElroy SL, Gardea-Resendez MA, Cuellar-Barboza AB, Prieto ML, Frye MA, Biernacka JM. Clinical and genetic contributions to medical comorbidity in bipolar disorder: a study using electronic health records-linked biobank data. Mol Psychiatry 2024:10.1038/s41380-024-02530-8. [PMID: 38548982 DOI: 10.1038/s41380-024-02530-8] [Received: 07/12/2023] [Revised: 02/21/2024] [Accepted: 03/13/2024] [Indexed: 06/14/2024]
Abstract
Bipolar disorder is a chronic and complex polygenic disease with high rates of comorbidity. However, the independent contribution of either diagnosis or genetic risk of bipolar disorder to the medical comorbidity profile of individuals with the disease remains unresolved. Here, we conducted a multi-step phenome-wide association study (PheWAS) of bipolar disorder using phenomes derived from the electronic health records of participants enrolled in the Mayo Clinic Biobank and the Mayo Clinic Bipolar Disorder Biobank. First, we explored the conditions associated with a diagnosis of bipolar disorder by conducting a phenotype-based PheWAS followed by LASSO-penalized regression to account for correlations within the phenome. Then, we explored the conditions associated with bipolar disorder polygenic risk score (BD-PRS) using a PRS-based PheWAS with a sequential exclusion approach to account for the possibility that diagnosis, instead of genetic risk, may drive such associations. A total of 53,386 participants (58.7% women) with a mean age at analysis of 67.8 years (SD = 15.6) were included. A bipolar disorder diagnosis (n = 1479) was associated with higher rates of psychiatric conditions, injuries and poisonings, endocrine/metabolic and neurological conditions, viral hepatitis C, and asthma. BD-PRS was associated with psychiatric comorbidities but, in contrast, had no positive associations with general medical conditions. While our findings warrant confirmation with longitudinal-prospective studies, the limited associations between bipolar disorder genetics and medical conditions suggest that shared environmental effects or environmental consequences of diagnosis may have a greater impact on the general medical comorbidity profile of individuals with bipolar disorder than its genetic risk.
Affiliation(s)
- Brandon J Coombes
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Greg D Jenkins
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Anthony Batzler
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Aysegul Ozerdem
- Department of Psychiatry & Psychology, Mayo Clinic, Rochester, MN, USA
- Susan L McElroy
- Lindner Center of HOPE/University of Cincinnati, Cincinnati, OH, USA
- Manuel A Gardea-Resendez
- Department of Psychiatry & Psychology, Mayo Clinic, Rochester, MN, USA
- Department of Psychiatry, Universidad Autónoma de Nuevo León, Monterrey, Mexico
- Alfredo B Cuellar-Barboza
- Department of Psychiatry & Psychology, Mayo Clinic, Rochester, MN, USA
- Department of Psychiatry, Universidad Autónoma de Nuevo León, Monterrey, Mexico
- Miguel L Prieto
- Department of Psychiatry & Psychology, Mayo Clinic, Rochester, MN, USA
- Department of Psychiatry, Faculty of Medicine, Universidad de Los Andes, Santiago, Chile
- Mental Health Service, Clínica Universidad de los Andes, Santiago, Chile
- Mark A Frye
- Department of Psychiatry & Psychology, Mayo Clinic, Rochester, MN, USA
- Joanna M Biernacka
- Department of Psychiatry & Psychology, Mayo Clinic, Rochester, MN, USA
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
12
De Lillo A, Pathak GA, Low A, De Angelis F, Abou Alaiwi S, Miller EJ, Fuciarelli M, Polimanti R. Clinical spectrum of Transthyretin amyloidogenic mutations among diverse population origins. Hum Genomics 2024; 18:31. [PMID: 38523305 PMCID: PMC10962184 DOI: 10.1186/s40246-024-00596-7] [Received: 12/06/2023] [Accepted: 03/08/2024] [Indexed: 03/26/2024]
Abstract
PURPOSE Coding mutations in the Transthyretin (TTR) gene cause a hereditary form of amyloidosis characterized by a complex genotype-phenotype correlation with limited information regarding differences among worldwide populations. METHODS We compared 676 diverse individuals carrying TTR amyloidogenic mutations (rs138065384, Phe44Leu; rs730881165, Ala81Thr; rs121918074, His90Asn; rs76992529, Val122Ile) to 12,430 non-carriers matched by age, sex, and genetically-inferred ancestry to assess their clinical presentations across 1,693 outcomes derived from electronic health records in UK Biobank. RESULTS In individuals of African descent (AFR), the Val122Ile mutation was linked to multiple outcomes related to the circulatory system (fold-enrichment = 2.96, p = 0.002) with the strongest associations being cardiac congenital anomalies (phecode 747.1, p = 0.003), endocarditis (phecode 420.3, p = 0.006), and cardiomyopathy (phecode 425, p = 0.007). In individuals of Central-South Asian descent (CSA), the His90Asn mutation was associated with dermatologic outcomes (fold-enrichment = 28, p = 0.001). The same TTR mutation was linked to neoplasms in European-descent individuals (EUR, fold-enrichment = 3.09, p = 0.003). In EUR, Ala81Thr showed multiple associations with respiratory outcomes (fold-enrichment = 3.61, p = 0.002), but the strongest association was with atrioventricular block (phecode 426.2, p = 2.81 × 10⁻⁴). Additionally, the same mutation in East Asians (EAS) showed associations with endocrine-metabolic traits (fold-enrichment = 4.47, p = 0.003). In the cross-ancestry meta-analysis, the Val122Ile mutation was associated with peripheral nerve disorders (phecode 351, p = 0.004) in addition to cardiac congenital anomalies (fold-enrichment = 6.94, p = 0.003). CONCLUSIONS Overall, these findings highlight that TTR amyloidogenic mutations present ancestry-specific and ancestry-convergent associations related to a range of health domains.
This supports the need to increase awareness regarding the range of outcomes associated with TTR mutations across worldwide populations to reduce misdiagnosis and delayed diagnosis of TTR-related amyloidosis.
Affiliation(s)
- Antonella De Lillo
- Department of Psychiatry, Yale University School of Medicine, 60 Temple, Suite 7A, New Haven, CT, 06510, USA
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Gita A Pathak
- Department of Psychiatry, Yale University School of Medicine, 60 Temple, Suite 7A, New Haven, CT, 06510, USA
- VA CT Healthcare Center, West Haven, CT, USA
- Aislinn Low
- Department of Psychiatry, Yale University School of Medicine, 60 Temple, Suite 7A, New Haven, CT, 06510, USA
- VA CT Healthcare Center, West Haven, CT, USA
- Flavio De Angelis
- Department of Psychiatry, Yale University School of Medicine, 60 Temple, Suite 7A, New Haven, CT, 06510, USA
- Department of Physical and Mental Health, and Preventive Medicine, University of Campania "Luigi Vanvitelli", Naples, Italy
- Sarah Abou Alaiwi
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
- Edward J Miller
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
- Maria Fuciarelli
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Renato Polimanti
- Department of Psychiatry, Yale University School of Medicine, 60 Temple, Suite 7A, New Haven, CT, 06510, USA
- VA CT Healthcare Center, West Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
13
Jeffery AD, Fabbri D, Reeves RM, Matheny ME. Use of noisy labels as weak learners to identify incompletely ascertainable outcomes: A Feasibility study with opioid-induced respiratory depression. Heliyon 2024; 10:e26434. [PMID: 38444495 PMCID: PMC10912240 DOI: 10.1016/j.heliyon.2024.e26434] [Received: 06/09/2023] [Revised: 02/09/2024] [Accepted: 02/13/2024] [Indexed: 03/07/2024]
Abstract
Objective Assigning outcome labels to large observational data sets in a timely and accurate manner, particularly when outcomes are rare or not directly ascertainable, remains a significant challenge within biomedical informatics. We examined whether noisy labels generated from subject matter experts' heuristics using heterogeneous data types within a data programming paradigm could provide outcome labels to a large, observational data set. We chose the clinical condition of opioid-induced respiratory depression for our use case because it is rare, has no administrative codes to easily identify the condition, and typically requires at least some unstructured text to ascertain its presence. Materials and methods Using de-identified electronic health records of 52,861 post-operative encounters, we applied a data programming paradigm (implemented in the Snorkel software) for the development of a machine learning classifier for opioid-induced respiratory depression. Our approach included subject matter experts creating 14 labeling functions that served as noisy labels for developing a probabilistic Generative model. We used probabilistic labels from the Generative model as outcome labels for training a Discriminative model on the source data. We evaluated performance of the Discriminative model with a hold-out test set of 599 independently-reviewed patient records. Results The final Discriminative classification model achieved an accuracy of 0.977, an F1 score of 0.417, a sensitivity of 1.0, and an AUC of 0.988 in the hold-out test set with a prevalence of 0.83% (5/599). Discussion All of the confirmed cases were identified by the classifier. For rare outcomes, this finding is encouraging because it reduces the number of manual reviews needed by excluding visits/patients with low probabilities.
Conclusion Application of a data programming paradigm with expert-informed labeling functions might have utility for phenotyping clinical phenomena that are not easily ascertainable from highly-structured data.
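The four reported test-set metrics are mutually consistent and pin down the confusion matrix. With 5 true cases among 599 records and a sensitivity of 1.0, all 5 cases are true positives; since F1 = 2TP / (2TP + FP + FN), an F1 of 0.417 implies roughly 14 false positives, i.e. a precision of about 0.26, which is why F1 is low despite perfect sensitivity. The sketch below performs this back-calculation; the false-positive count is an inference from the rounded published metrics, not a figure reported in the abstract, and `implied_confusion_matrix` is a hypothetical helper.

```python
def implied_confusion_matrix(n: int, positives: int,
                             sensitivity: float, f1: float) -> dict[str, int]:
    """Back-calculate TP, FN, FP, TN from prevalence, sensitivity and F1.

    Uses F1 = 2*TP / (2*TP + FP + FN), solved for FP and rounded, because the
    published metrics are themselves rounded.
    """
    tp = round(sensitivity * positives)
    fn = positives - tp
    fp = round(2 * tp / f1 - 2 * tp - fn)
    tn = n - positives - fp
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


cm = implied_confusion_matrix(n=599, positives=5, sensitivity=1.0, f1=0.417)
accuracy = (cm["TP"] + cm["TN"]) / 599
precision = cm["TP"] / (cm["TP"] + cm["FP"])
print(cm)                     # {'TP': 5, 'FN': 0, 'FP': 14, 'TN': 580}
print(round(accuracy, 3))     # 0.977, matching the reported accuracy
print(round(precision, 3))    # 0.263
```

That the recomputed accuracy matches the reported 0.977 is a useful sanity check that the inferred counts are plausible.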
Affiliation(s)
- Alvin D. Jeffery
- Vanderbilt University School of Nursing, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Daniel Fabbri
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ruth M. Reeves
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Michael E. Matheny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
14
Woerner J, Sriram V, Nam Y, Verma A, Kim D. Uncovering genetic associations in the human diseasome using an endophenotype-augmented disease network. Bioinformatics 2024; 40:btae126. [PMID: 38527901 PMCID: PMC10963079 DOI: 10.1093/bioinformatics/btae126] [Received: 09/18/2023] [Revised: 01/17/2024] [Indexed: 03/27/2024]
Abstract
MOTIVATION Many diseases, particularly cardiometabolic disorders, exhibit complex multimorbidities with one another. An intuitive way to model the connections between phenotypes is with a disease-disease network (DDN), where nodes represent diseases and edges represent associations, such as shared single-nucleotide polymorphisms (SNPs), between pairs of diseases. To gain further genetic understanding of molecular contributors to disease associations, we propose a novel version of the shared-SNP DDN (ssDDN), denoted as ssDDN+, which includes connections between diseases derived from genetic correlations with intermediate endophenotypes. We hypothesize that an ssDDN+ can provide complementary information to the disease connections in an ssDDN, yielding insight into the role of clinical laboratory measurements in disease interactions. RESULTS Using PheWAS summary statistics from the UK Biobank, we constructed an ssDDN+ revealing hundreds of genetic correlations between diseases and quantitative traits. Our augmented network uncovers genetic associations across different disease categories, connects relevant cardiometabolic diseases, and highlights specific biomarkers that are associated with cross-phenotype associations. Out of the 31 clinical measurements under consideration, HDL-C connects the greatest number of diseases and is strongly associated with both type 2 diabetes and heart failure. Triglycerides, another blood lipid with known genetic causes in non-Mendelian diseases, also adds a substantial number of edges to the ssDDN. This work demonstrates how association with clinical biomarkers can better explain the shared genetics between cardiometabolic disorders. Our study can facilitate future network-based investigations of cross-phenotype associations involving pleiotropy and genetic heterogeneity, potentially uncovering sources of missing heritability in multimorbidities.
AVAILABILITY AND IMPLEMENTATION The generated ssDDN+ can be explored at https://hdpm.biomedinfolab.com/ddn/biomarkerDDN.
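The network construction described above can be sketched with plain sets: the shared-SNP DDN joins two diseases that share at least one associated SNP, and the augmented version adds an edge when both diseases are genetically correlated with the same endophenotype. This is a minimal sketch under assumptions: the inputs (disease-to-SNP sets and disease-to-biomarker sets, presumably already thresholded for significance) are hypothetical toy data, and the published ssDDN+ derives its biomarker edges from genetic correlations computed on UK Biobank PheWAS summary statistics rather than from precomputed sets.

```python
from itertools import combinations


def shared_snp_edges(disease_snps: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """ssDDN: edge between two diseases, weighted by the number of shared SNPs."""
    edges = {}
    for a, b in combinations(sorted(disease_snps), 2):
        shared = disease_snps[a] & disease_snps[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def biomarker_edges(disease_markers: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Extra edges: both diseases genetically correlated with the same biomarker."""
    edges = {}
    for a, b in combinations(sorted(disease_markers), 2):
        shared = disease_markers[a] & disease_markers[b]
        if shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical toy inputs (not results from the paper).
snps = {"T2D": {"rs1", "rs2"}, "HF": {"rs3"}, "CAD": {"rs2", "rs3"}}
markers = {"T2D": {"HDL-C", "TG"}, "HF": {"HDL-C"}, "CAD": {"TG"}}

ssddn = shared_snp_edges(snps)
# Edges contributed only by the endophenotype layer (absent from the ssDDN).
added = {pair: m for pair, m in biomarker_edges(markers).items() if pair not in ssddn}
print(ssddn)
print(added)
```

In this toy example the biomarker layer links HF and T2D, which share no SNP, illustrating how endophenotypes can add connections the ssDDN alone would miss.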
Affiliation(s)
- Jakob Woerner
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Vivek Sriram
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Yonghyun Nam
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Anurag Verma
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States
- Dokyoon Kim
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States
15
Rivière JG, Soler Palacín P, Butte MJ. Proceedings from the inaugural Artificial Intelligence in Primary Immune Deficiencies (AIPID) conference. J Allergy Clin Immunol 2024; 153:637-642. [PMID: 38224784 DOI: 10.1016/j.jaci.2024.01.002] [Received: 12/11/2023] [Revised: 01/09/2024] [Accepted: 01/11/2024] [Indexed: 01/17/2024]
Abstract
Here, we summarize the proceedings of the inaugural Artificial Intelligence in Primary Immune Deficiencies conference, during which experts and advocates gathered to advance research into the applications of artificial intelligence (AI), machine learning, and other computational tools in the diagnosis and management of inborn errors of immunity (IEIs). The conference focused on the key themes of expediting IEI diagnoses, challenges in data collection, roles of natural language processing and large language models in interpreting electronic health records, and ethical considerations in implementation. Innovative AI-based tools trained on electronic health records and claims databases have discovered new patterns of warning signs for IEIs, facilitating faster diagnoses and enhancing patient outcomes. Challenges in training AIs persist on account of data limitations, especially in cases of rare diseases, overlapping phenotypes, and biases inherent in current data sets. Furthermore, experts highlighted the significance of ethical considerations, data protection, and the necessity for open science principles. The conference delved into regulatory frameworks, equity in access, and the imperative for collaborative efforts to overcome these obstacles and harness the transformative potential of AI. Concerted efforts to successfully integrate AI into daily clinical immunology practice are still needed.
Affiliation(s)
- Jacques G Rivière
- Infection and Immunity in Pediatric Patients Research Group, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Pediatric Infectious Diseases and Immunodeficiencies Unit, Hospital Infantil i de la Dona, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain; Universitat Autònoma de Barcelona, Barcelona, Spain
- Pere Soler Palacín
- Infection and Immunity in Pediatric Patients Research Group, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Pediatric Infectious Diseases and Immunodeficiencies Unit, Hospital Infantil i de la Dona, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain; Universitat Autònoma de Barcelona, Barcelona, Spain
- Manish J Butte
- Division of Immunology, Allergy, and Rheumatology, Department of Pediatrics, University of California Los Angeles, Los Angeles, Calif; Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, Calif; Department of Human Genetics, University of California Los Angeles, Los Angeles, Calif
16
Tang AS, Rankin KP, Cerono G, Miramontes S, Mills H, Roger J, Zeng B, Nelson C, Soman K, Woldemariam S, Li Y, Lee A, Bove R, Glymour M, Aghaeepour N, Oskotsky TT, Miller Z, Allen IE, Sanders SJ, Baranzini S, Sirota M. Leveraging electronic health records and knowledge networks for Alzheimer's disease prediction and sex-specific biological insights. Nat Aging 2024; 4:379-395. [PMID: 38383858 PMCID: PMC10950787 DOI: 10.1038/s43587-024-00573-8] [Received: 03/21/2023] [Accepted: 01/19/2024] [Indexed: 02/23/2024]
Abstract
Identification of Alzheimer's disease (AD) onset risk can facilitate interventions before irreversible disease progression. We demonstrate that electronic health records from the University of California, San Francisco, followed by knowledge networks (for example, SPOKE) allow for (1) prediction of AD onset, (2) prioritization of biological hypotheses, and (3) contextualization of sex dimorphism. We trained random forest models and predicted AD onset on a cohort of 749 individuals with AD and 250,545 controls with a mean area under the receiver operating characteristic curve of 0.72 (7 years prior) to 0.81 (1 day prior). We further harnessed matched cohort models to identify conditions with predictive power before AD onset. Knowledge networks highlight shared genes between multiple top predictors and AD (for example, APOE, ACTB, IL6 and INS). Genetic colocalization analysis supports AD association with hyperlipidemia at the APOE locus, as well as a stronger female AD association with osteoporosis at a locus near MS4A6A. We therefore show how clinical data can be utilized for early AD prediction and identification of personalized biological hypotheses.
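The AUROC values quoted above (0.72 to 0.81) have a direct probabilistic reading: the chance that a randomly chosen individual who develops AD receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes that quantity by brute force from hypothetical scores; it illustrates the metric only and is not the authors' random forest pipeline.

```python
def auroc(case_scores: list[float], control_scores: list[float]) -> float:
    """AUROC as P(case score > control score), counting ties as 1/2.

    O(n*m) pairwise comparison: fine for illustration, but far too slow for
    roughly 250,000 controls, where a rank-based computation is usually used.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks, not data from the study.
print(auroc([0.9, 0.7, 0.4], [0.8, 0.3, 0.2, 0.1]))
```

Under this reading, an AUROC of 0.81 one day before diagnosis means a case outranks a control in about four of five random pairings.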
Affiliation(s)
- Alice S Tang
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Graduate Program in Bioengineering, University of California, San Francisco and University of California, Berkeley, San Francisco and Berkeley, CA, USA
| | - Katherine P Rankin
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Memory and Aging Center, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Gabriel Cerono
- Weill Institute for Neuroscience. Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Silvia Miramontes
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Hunter Mills
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Jacquelyn Roger
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Billy Zeng
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Charlotte Nelson
- Weill Institute for Neuroscience. Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Karthik Soman
- Weill Institute for Neuroscience. Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Sarah Woldemariam
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Yaqiao Li
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Albert Lee
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Riley Bove
- Weill Institute for Neuroscience, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Maria Glymour
- Department of Anesthesiology, Pain, and Perioperative Medicine, Stanford University, Palo Alto, CA, USA
- Nima Aghaeepour
- Department of Anesthesiology, Pain, and Perioperative Medicine, Stanford University, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
- Tomiko T Oskotsky
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Zachary Miller
- Memory and Aging Center, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Isabel E Allen
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
- Stephan J Sanders
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, UK
- Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
- Sergio Baranzini
- Weill Institute for Neuroscience, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Marina Sirota
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Pediatrics, University of California, San Francisco, CA, USA
17
Deng J, Altintas B, Haley JS, Kim J, Ramos M, Carey DJ, Stewart DR, McReynolds LJ. Most Fanconi anemia heterozygotes are not at increased cancer risk: A genome-first DiscovEHR cohort population study. Genet Med 2024; 26:101042. [PMID: 38063144 PMCID: PMC10939803 DOI: 10.1016/j.gim.2023.101042] [Received: 06/22/2023] [Revised: 12/01/2023] [Accepted: 12/01/2023] [Indexed: 01/23/2024]
Abstract
PURPOSE Fanconi anemia (FA) is a bone marrow failure and cancer predisposition syndrome caused primarily by biallelic pathogenic variants in 1 of 22 genes involved in DNA interstrand cross-link repair. An enduring question concerns the cancer risk of individuals with a single pathogenic FA gene variant. To investigate all FA genes, this study used the DiscovEHR cohort of 170,503 individuals with exome sequencing and electronic health data. METHODS We identified 5822 subjects with a single pathogenic variant in an FA gene. Two control groups were used in the primary analysis to derive cancer risk signals. Secondary exploratory analysis was conducted using the UK Biobank and The Cancer Genome Atlas. RESULTS Signals for elevated cancer risk were found in all 5 known cancer predisposition genes. Among the remaining 15 genes associated with autosomal recessive inheritance, cancer risk signals were found for 4 cancers across 3 genes in the primary cohort, but these were not validated in the secondary cohorts. CONCLUSION To our knowledge, this is the first and largest FA heterozygote study to use genomic ascertainment; it validates well-established cancer predispositions in 5 genes while finding insufficient evidence of predisposition in 15 others. Our findings inform clinical surveillance, given how common pathogenic FA variants are in the population.
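In its simplest form, a cancer risk signal for heterozygotes compares cancer frequency in carriers against controls as an odds ratio from a 2×2 table. The sketch below computes an odds ratio with a Woolf (log-scale) 95% confidence interval; the counts are invented for illustration and are not DiscovEHR data, and the study's actual analysis (two control groups, any covariate adjustment) is not reproduced.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf 95% CI.

    a: carriers with the cancer, b: carriers without,
    c: controls with the cancer, d: controls without.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# Hypothetical counts: 40 of 1,000 carriers vs 200 of 10,000 controls have the cancer.
or_, lo, hi = odds_ratio_ci(40, 960, 200, 9800)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # 2.04 1.44 2.89
```

A confidence interval lying entirely above 1, as here, is the kind of result one might flag as a risk signal before attempting replication in independent cohorts.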
Affiliation(s)
- Joseph Deng
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
- Burak Altintas
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD; Washington University, St. Louis Children's Hospital, St. Louis, MO
- Jeremy S Haley
- Department of Genomic Health, Weis Center for Research, Geisinger Medical Center, Danville, PA
- Jung Kim
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
- Mark Ramos
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
- David J Carey
- Department of Genomic Health, Weis Center for Research, Geisinger Medical Center, Danville, PA
- Douglas R Stewart
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
- Lisa J McReynolds
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
18
Sheu YH, Simm J, Wang B, Lee H, Smoller JW. Continuous-Time and Dynamic Suicide Attempt Risk Prediction with Neural Ordinary Differential Equations. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.02.25.24303343. [PMID: 38464260 PMCID: PMC10925370 DOI: 10.1101/2024.02.25.24303343] [Indexed: 03/12/2024]
Abstract
Suicide is one of the leading causes of death in the US, and the number of attributable deaths continues to increase. The risk of suicide-related behaviors (SRBs) is dynamic, and SRBs can occur across a continuum of times and locations. However, current SRB risk assessment methods, whether conducted by clinicians or through machine learning models, treat SRB risk as static and are confined to specific times and locations, such as following a hospital visit. This paradigm is unrealistic because SRB risk fluctuates, and it leaves time gaps in which no risk score is available. Here, we develop two closely related model classes, Event-GRU-ODE and Event-GRU-Discretized, that predict the dynamic risk of events as a continuous trajectory based on Neural ODEs, an advanced AI model class for time series prediction. These models can therefore estimate changes in risk across a continuum of future time points, even without new observations, and can update these estimates as new data become available. We train and validate these models for SRB prediction using a large electronic health records database. Both models demonstrated high discrimination performance for SRB prediction (e.g., AUROC > 0.92 in the full, general cohort), an initial step toward novel and comprehensive suicide prevention strategies based on dynamic changes in risk.
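The central idea of the Neural ODE models described above is that a hidden state evolves continuously between observations and is updated when a new observation arrives, so a risk score exists at every time point. The NumPy sketch below is a generic, heavily simplified GRU-ODE-style illustration, not Event-GRU-ODE itself: the weights are random and untrained, the ODE is integrated with fixed-step explicit Euler, and all names (`gru_ode_rhs`, `gru_jump`, `risk_trajectory`) and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
H, X = 4, 3  # hidden and observation sizes (arbitrary, hypothetical)

# Random, untrained parameters; a real model would learn these from EHR data.
Uz, Ur, Uh = (rng.normal(scale=0.5, size=(H, H)) for _ in range(3))
Wz, Wr, Wh = (rng.normal(scale=0.5, size=(H, X)) for _ in range(3))
w_out = rng.normal(size=H)


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def gru_ode_rhs(h):
    """Continuous-time GRU dynamics between observations: dh/dt = (1 - z) * (g - h)."""
    z = sigmoid(Uz @ h)
    r = sigmoid(Ur @ h)
    g = np.tanh(Uh @ (r * h))
    return (1.0 - z) * (g - h)


def gru_jump(h, x):
    """Discrete GRU update applied at the moment an observation x arrives."""
    z = sigmoid(Wz @ x + Uz @ h)
    r = sigmoid(Wr @ x + Ur @ h)
    g = np.tanh(Wh @ x + Uh @ (r * h))
    return z * h + (1.0 - z) * g


def risk_trajectory(obs_times, obs_values, t_end, dt=0.05):
    """Risk = sigmoid(w_out . h) evaluated on a fixed grid from 0 to t_end."""
    h = np.zeros(H)
    pending = sorted(zip(obs_times, obs_values), key=lambda p: p[0])
    n_steps = int(round(t_end / dt))
    times, risks = [], []
    for k in range(n_steps + 1):
        t = k * dt
        # Apply every observation whose timestamp has been reached.
        while pending and pending[0][0] <= t + 1e-9:
            _, x = pending.pop(0)
            h = gru_jump(h, np.asarray(x, dtype=float))
        times.append(t)
        risks.append(float(sigmoid(w_out @ h)))
        h = h + dt * gru_ode_rhs(h)  # explicit Euler step to the next grid point
    return np.array(times), np.array(risks)


times, risks = risk_trajectory([0.0, 1.0], [[1, 0, 0], [0, 1, 1]], t_end=2.0)
```

Between t = 1 and t = 2 no data arrive, yet the risk curve keeps changing because the hidden state follows the ODE; this is the property the abstract describes as estimating risk "even without new observations".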
Affiliation(s)
- Yi-han Sheu
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Massachusetts General Hospital / Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jaak Simm
- Department of Electrical Engineering, KU Leuven, Leuven, Belgium
- Bo Wang
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Massachusetts General Hospital / Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Hyunjoon Lee
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Massachusetts General Hospital / Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Jordan W. Smoller
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Massachusetts General Hospital / Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
19
Zhuang Y, Kim NY, Fritsche LG, Mukherjee B, Lee S. Incorporating functional annotation with bilevel continuous shrinkage for polygenic risk prediction. BMC Bioinformatics 2024; 25:65. [PMID: 38336614 DOI: 10.1186/s12859-024-05664-2] [Received: 03/31/2023] [Accepted: 01/19/2024] [Indexed: 02/12/2024]
Abstract
BACKGROUND Genetic variants can contribute differently to trait heritability according to their functional categories, and recent studies have shown that incorporating functional annotation can improve the predictive performance of polygenic risk scores (PRSs). In addition, when only a small proportion of variants are causal, PRS methods that employ a Bayesian framework with shrinkage can account for such sparsity. The effect at the annotation-group level may also be sparse. However, few PRS methods incorporate both annotation information and shrinkage on effect sizes. We propose a PRS method, PRSbils, which uses functional annotation information with a bilevel continuous shrinkage prior to accommodate varying genetic architectures at both the variant-specific level and the functional annotation level. RESULTS We conducted simulation studies and investigated predictive performance in settings with different genetic architectures. When there was relatively large variability in group-wise heritability contribution, the proposed method achieved on average 8.0% higher AUC than the benchmark method PRS-CS. The proposed method also outperformed PRS-CS in settings with different overlapping patterns of annotation groups, with on average 6.4% higher AUC. We applied PRSbils to binary and quantitative traits in three real-world data sources (the UK Biobank, the Michigan Genomics Initiative (MGI) and the Korean Genome and Epidemiology Study (KoGES)) with two sources of annotations (ANNOVAR and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG)), and demonstrated that the proposed method has the potential to improve predictive performance by incorporating functional annotations. 
CONCLUSIONS By utilizing a bilevel shrinkage framework, PRSbils enables the incorporation of both overlapping and non-overlapping annotations into PRS construction to improve the performance of genetic risk prediction. The software is available at https://github.com/styvon/PRSbils .
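Whatever prior is used to estimate them (PRS-CS, PRSbils or another method), per-variant effect sizes are ultimately combined into a score as a weighted sum of allele dosages. The minimal NumPy sketch below assumes the effect sizes have already been estimated and also splits one individual's score by annotation group. The variants, effect sizes and (non-overlapping) groups are hypothetical, and the bilevel shrinkage prior itself is not implemented here.

```python
import numpy as np


def polygenic_score(dosages, betas):
    """PRS_i = sum_j beta_j * dosage_ij; dosages is (individuals x variants), coded 0/1/2."""
    return np.asarray(dosages, dtype=float) @ np.asarray(betas, dtype=float)


def group_contributions(dosage_row, betas, groups):
    """Split one individual's score into per-annotation-group partial sums."""
    dosage_row = np.asarray(dosage_row, dtype=float)
    betas = np.asarray(betas, dtype=float)
    groups = np.asarray(groups)
    return {
        str(g): float(dosage_row[groups == g] @ betas[groups == g])
        for g in np.unique(groups)
    }


dosages = np.array([[0, 1, 2, 1],
                    [2, 0, 1, 0]])
betas = np.array([0.10, -0.05, 0.20, 0.0])  # hypothetical posterior effect sizes
groups = np.array(["exonic", "exonic", "intronic", "intergenic"])

scores = polygenic_score(dosages, betas)           # approximately [0.35, 0.40]
parts = group_contributions(dosages[0], betas, groups)
```

With non-overlapping groups the partial sums add back up to the full score; overlapping annotations, which PRSbils also supports, would need a different way of attributing each variant.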
Affiliation(s)
- Na Yeon Kim
- Seoul National University, Seoul, Republic of Korea
- Seunggeun Lee
- Seoul National University, Seoul, Republic of Korea
20
Fu M, Tran T, Eskin E, Lajonchere C, Pasaniuc B, Geschwind DH, Vossel K, Chang TS. Multi-class Modeling Identifies Shared Genetic Risk for Late-onset Epilepsy and Alzheimer's Disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.02.05.24302353. [PMID: 38370677 PMCID: PMC10871371 DOI: 10.1101/2024.02.05.24302353] [Indexed: 02/20/2024]
Abstract
Background Previous studies have established a strong link between late-onset epilepsy (LOE) and Alzheimer's disease (AD). However, their shared genetic risk beyond the APOE gene remains unclear. Our study sought to examine the genetic factors shared by AD and LOE, interpret the biological pathways involved, and evaluate how AD onset may be mediated by LOE and shared genetic risks. Methods We defined phenotypes using phecodes mapped from diagnosis codes, using records of patients aged 60-90. A two-step Least Absolute Shrinkage and Selection Operator (LASSO) workflow was used to identify shared genetic variants based on prior AD GWAS integrated with functional genomic data. We calculated an AD-LOE shared risk score and used it as a proxy in a causal mediation analysis. We used electronic health records from an academic health center (UCLA Health) for discovery analyses and validated our findings in a multi-institutional EHR database (All of Us). Results The two-step LASSO method identified 34 shared genetic loci between AD and LOE, including the APOE region. These loci were mapped to 65 genes, which showed enrichment in molecular functions and pathways such as tau protein binding and lipoprotein metabolism. Individuals with high predicted shared risk scores had a higher risk of developing AD, LOE or both later in life than those with low scores. LOE partially mediates the effect of AD-LOE shared genetic risk on AD (15% proportion mediated on average). Validation results from All of Us were consistent with findings from the UCLA sample. Conclusions We employed a machine learning approach to identify shared genetic risks of AD and LOE. In addition to providing substantial evidence for the contribution of the APOE-TOMM40-APOC1 gene cluster to shared risk, we uncovered novel genes that may contribute. 
Our study is one of the first to use All of Us genetic data to investigate AD and provides valuable insights into the potential common and disease-specific mechanisms underlying AD and LOE, which could have profound implications for disease prevention and for the development of targeted treatment strategies against the co-occurrence of these two diseases.
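The "15% proportion mediated" figure is the share of the shared risk score's total effect on AD that passes through LOE. The sketch below shows the classic product-of-coefficients decomposition on simulated continuous data. It is a simplification, not the study's estimator: the real outcomes are binary diagnoses, and the variable names and effect sizes (a = 0.5, b = 0.3, c' = 0.85, giving a true proportion of 0.15) are hypothetical.

```python
import numpy as np


def ols(y, *covariates):
    """Least-squares coefficients (intercept first) via numpy.linalg.lstsq."""
    X = np.column_stack([np.ones_like(y)] + list(covariates))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(42)
n = 50_000
score = rng.normal(size=n)                          # exposure: shared genetic risk score
loe = 0.5 * score + rng.normal(size=n)              # mediator, true a = 0.5
ad = 0.85 * score + 0.3 * loe + rng.normal(size=n)  # outcome, true c' = 0.85, b = 0.3

a = ols(loe, score)[1]               # exposure -> mediator
_, c_prime, b = ols(ad, score, loe)  # direct effect c' and mediator -> outcome b
indirect = a * b
total = c_prime + indirect           # equals the slope from regressing ad on score alone
proportion_mediated = indirect / total  # close to the true value 0.15
```

For linear least squares with intercepts, the total effect equals c' + a·b exactly in-sample, which is why the decomposition is transparent here; with binary outcomes this identity no longer holds and dedicated mediation estimators are needed.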
Affiliation(s)
- Mingzhou Fu
- Mary S. Easton Center for Alzheimer’s Research and Care, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA 90095, USA
- Thai Tran
- Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA 90095, USA
- Eleazar Eskin
- Department of Computational Medicine, University of California, Los Angeles, CA 90095, USA
- Clara Lajonchere
- Institute of Precision Health, University of California, Los Angeles, CA 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Bogdan Pasaniuc
- Department of Computational Medicine, University of California, Los Angeles, CA 90095, USA
- Daniel H. Geschwind
- Institute of Precision Health, University of California, Los Angeles, CA 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Keith Vossel
- Mary S. Easton Center for Alzheimer’s Research and Care, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Timothy S Chang
- Mary S. Easton Center for Alzheimer’s Research and Care, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
21
Pankratz N, Cole BR, Beutel KM, Liao KP, Ashe J. Parkinson Disease Genetics Extended to African and Hispanic Ancestries in the VA Million Veteran Program. Neurol Genet 2024; 10:e200110. [PMID: 38130828 PMCID: PMC10732342 DOI: 10.1212/nxg.0000000000200110] [Received: 05/19/2023] [Accepted: 10/06/2023] [Indexed: 12/23/2023]
Abstract
Background and Objectives Nearly all genetic analyses of Parkinson disease (PD) have been in populations of European ancestry. We sought to test the ability of a machine learning method to extract accurate PD diagnoses from an electronic medical record (EMR) system, to see whether genetic variants identified in European populations generalize to individuals of African and Hispanic ancestries, and to compare the rates of PD across ancestries. Methods A machine learning method using natural language processing was applied to EMRs of US veterans participating in the VA Million Veteran Program (MVP) to identify individuals with PD. These putative cases were vetted via blind chart review by a movement disorder specialist. A polygenic risk score (PRS) of 90 established genetic variants, whose genotypes were imputed from a customized Axiom Biobank Array, was evaluated in different case groups. Results The EMR prediction scores had a distinct trimodal distribution, with 97% of the high group and only 30% of the middle group having a credible diagnosis of PD. Using the 3,542 cases from the high group matched 4:1 to controls, the PRS was highly predictive in individuals of European ancestry (n = 3,137 cases; OR = 1.82; p = 8.01E-48), and nearly identical effect sizes were seen in individuals of African (n = 184; OR = 2.07; p = 3.4E-4) and Hispanic ancestries (n = 221; OR = 2.13; p = 3.9E-6). The PRS was much less predictive for the 2,757 European ancestry cases who had an ICD code for PD but whose diagnosis the machine learning method assigned lower confidence. No novel ancestry-specific genetic variants were identified. Among individuals aged 60-70 years, those of African ancestry had one-quarter the rate of PD of those of European or Hispanic ancestry; in the 70-80 years age range, their rate was one-half. African American cases had a higher proportion of their DNA originating in Europe than African American controls. 
Discussion Machine learning can reliably classify PD using data from a large EMR. Larger studies of non-European populations are required to confirm the generalizability of PD risk variants identified in populations of European ancestry, and to confirm the increased risk associated with a higher proportion of European DNA in African Americans.
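The odds ratios above express how strongly the PRS relates to PD case status. As a minimal sketch, assuming the OR is reported per standard deviation of a standardized PRS (the abstract does not state the scaling), the NumPy code below fits a logistic regression by iteratively reweighted least squares (Newton-Raphson) on simulated data and exponentiates the slope. The simulated OR of 1.8, the intercept (chosen to give roughly 20% cases, loosely echoing 4:1 matching) and all names are hypothetical; matching strata and covariates are omitted.

```python
import numpy as np


def logistic_irls(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson (IRLS); X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = p * (1.0 - p)                 # IRLS weights
        grad = X.T @ (y - p)              # score vector
        hess = X.T @ (X * W[:, None])     # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(1)
n = 20_000
prs = rng.normal(size=n)
prs_std = (prs - prs.mean()) / prs.std()    # standardize so the slope is per SD
logit = -1.5 + np.log(1.8) * prs_std        # simulated truth: OR = 1.8 per SD
case = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

X = np.column_stack([np.ones(n), prs_std])
beta = logistic_irls(X, case)
or_per_sd = float(np.exp(beta[1]))          # should land near 1.8
```

Because the slope is on the log-odds scale, exp(slope) is the multiplicative change in odds of being a case for each one-SD increase in PRS.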
Affiliation(s)
- Nathan Pankratz
- From the Department of Laboratory Medicine and Pathology (N.P., B.R.C., K.M.B.), School of Medicine, University of Minnesota, Minneapolis; Division of Rheumatology (K.P.L.), Immunology, and Allergy, Brigham and Women's Hospital; Department of Biomedical Informatics (K.P.L.), Harvard Medical School; Division of Data Sciences (K.P.L.), VA Boston Healthcare System, MA; Department of Neurology (J.A.), University of Minnesota Medical School; and Department of Neurology (J.A.), Minneapolis Veterans Affairs Health Care System, MN
- Benjamin R Cole
- From the Department of Laboratory Medicine and Pathology (N.P., B.R.C., K.M.B.), School of Medicine, University of Minnesota, Minneapolis; Division of Rheumatology (K.P.L.), Immunology, and Allergy, Brigham and Women's Hospital; Department of Biomedical Informatics (K.P.L.), Harvard Medical School; Division of Data Sciences (K.P.L.), VA Boston Healthcare System, MA; Department of Neurology (J.A.), University of Minnesota Medical School; and Department of Neurology (J.A.), Minneapolis Veterans Affairs Health Care System, MN
- Kathleen M Beutel
- From the Department of Laboratory Medicine and Pathology (N.P., B.R.C., K.M.B.), School of Medicine, University of Minnesota, Minneapolis; Division of Rheumatology (K.P.L.), Immunology, and Allergy, Brigham and Women's Hospital; Department of Biomedical Informatics (K.P.L.), Harvard Medical School; Division of Data Sciences (K.P.L.), VA Boston Healthcare System, MA; Department of Neurology (J.A.), University of Minnesota Medical School; and Department of Neurology (J.A.), Minneapolis Veterans Affairs Health Care System, MN
- Katherine P Liao
- From the Department of Laboratory Medicine and Pathology (N.P., B.R.C., K.M.B.), School of Medicine, University of Minnesota, Minneapolis; Division of Rheumatology (K.P.L.), Immunology, and Allergy, Brigham and Women's Hospital; Department of Biomedical Informatics (K.P.L.), Harvard Medical School; Division of Data Sciences (K.P.L.), VA Boston Healthcare System, MA; Department of Neurology (J.A.), University of Minnesota Medical School; and Department of Neurology (J.A.), Minneapolis Veterans Affairs Health Care System, MN
- James Ashe
- From the Department of Laboratory Medicine and Pathology (N.P., B.R.C., K.M.B.), School of Medicine, University of Minnesota, Minneapolis; Division of Rheumatology (K.P.L.), Immunology, and Allergy, Brigham and Women's Hospital; Department of Biomedical Informatics (K.P.L.), Harvard Medical School; Division of Data Sciences (K.P.L.), VA Boston Healthcare System, MA; Department of Neurology (J.A.), University of Minnesota Medical School; and Department of Neurology (J.A.), Minneapolis Veterans Affairs Health Care System, MN
22
Jeffery AD, Fabbri D, Reeves RM, Matheny ME. Use of Noisy Labels as Weak Learners to Identify Incompletely Ascertainable Outcomes: A Feasibility Study with Opioid-Induced Respiratory Depression. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.01.29.24301963. [PMID: 38352435 PMCID: PMC10863026 DOI: 10.1101/2024.01.29.24301963] [Indexed: 02/19/2024]
Abstract
Objective Assigning outcome labels to large observational data sets in a timely and accurate manner, particularly when outcomes are rare or not directly ascertainable, remains a significant challenge within biomedical informatics. We examined whether noisy labels generated from subject matter experts' heuristics using heterogeneous data types within a data programming paradigm could provide outcome labels for a large observational data set. We chose opioid-induced respiratory depression as our use case because it is rare, has no administrative codes that easily identify it, and typically requires at least some unstructured text to ascertain its presence. Materials and Methods Using de-identified electronic health records of 52,861 post-operative encounters, we applied a data programming paradigm (implemented in the Snorkel software) to develop a machine learning classifier for opioid-induced respiratory depression. Subject matter experts created 14 labeling functions that served as noisy labels for developing a probabilistic Generative model. We used probabilistic labels from the Generative model as outcome labels for training a Discriminative model on the source data. We evaluated the performance of the Discriminative model on a hold-out test set of 599 independently reviewed patient records. Results The final Discriminative classification model achieved an accuracy of 0.977, an F1 score of 0.417, a sensitivity of 1.0 and an AUC of 0.988 in the hold-out test set, which had a prevalence of 0.83% (5/599). Discussion All of the confirmed cases were identified by the classifier. For rare outcomes, this finding is encouraging because it reduces the number of manual reviews needed by excluding visits/patients with low probabilities. 
Conclusion Application of a data programming paradigm with expert-informed labeling functions might have utility for phenotyping clinical phenomena that are not easily ascertainable from highly structured data.
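The reported metrics are mutually consistent, and this kind of check is a useful sanity test for rare-outcome classifiers. With 5 positives among 599 records and a sensitivity of 1.0 (so TP = 5, FN = 0), an F1 of 0.417 implies a precision of about 0.26, i.e. roughly 14 false positives. The figure of 14 is inferred here, not reported in the abstract. The short check below recomputes F1 and accuracy from that inferred confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix metrics."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1, "accuracy": accuracy}


# TP and FN follow from the abstract (5 cases, all found); FP = 14 is inferred from F1 = 0.417.
tp, fn, fp = 5, 0, 14
tn = 599 - tp - fn - fp
m = classification_metrics(tp, fp, fn, tn)
print(round(m["f1"], 3), round(m["accuracy"], 3))  # 0.417 0.977
```

At 0.83% prevalence, 14 false positives still leave accuracy near 0.98, which is why F1 and sensitivity are more informative than accuracy for this task.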
Affiliation(s)
- Alvin D Jeffery
- School of Nursing, Vanderbilt University, Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Daniel Fabbri
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ruth M Reeves
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Michael E Matheny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
23
Davis H, Tang LA, Picou EM, Bastarache L, Tharpe AM. The Use of Electronic Health Records for Behavioral Phenotyping of School-Age Children With Unilateral Hearing Loss: A Methodological Approach. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2024; 67:254-268. [PMID: 38056484 DOI: 10.1044/2023_jslhr-22-00610] [Indexed: 12/08/2023]
Abstract
PURPOSE This methodological study describes a technique for extracting information from de-identified electronic health records (EHRs) to identify occurrences of permanent unilateral hearing loss (UHL) and associated educational comorbidities. METHOD This was an exploratory methodological study using approximately 3.3 million de-identified medical records. Structured and unstructured data were extracted using both automated and manual methods. When both methods were available, positive and negative predictive values were calculated to evaluate the utility of the automated methods. RESULTS We defined a cohort of 471 records that met our criteria of school-age children with permanent UHL and no additional significant disabilities/diagnoses. Fifty-one percent of the children in this cohort had indicators of adverse educational progress, defined as documentation of receiving educational services, speech-language therapy and/or parental/teacher concern, with 12% of records reflecting overlapping services/concerns. Negative predictive values were generally high and positive predictive values generally low, suggesting that automated searches are useful for ruling out factors of interest but not for finding them. CONCLUSIONS This study demonstrates the feasibility of using EHRs to examine UHL in school-age children. By restricting our cohort to individuals seen in an audiology clinic, we were able to capture variables such as educational difficulty that are not routinely ascertained in medical contexts. The proportion of children in this cohort with a marker of adverse educational progress is consistent with numerous prior observational studies, lending validity to this ascertainment approach. We describe the challenges encountered in creating this cohort and detail our hybrid approach to ascertaining key variables accurately.
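High NPV combined with low PPV is the pattern expected when an automated search flags many records but the feature of interest is uncommon among them. The sketch below uses entirely hypothetical counts (none come from the study) to show how a single confusion matrix can yield an NPV near 1 and a PPV well below 0.5.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) for an automated search checked against manual review."""
    ppv = tp / (tp + fp)  # of records the search flagged, fraction truly positive
    npv = tn / (tn + fn)  # of records the search did not flag, fraction truly negative
    return ppv, npv


# Hypothetical counts from comparing an automated search with manual chart review.
ppv, npv = predictive_values(tp=30, fp=70, fn=5, tn=895)
print(round(ppv, 3), round(npv, 3))  # 0.3 0.994
```

Under numbers like these, unflagged records could mostly be set aside, while flagged records would still need manual confirmation, which is consistent with the hybrid automated/manual approach the authors describe.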
Affiliation(s)
- Hilary Davis
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Leigh Anne Tang
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Erin M Picou
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Anne Marie Tharpe
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
24
Gupta S, Jordan IK, Mariño-Ramírez L. Sick individuals, sick populations revisited: a test of the Rose hypothesis for type 2 diabetes disparities. BMJ PUBLIC HEALTH 2023; 1:e000655. [PMID: 38239263 PMCID: PMC10795613 DOI: 10.1136/bmjph-2023-000655] [Indexed: 01/22/2024]
Abstract
Introduction The Rose hypothesis predicts that, since genetic variation is greater within than between populations, genetic risk factors will be associated with individuals' risk of disease but not with population disparities; and, since socioenvironmental variation is greater between than within populations, socioenvironmental risk factors will be associated with population disparities but not with individuals' disease risk. Methods We used the UK Biobank to test the Rose hypothesis for type 2 diabetes (T2D) ethnic disparities in the UK. Our cohort consists of 26 912 participants from Asian, black and white ethnic groups. Participants were characterised as T2D cases or controls based on the presence or absence of T2D diagnosis codes in electronic health records. T2D genetic risk was measured using a polygenic risk score (PRS), and socioeconomic deprivation was measured with the Townsend Index (TI). The variation of genetic (PRS) and socioeconomic (TI) risk factors within and between ethnic groups was calculated using analysis of variance. Multivariable logistic regression was used to associate PRS and TI with T2D case status, and mediation analysis was used to analyse the effect of PRS and TI on T2D ethnic group disparities. Results T2D prevalence differs across Asian (23.34%; OR=5.14, CI=4.68 to 5.65), black (16.64%; OR=3.81, CI=3.44 to 4.22) and white (7.35%; reference) ethnic groups in the UK. Both genetic and socioenvironmental T2D risk factors show greater within-group (w) than between-group (b) variation: PRS w=64.60%, b=35.40%; TI w=71.18%, b=28.19%. Nevertheless, both genetic risk (PRS OR=1.96, CI=1.87 to 2.07) and socioeconomic deprivation (TI OR=1.09, CI=1.08 to 1.10) are associated with individual T2D risk and mediate T2D ethnic disparities (Asian PRS=22.5%, TI=9.8%; black PRS=32.0%, TI=25.3%). Conclusion A relative excess of within-group versus between-group variation does not preclude T2D risk factors from contributing to T2D ethnic disparities. 
Our results support an integrative approach to health disparities research that includes both genetic and socioenvironmental risk factors.
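The within (w) and between (b) percentages come from partitioning the total variance of each risk factor across ethnic groups, as in a one-way analysis of variance. The minimal NumPy sketch below performs that sum-of-squares decomposition; the toy values and group labels are hypothetical, and the study's exact ANOVA specification is not reproduced.

```python
import numpy as np


def variance_partition(values, groups):
    """Return (within %, between %) of the total sum of squares, one-way ANOVA style."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    # Between-group SS: group size times squared deviation of each group mean from the grand mean.
    ss_between = sum(
        np.sum(groups == g) * (values[groups == g].mean() - grand_mean) ** 2
        for g in np.unique(groups)
    )
    ss_within = ss_total - ss_between
    return 100 * ss_within / ss_total, 100 * ss_between / ss_total


within_pct, between_pct = variance_partition([1, 2, 3, 4, 5, 6], list("AAABBB"))
print(round(within_pct, 2), round(between_pct, 2))  # 22.86 77.14
```

In this toy example most variation lies between groups; in the abstract's data the opposite holds for both PRS and TI, yet both still mediate part of the disparity, which is the paper's central point.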
Affiliation(s)
- Sonali Gupta
- National Institute on Minority Health and Health Disparities, National Institutes of Health, Rockville, Maryland, USA
- I King Jordan
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Leonardo Mariño-Ramírez
- National Institute on Minority Health and Health Disparities, National Institutes of Health, Rockville, Maryland, USA
25
Jordan DM, Vy HMT, Do R. A deep learning transformer model predicts high rates of undiagnosed rare disease in large electronic health systems. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.21.23300393. [PMID: 38196638 PMCID: PMC10775679 DOI: 10.1101/2023.12.21.23300393] [Indexed: 01/11/2024]
Abstract
It is estimated that as many as 1 in 16 people worldwide suffer from rare diseases. Rare disease patients face difficulty finding diagnosis and treatment for their conditions, including long diagnostic odysseys, multiple incorrect diagnoses, and unavailable or prohibitively expensive treatments. As a result, it is likely that large electronic health record (EHR) systems include high numbers of participants suffering from undiagnosed rare disease. While this has been shown in detail for specific diseases, these studies are expensive and time consuming and have only been feasible to perform for a handful of the thousands of known rare diseases. The bulk of these undiagnosed cases are effectively hidden, with no straightforward way to differentiate them from healthy controls. The ability to access them at scale would enormously expand our capacity to study and develop drugs for rare diseases, adding to tools aimed at increasing availability of study cohorts for rare disease. In this study, we train a deep learning transformer algorithm, RarePT (Rare-Phenotype Prediction Transformer), to impute undiagnosed rare disease from EHR diagnosis codes in 436,407 participants in the UK Biobank and validated on an independent cohort from 3,333,560 individuals from the Mount Sinai Health System. We applied our model to 155 rare diagnosis codes with fewer than 250 cases each in the UK Biobank and predicted participants with elevated risk for each diagnosis, with the number of participants predicted to be at risk ranging from 85 to 22,000 for different diagnoses. These risk predictions are significantly associated with increased mortality for 65% of diagnoses, with disease burden expressed as disability-adjusted life years (DALY) for 73% of diagnoses, and with 72% of available disease-specific diagnostic tests. 
They are also highly enriched for known rare diagnoses in patients not included in the training set, with an odds ratio (OR) of 48.0 in cross-validation cohorts of the UK Biobank and an OR of 30.6 in the independent Mount Sinai Health System cohort. Most importantly, RarePT successfully screens for undiagnosed patients in 32 rare diseases with available diagnostic tests in the UK Biobank. Using the trained model to estimate the prevalence of undiagnosed disease in the UK Biobank for these 32 rare phenotypes, we find that at least 50% of patients remain undiagnosed for 20 of 32 diseases. These estimates provide empirical evidence of a high prevalence of undiagnosed rare disease, as well as demonstrating the enormous potential benefit of using RarePT to screen for undiagnosed rare disease patients in large electronic health systems.
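The enrichment figures above are odds ratios. As a minimal sketch, here is how such an odds ratio can be computed from a 2×2 table of model-flagged versus known-diagnosed participants, with a Woolf (log-scale) 95% confidence interval. This is not RarePT's evaluation code. The function name and the counts are hypothetical.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple:
    """Odds ratio for a 2x2 table with a Woolf (log-scale) confidence interval.

    a: flagged by the model and carrying the known diagnosis
    b: flagged by the model, no known diagnosis
    c: not flagged, carrying the known diagnosis
    d: not flagged, no known diagnosis
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("Woolf interval needs every cell to be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts for a single held-out diagnosis (not data from the study)
est, low, high = odds_ratio(a=40, b=960, c=10, d=11_990)
print(f"OR = {est:.1f} (95% CI {low:.1f}-{high:.1f})")
```

An odds ratio well above 1 means known diagnoses are concentrated among the participants the model flags. The interval widens quickly when any cell is small, as is typical for rare diagnoses.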
Affiliation(s)
- Daniel M. Jordan
  - Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ha My T. Vy
  - Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ron Do
  - Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
26
Huang SY, Johnathan R, Shah N, Srivastava P, Huang AA, Gress F. Technical Report: Protocol for Characterizing Phenotype Variants Using Phenome-Wide Association Study (PheWAS) Utilizing the Nationwide Inpatient Sample 2020 in Individuals With Pancreatic Cysts and Lung Cancer. Cureus 2023; 15:e50982. [PMID: 38259398 PMCID: PMC10801675 DOI: 10.7759/cureus.50982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/22/2023] [Indexed: 01/24/2024]
Abstract
This technical report serves as a comprehensive guide for conducting a phenome-wide association study (PheWAS) utilizing data extracted from the Nationwide Inpatient Sample 2020. Specifically tailored to individuals diagnosed with pancreatic cysts and lung cancer, the report establishes a step-by-step workflow designed to assist researchers in uncovering potential associations within this specific cohort. The methodology outlined in the report ensures clarity and reproducibility by employing a curated cohort sourced from the GitHub repository, with the analysis executed in R. The code encompasses pivotal steps, including a QQ plot used as a diagnostic tool for identifying systematic biases or associations. Additionally, the report incorporates the creation of a Manhattan plot, delving into essential mathematical considerations to enhance the interpretability of the results. Notably, the report elucidates the handling of the International Classification of Disease version 10 (ICD-10) codes, providing a sample approach for their segmentation to analyze associations by diagnostic categories. The segmentation aligns with the guidelines outlined in the American Medical Association's ICD-10-CM 2022, the Complete Official Codebook with Guidelines (American Medical Association Press, 2021), ensuring a standardized and rigorous analytical process. This comprehensive guide equips researchers with the tools and insights needed to navigate the complexities of PheWAS within the context of pancreatic cysts and lung cancer, fostering transparency, reproducibility, and meaningful scientific exploration.
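The report's own workflow is written in R. This Python sketch illustrates three of the steps the abstract names:

- bucketing ICD-10-CM codes into chapters (the chapter ranges are assumed from the published ICD-10-CM tabular list, not copied from the report);
- computing the genomic inflation factor λ that a QQ plot is usually read alongside;
- generating the expected-versus-observed −log10(p) points behind the plot.

All function names are hypothetical.

```python
import math
from statistics import NormalDist, median
from typing import List, Optional, Tuple

# ICD-10-CM chapter ranges at the 3-character category level (assumed from the
# published tabular list; check against the codebook edition you use).
ICD10CM_CHAPTERS = [
    (1, "A00", "B99"), (2, "C00", "D49"), (3, "D50", "D89"),
    (4, "E00", "E89"), (5, "F01", "F99"), (6, "G00", "G99"),
    (7, "H00", "H59"), (8, "H60", "H95"), (9, "I00", "I99"),
    (10, "J00", "J99"), (11, "K00", "K95"), (12, "L00", "L99"),
    (13, "M00", "M99"), (14, "N00", "N99"), (15, "O00", "O9A"),
    (16, "P00", "P96"), (17, "Q00", "Q99"), (18, "R00", "R99"),
    (19, "S00", "T88"), (20, "V00", "Y99"), (21, "Z00", "Z99"),
    (22, "U00", "U85"),
]


def icd10_chapter(code: str) -> Optional[int]:
    """Map an ICD-10-CM code such as 'K86.2' to its chapter number."""
    category = code.replace(".", "").strip().upper()[:3]
    for number, first, last in ICD10CM_CHAPTERS:
        # Plain string comparison works because digits sort before letters,
        # so a category like 'O9A' falls after 'O99' as the codebook intends.
        if first <= category <= last:
            return number
    return None


def genomic_inflation(p_values: List[float]) -> float:
    """Lambda: median observed 1-df chi-square divided by its null median (~0.455)."""
    # Two-sided p -> |z| -> chi-square with 1 df; p must lie in (0, 1].
    chi2 = [NormalDist().inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / 0.4549364


def qq_points(p_values: List[float]) -> List[Tuple[float, float]]:
    """(expected, observed) -log10(p) pairs for a QQ plot, largest first."""
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))


# Example: a pancreatic cyst code and a lung cancer code (illustrative only)
print(icd10_chapter("K86.2"), icd10_chapter("C34.90"))
```

A λ close to 1 suggests little systematic inflation across the tested phenotypes. The chapter lookup is what allows results to be colored or grouped by diagnostic category on a Manhattan plot.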
Affiliation(s)
- Samuel Y Huang
  - Internal Medicine, Icahn School of Medicine at Mount Sinai South Nassau, Oceanside, USA
- Reyes Johnathan
  - Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, USA
- Neal Shah
  - Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, USA
- Pranay Srivastava
  - Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, USA
- Alexander A Huang
  - General Surgery, Northwestern University Feinberg School of Medicine, Chicago, USA
- Frank Gress
  - Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, USA
27
Cassini T, Bastarache L, Zeng C, Han ST, Wang J, He J, Denny JC. A test of automated use of electronic health records to aid in diagnosis of genetic disease. Genet Med 2023; 25:100966. [PMID: 37622442 PMCID: PMC10840718 DOI: 10.1016/j.gim.2023.100966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Revised: 08/16/2023] [Accepted: 08/17/2023] [Indexed: 08/26/2023]
Abstract
PURPOSE Automated use of electronic health records may aid in decreasing the diagnostic delay for rare diseases. The phenotype risk score (PheRS) is a weighted aggregate of syndromically related phenotypes that measures the similarity between an individual's conditions and features of a disease. For some diseases, there are individuals without a diagnosis of that disease who have scores similar to diagnosed patients. These individuals may have that disease but not yet be diagnosed. METHODS We calculated the PheRS for cystic fibrosis (CF) for 965,626 subjects in the Vanderbilt University Medical Center electronic health record. RESULTS Of the 400 subjects with the highest PheRS for CF, 248 (62%) had been diagnosed with CF. Twenty-six of the remaining participants, those who were alive and had DNA available in the linked DNA biobank, underwent clinical review and sequencing analysis of CFTR and SERPINA1. This uncovered a potential diagnosis for 2 subjects, 1 with CF and 1 with alpha-1-antitrypsin deficiency. An additional 7 subjects had pathogenic or likely pathogenic variants, 2 in CFTR and 5 in SERPINA1. CONCLUSION These findings may be clinically actionable for the providers caring for these patients. Importantly, this study highlights feasibility and challenges for future implications of this approach.
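The PheRS described above can be sketched as a weighted sum over the disease's features that an individual carries, followed by ranking the population by score (as in the top-400 screen). The weighting is an assumption. This sketch uses the inverse-prevalence weight log(N / n_i) from the original phenotype risk score work, which this abstract does not restate. The feature labels are hypothetical placeholders, not real phecodes or the study's CF feature set.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def phers_weights(records: Dict[str, Set[str]]) -> Dict[str, float]:
    """Inverse-prevalence weight log(N / n_i) for every phenotype code observed."""
    n_total = len(records)
    counts = Counter(code for codes in records.values() for code in codes)
    return {code: math.log(n_total / n) for code, n in counts.items()}


def phers(codes: Set[str], disease_features: Set[str], weights: Dict[str, float]) -> float:
    """Sum the weights of the disease features that this individual has."""
    return sum(weights.get(code, 0.0) for code in codes & disease_features)


def rank_by_phers(records: Dict[str, Set[str]], disease_features: Set[str],
                  top_k: int) -> List[Tuple[str, float]]:
    """Return the top_k individuals by PheRS, highest first."""
    weights = phers_weights(records)
    scored = [(pid, phers(codes, disease_features, weights))
              for pid, codes in records.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical records keyed by patient ID; labels stand in for phecodes.
records = {
    "p1": {"bronchiectasis", "pancreatic_insufficiency", "hypertension"},
    "p2": {"bronchiectasis"},
    "p3": {"hypertension"},
    "p4": set(),
}
cf_features = {"bronchiectasis", "pancreatic_insufficiency", "nasal_polyps"}
print(rank_by_phers(records, cf_features, top_k=2))
```

Rare features contribute more to the score than common ones. As a result, an individual with several uncommon, syndromically related codes rises above one with a single common code.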
Affiliation(s)
- Thomas Cassini
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Chenjie Zeng
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- Sangwoo T Han
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- Janey Wang
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Jing He
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Joshua C Denny
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
28
Lam V, Sharma S, Gupta S, Spouge JL, Jordan IK, Mariño-Ramírez L. Ancestry-attenuated effects of socioeconomic deprivation on type 2 diabetes disparities in the All of Us cohort. BMC GLOBAL AND PUBLIC HEALTH 2023; 1:22. [PMID: 38045036 PMCID: PMC10693462 DOI: 10.1186/s44263-023-00025-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 09/28/2023] [Indexed: 12/05/2023]
Abstract
Background Diabetes is a common disease with a major burden on morbidity, mortality, and productivity. Type 2 diabetes (T2D) accounts for roughly 90% of all diabetes cases in the USA and has a greater observed prevalence among those who identify as Black or Hispanic. Methods This study aimed to assess T2D racial and ethnic disparities using the All of Us Research Program data and to measure associations between genetic ancestry (GA), socioeconomic deprivation, and T2D. We used the All of Us Researcher Workbench to analyze T2D prevalence and model its associations with GA, individual-level (iSDI), and zip code-based (zSDI) socioeconomic deprivation indices among participant self-identified race and ethnicity (SIRE) groups. Results The study cohort comprised 86,488 participants from the four largest SIRE groups in All of Us: Asian (n = 2311), Black (n = 16,282), Hispanic (n = 16,966), and White (n = 50,292). SIRE groups show characteristic genetic ancestry patterns, consistent with their diverse origins, together with a continuum of ancestry fractions within and between groups. The Black and Hispanic groups show the highest levels of socioeconomic deprivation, followed by the Asian and White groups. Black participants show the highest age- and sex-adjusted T2D prevalence (21.9%), followed by the Hispanic (19.9%), Asian (15.1%), and White (14.8%) groups. Minority SIRE groups and socioeconomic deprivation, both iSDI and zSDI, are positively associated with T2D, when the entire cohort is analyzed together. However, SIRE and GA both show negative interaction effects with iSDI and zSDI on T2D. Higher levels of iSDI and zSDI are negatively associated with T2D in the Black and Hispanic groups, and higher levels of iSDI and zSDI are negatively associated with T2D at high levels of African and Native American ancestry. Conclusions Socioeconomic deprivation is associated with a higher prevalence of T2D in Black and Hispanic minority groups, compared to the majority White group. 
Nonetheless, socioeconomic deprivation is associated with reduced T2D risk within the Black and Hispanic groups. These results are paradoxical and have not been reported elsewhere, with possible explanations related to the nature of the All of Us data along with SIRE group differences in access to healthcare, diet, and lifestyle.
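The abstract does not give the model specification, so the following is only a reading aid under an assumed one. It treats the analysis as a logistic regression of T2D on:

- a deprivation index $D$ (iSDI or zSDI);
- an ancestry fraction $a$ (African or Native American);
- their product $D \cdot a$;
- covariates $\mathbf{x}$ such as age and sex.

The same logic applies with a SIRE indicator in place of $a$. A negative interaction coefficient is what allows the slope of deprivation to change sign across ancestry levels.

```latex
\begin{align*}
\operatorname{logit}\Pr(\mathrm{T2D}=1)
  &= \beta_0 + \beta_D D + \beta_a a + \beta_{Da}\,(D \cdot a) + \boldsymbol{\gamma}^{\top}\mathbf{x} \\
\frac{\partial\,\operatorname{logit}\Pr(\mathrm{T2D}=1)}{\partial D}
  &= \beta_D + \beta_{Da}\,a \\
\beta_D > 0,\ \beta_{Da} < 0
  &\;\Longrightarrow\; \frac{\partial\,\operatorname{logit}\Pr(\mathrm{T2D}=1)}{\partial D} < 0
  \quad \text{whenever } a > -\beta_D / \beta_{Da}
\end{align*}
```

Under this reading, a positive association in the pooled cohort can coexist with negative within-group slopes. The groups with more African or Native American ancestry also have higher average deprivation and higher T2D prevalence, and that between-group difference is not captured by the within-group slope.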
Affiliation(s)
- Vincent Lam
  - National Institute on Minority Health and Health Disparities, National Institutes of Health, 11545 Rockville Pike, Building 11545 Rockville Pike, 2WF Room C14, Rockville, MD 20818 USA
- Shivam Sharma
  - National Institute on Minority Health and Health Disparities, National Institutes of Health, 11545 Rockville Pike, Building 11545 Rockville Pike, 2WF Room C14, Rockville, MD 20818 USA
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA USA
- Sonali Gupta
  - National Institute on Minority Health and Health Disparities, National Institutes of Health, 11545 Rockville Pike, Building 11545 Rockville Pike, 2WF Room C14, Rockville, MD 20818 USA
- John L. Spouge
  - National Library of Medicine, National Institutes of Health, Bethesda, MD USA
- I. King Jordan
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA USA
- Leonardo Mariño-Ramírez
  - National Institute on Minority Health and Health Disparities, National Institutes of Health, 11545 Rockville Pike, Building 11545 Rockville Pike, 2WF Room C14, Rockville, MD 20818 USA
29
Nguyen NH, Sarangi S, McChesney EM, Sheng S, Durrant JD, Porter AW, Kleyman TR, Pitluk ZW, Brodsky JL. Genome mining yields putative disease-associated ROMK variants with distinct defects. PLoS Genet 2023; 19:e1011051. [PMID: 37956218 PMCID: PMC10695394 DOI: 10.1371/journal.pgen.1011051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 12/04/2023] [Accepted: 11/04/2023] [Indexed: 11/15/2023]
Abstract
Bartter syndrome is a group of rare genetic disorders that compromise kidney function by impairing electrolyte reabsorption. Left untreated, the resulting hyponatremia, hypokalemia, and dehydration can be fatal, and there is currently no cure. Bartter syndrome type II specifically arises from mutations in KCNJ1, which encodes the renal outer medullary potassium channel, ROMK. Over 40 Bartter syndrome-associated mutations in KCNJ1 have been identified, yet their molecular defects are mostly uncharacterized. Nevertheless, a subset of disease-linked mutations compromise ROMK folding in the endoplasmic reticulum (ER), which in turn results in premature degradation via the ER-associated degradation (ERAD) pathway. To identify uncharacterized human variants that might similarly lead to premature degradation and thus disease, we mined three genomic databases. First, phenotypic data in the UK Biobank were analyzed using a recently developed computational platform to identify individuals carrying KCNJ1 variants with clinical features consistent with Bartter syndrome type II. In parallel, we examined genomic data in both the NIH TOPMed and ClinVar databases with the aid of Rhapsody, a verified computational algorithm that predicts mutation pathogenicity and disease severity. Subsequent phenotypic studies using a yeast screen to assess ROMK function, together with analyses of ROMK biogenesis in yeast and human cells, identified four previously uncharacterized mutations. Among these, one mutation identified by both parallel approaches (G228E) destabilized ROMK and targeted it for ERAD, resulting in reduced cell surface expression. Another mutation (T300R) was ERAD-resistant, but defects in channel activity were apparent based on two-electrode voltage clamp measurements in X. laevis oocytes. 
Together, our results outline a new computational and experimental pipeline that can be applied to identify disease-associated alleles linked to a range of other potassium channels, and further our understanding of the ROMK structure-function relationship that may aid future therapeutic strategies to advance precision medicine.
Affiliation(s)
- Nga H. Nguyen
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Srikant Sarangi
  - Paradigm4, Inc., Waltham, Massachusetts, United States of America
- Erin M. McChesney
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Shaohu Sheng
  - Renal-Electrolyte Division, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jacob D. Durrant
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Aidan W. Porter
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Thomas R. Kleyman
  - Renal-Electrolyte Division, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jeffrey L. Brodsky
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
30
Shuey MM, Stead WW, Aka I, Barnado AL, Bastarache JA, Brokamp E, Campbell M, Carroll RJ, Goldstein JA, Lewis A, Malow BA, Mosley JD, Osterman T, Padovani-Claudio DA, Ramirez A, Roden DM, Schuler BA, Siew E, Sucre J, Thomsen I, Tinker RJ, Van Driest S, Walsh C, Warner JL, Wells QS, Wheless L, Bastarache L. Next-generation phenotyping: introducing phecodeX for enhanced discovery research in medical phenomics. Bioinformatics 2023; 39:btad655. [PMID: 37930895 PMCID: PMC10627409 DOI: 10.1093/bioinformatics/btad655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/13/2023] [Indexed: 11/08/2023]
Abstract
MOTIVATION Phecodes are widely used and easily adapted phenotypes based on International Classification of Diseases codes. The current version of phecodes (v1.2) was designed primarily to study common/complex diseases diagnosed in adults; however, there are numerous limitations in the codes and their structure. RESULTS Here, we present phecodeX, an expanded version of phecodes with a revised structure and 1,761 new codes. PhecodeX adds granularity to phenotypes in key disease domains that are under-represented in the current phecode structure (including infectious disease, pregnancy, congenital anomalies, and neonatology) and is a more robust representation of the medical phenome for global use in discovery research. AVAILABILITY AND IMPLEMENTATION phecodeX is available at https://github.com/PheWAS/phecodeX.
Affiliation(s)
- Megan M Shuey
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- William W Stead
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Ida Aka
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- April L Barnado
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Julie A Bastarache
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Elly Brokamp
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Meredith Campbell
  - Department of Pediatrics, Virginia Commonwealth University, Richmond, VA 23219, United States
- Robert J Carroll
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Jeffrey A Goldstein
  - Department of Pathology, Northwestern Feinberg School of Medicine, Chicago, IL 60611, United States
- Adam Lewis
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Beth A Malow
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Jonathan D Mosley
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Travis Osterman
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Dolly A Padovani-Claudio
  - Department of Ophthalmology, Vanderbilt University Medical Center, Nashville, TN 37232, United States
- Andrea Ramirez
  - All of Us Research Program, National Institutes of Health, Bethesda, MD 20892, United States
- Dan M Roden
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Bryce A Schuler
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Edward Siew
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Jennifer Sucre
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Isaac Thomsen
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Rory J Tinker
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Sara Van Driest
  - All of Us Research Program, National Institutes of Health, Bethesda, MD 20892, United States
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Colin Walsh
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Jeremy L Warner
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Quinn S Wells
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Lee Wheless
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
31
Gagliano Taliun SA, Dinsmore IR, Mirshahi T, Chang AR, Paterson AD, Barua M. GWAS for the composite traits of hematuria and albuminuria. Sci Rep 2023; 13:18084. [PMID: 37872228 PMCID: PMC10593773 DOI: 10.1038/s41598-023-45102-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Accepted: 10/16/2023] [Indexed: 10/25/2023]
Abstract
Our GWAS of hematuria in the UK Biobank identified 6 loci, some of which overlap with loci for albuminuria, suggesting pleiotropy. Since clinical syndromes are often defined by combinations of traits, generating a combined phenotype can improve power to detect loci influencing multiple characteristics. Thus, the composite trait of hematuria and albuminuria was chosen to enrich for glomerular pathologies. Cases had both hematuria defined by ICD codes and albuminuria defined as uACR > 3 mg/mmol. Controls had neither an ICD code for hematuria nor a uACR > 3 mg/mmol. 2429 cases and 343,509 controls from the UK Biobank were included. eGFR was lower in cases compared to controls, with the exception of the comparison in females using CKD-EPI after age adjustment. Variants at 4 loci met genome-wide significance with the following nearest genes: COL4A4, TRIM27, ETV1 and CUBN. TRIM27 is part of the extended MHC locus. All loci with the exception of ETV1 were replicated in the Geisinger MyCode cohort. The previous GWAS of hematuria reported COL4A3-COL4A4 variants and HLA-B*0801 within MHC, which is in linkage disequilibrium with the TRIM27 variant (D' = 0.59). TRIM27 is highly expressed in the tubules. Additional loci included a coding sequence variant in CUBN (p.Ala2914Val, MAF = 0.014 (A), p = 3.29E-8, OR = 2.09, 95% CI = 1.61-2.72). Overall, GWAS for the composite trait of hematuria and albuminuria identified 4 loci, 2 of which were not previously identified in a GWAS of hematuria.
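The case and control definitions above can be sketched as a small classifier. Participants with only one of the two traits are neither cases nor controls. How participants without a uACR measurement were handled is not stated in the abstract, so excluding them here is an assumption, and the function and field names are hypothetical.

```python
from typing import Optional

UACR_THRESHOLD_MG_PER_MMOL = 3.0  # albuminuria cut-off stated in the abstract


def composite_status(has_hematuria_code: bool, uacr: Optional[float]) -> Optional[str]:
    """Classify one participant for the hematuria + albuminuria composite trait.

    Returns "case", "control", or None when the participant is excluded.
    """
    if uacr is None:
        # Assumption: without a uACR measurement albuminuria status is unknown.
        return None
    albuminuria = uacr > UACR_THRESHOLD_MG_PER_MMOL
    if has_hematuria_code and albuminuria:
        return "case"
    if not has_hematuria_code and not albuminuria:
        return "control"
    # Only one of the two traits present: neither a case nor a control.
    return None


# Hypothetical participants: (ICD-coded hematuria?, uACR in mg/mmol)
participants = [(True, 5.2), (False, 0.8), (True, 1.0), (False, 12.0), (False, None)]
print([composite_status(h, u) for h, u in participants])
```

Excluding the single-trait participants, rather than counting them as controls, keeps the control group free of people who carry part of the composite phenotype.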
Affiliation(s)
- Sarah A Gagliano Taliun
  - Department of Medicine and Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
  - Montréal Heart Institute, Montréal, QC, Canada
- Ian R Dinsmore
  - Department of Genomic Health, Geisinger, Danville, PA, USA
- Alexander R Chang
  - Department of Population Health Sciences, Center for Kidney Health Research, Geisinger, Danville, PA, USA
  - Department of Nephrology, Geisinger, Danville, PA, USA
- Andrew D Paterson
  - Divisions of Epidemiology and Biostatistics, Dalla Lana School of Public Health, Toronto, ON, Canada
  - Genetics and Genome Biology, Research Institute at the Hospital for Sick Children, Toronto, ON, Canada
  - Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
- Moumita Barua
  - Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
  - Division of Nephrology, University Health Network, Toronto, ON, Canada
  - Department of Medicine, University of Toronto, Toronto, ON, Canada
  - Toronto General Hospital Research Institute, 8NU-855, 200 Elizabeth Street, Toronto, ON, M5G2C4, Canada
32
Hartmann S, Yasmeen S, Jacobs BM, Denaxas S, Pirmohamed M, Gamazon ER, Caulfield MJ, Hemingway H, Pietzner M, Langenberg C. ADRA2A and IRX1 are putative risk genes for Raynaud's phenomenon. Nat Commun 2023; 14:6156. [PMID: 37828025 PMCID: PMC10570309 DOI: 10.1038/s41467-023-41876-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/06/2023] [Accepted: 09/21/2023] [Indexed: 10/14/2023]
Abstract
Raynaud's phenomenon (RP) is a common vasospastic disorder that causes severe pain and ulcers, but despite its high reported heritability, no causal genes have been robustly identified. We conducted a genome-wide association study including 5,147 RP cases and 439,294 controls, based on diagnoses from electronic health records, and identified three unreported genomic regions associated with the risk of RP (p < 5 × 10⁻⁸). We prioritized ADRA2A (rs7090046, odds ratio (OR) per allele: 1.26; 95% CI: 1.20-1.31; p < 9.6 × 10⁻²⁷) and IRX1 (rs12653958, OR: 1.17; 95% CI: 1.12-1.22; p < 4.8 × 10⁻¹³) as candidate causal genes through integration of gene expression in disease-relevant tissues. We further identified a likely causal detrimental effect of low fasting glucose levels on RP risk (rG = -0.21; p-value = 2.3 × 10⁻³), and systematically highlighted drug repurposing opportunities, like the antidepressant mirtazapine. Our results provide the first robust evidence for a strong genetic contribution to RP and highlight a so far underrated role of α2A-adrenoreceptor signalling, encoded at ADRA2A, as a possible mechanism for hypersensitivity to catecholamine-induced vasospasms.
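Per-allele odds ratios like those above are normally reported from a log-odds estimate β and its standard error. Here is a minimal sketch of that conversion, producing the odds ratio, a Wald 95% interval and a two-sided p-value checked against the conventional genome-wide threshold of 5 × 10⁻⁸. It assumes a logistic-type association model; the study's own estimation software and details may differ. The estimate in the example is hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def summarize_variant(beta: float, se: float, z_crit: float = 1.959964) -> dict:
    """Per-allele OR, Wald 95% CI and two-sided p from a log-odds estimate."""
    z = beta / se
    # erfc keeps precision in the far tail, where 1 - cdf would round to 0
    p = math.erfc(abs(z) / math.sqrt(2))
    return {
        "odds_ratio": math.exp(beta),
        "ci_low": math.exp(beta - z_crit * se),
        "ci_high": math.exp(beta + z_crit * se),
        "p": p,
        "genome_wide": p < GENOME_WIDE_ALPHA,
    }


# Hypothetical estimate, not one of the study's variants
print(summarize_variant(beta=0.2, se=0.02))
```

The interval is symmetric on the log-odds scale and therefore asymmetric around the odds ratio. This is why reported intervals such as 1.20-1.31 around 1.26 are not centered.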
Affiliation(s)
- Sylvia Hartmann
  - Computational Medicine, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Summaira Yasmeen
  - Computational Medicine, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Benjamin M Jacobs
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, London, UK
  - British Heart Foundation Data Science Centre, London, UK
  - National Institute of Health Research University College London Hospitals Biomedical Research Centre, London, UK
- Munir Pirmohamed
  - Department of Pharmacology and Therapeutics, The Wolfson Centre for Personalised Medicine, University of Liverpool, Liverpool, UK
- Eric R Gamazon
  - Division of Genetic Medicine and Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Mark J Caulfield
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Harry Hemingway
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, London, UK
  - National Institute of Health Research University College London Hospitals Biomedical Research Centre, London, UK
- Maik Pietzner
  - Computational Medicine, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
  - MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
  - Precision Healthcare University Research Institute, Queen Mary University of London, London, UK
- Claudia Langenberg
  - Computational Medicine, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
  - MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
  - Precision Healthcare University Research Institute, Queen Mary University of London, London, UK
33
Powell W, Song X, Mohamed Y, Walsh D, Parks EJ, McMahon TM, Khan M, Waitman LR. Medications and conditions associated with weight loss in patients prescribed semaglutide based on real-world data. Obesity (Silver Spring) 2023; 31:2482-2492. [PMID: 37593896 DOI: 10.1002/oby.23859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 06/12/2023] [Accepted: 06/13/2023] [Indexed: 08/19/2023]
Abstract
OBJECTIVE Approved by the Food and Drug Administration (FDA) in 2017 for diabetes and in 2021 for weight loss, semaglutide has seen widespread use among individuals who aim to lose weight. The aim of this study was to evaluate weight loss and the influence of clinical factors on semaglutide patients in real-world clinical practice. METHODS Using data from 10 health systems within the Greater Plains Collaborative (a PCORnet Clinical Research Network), nearly 4000 clinical factors encompassing demographic, diagnosis, and prescription information were extracted for semaglutide patients. A gradient-boosting, machine-learning classifier was developed for weight-loss prediction and identification of the most impactful factors via SHapley Additive exPlanations (SHAP) value extrapolation. RESULTS A total of 3555 eligible patients (539 of whom were observed 52 weeks following exposure) from March 2017 to April 2022 were studied. On average, individuals lost 4.44% (male individuals, 3.66%; female individuals, 5.08%) of their initial weight. History of diabetes mellitus diagnosis was associated with less weight loss, whereas prediabetes and linaclotide use were associated with more pronounced weight loss. CONCLUSIONS Weight loss in patients prescribed semaglutide from real-world evidence was strong but attenuated compared with previous clinical trials. Machine-learning analysis of electronic health record data identified factors that warrant further research and consideration when tailoring weight-loss therapy.
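The headline outcome above is the mean percentage of initial body weight lost, overall and by sex. Here is a minimal sketch of that summary, assuming one initial and one follow-up weight per patient and averaging per-patient percentages. The row format is hypothetical, and the gradient-boosting classifier and SHAP analysis are not reproduced.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def percent_weight_loss(initial_kg: float, follow_up_kg: float) -> float:
    """Weight lost as a percentage of initial weight (positive means loss)."""
    return 100.0 * (initial_kg - follow_up_kg) / initial_kg


def summarize(patients: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Mean percent weight loss overall and by sex from (sex, initial, follow-up) rows."""
    overall = []
    by_sex = defaultdict(list)
    for sex, initial, follow_up in patients:
        loss = percent_weight_loss(initial, follow_up)
        overall.append(loss)
        by_sex[sex].append(loss)
    summary = {"overall": mean(overall)}
    summary.update({sex: mean(values) for sex, values in by_sex.items()})
    return summary


# Hypothetical patients: (sex, initial weight kg, weight kg at follow-up)
print(summarize([("F", 100.0, 95.0), ("M", 80.0, 78.0), ("F", 50.0, 48.0)]))
```

Averaging per-patient percentages weights every patient equally regardless of starting weight. Pooling total kilograms lost would instead give heavier patients more influence.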
Affiliation(s)
- William Powell
  - Department of Biomedical and Health Informatics, University of Missouri-Kansas City School of Medicine, Kansas City, Missouri, USA
- Xing Song
  - Department of Health Management and Informatics, University of Missouri School of Medicine, Columbia, Missouri, USA
- Yahia Mohamed
  - Department of Biomedical and Health Informatics, University of Missouri-Kansas City School of Medicine, Kansas City, Missouri, USA
- Dave Walsh
  - Department of Biomedical and Health Informatics, University of Missouri-Kansas City School of Medicine, Kansas City, Missouri, USA
- Elizabeth J Parks
  - Department of Nutrition and Exercise Physiology, University of Missouri, Columbia, Missouri, USA
- Tamara M McMahon
  - Department of Biomedical and Health Informatics, University of Missouri-Kansas City School of Medicine, Kansas City, Missouri, USA
- Mirza Khan
  - Department of Cardiovascular Medicine, Saint Luke's Mid America Heart Institute, Kansas City, Missouri, USA
  - Section of Cardiology, University of Missouri-Kansas City, Kansas City, Missouri, USA
- Lemuel R Waitman
  - Department of Biomedical and Health Informatics, University of Missouri-Kansas City School of Medicine, Kansas City, Missouri, USA
  - Department of Health Management and Informatics, University of Missouri School of Medicine, Columbia, Missouri, USA
34
Tinker RJ, Peterson J, Bastarache L. Phenotypic presentation of Mendelian disease across the diagnostic trajectory in electronic health records. Genet Med 2023; 25:100921. [PMID: 37337966 PMCID: PMC11092403 DOI: 10.1016/j.gim.2023.100921] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/14/2023] [Revised: 06/12/2023] [Accepted: 06/13/2023] [Indexed: 06/21/2023]
Abstract
PURPOSE To investigate the phenotypic presentation of Mendelian disease across the diagnostic trajectory in the electronic health record (EHR). METHODS We applied a conceptual model to delineate the diagnostic trajectory of Mendelian disease to the EHRs of patients affected by 1 of 9 Mendelian diseases. We assessed data availability and phenotype ascertainment across the diagnostic trajectory using phenotype risk scores and validated our findings via chart review of patients with hereditary connective tissue disorders. RESULTS We identified 896 individuals with genetically confirmed diagnoses, 216 (24%) of whom had fully ascertained diagnostic trajectories. Phenotype risk scores increased following clinical suspicion and diagnosis (P < 1 × 10⁻⁴, Wilcoxon rank sum test). We found that of all International Classification of Disease-based phenotypes in the EHR, 66% were recorded after clinical suspicion, and manual chart review yielded consistent results. CONCLUSION Using a novel conceptual model to study the diagnostic trajectory of genetic disease in the EHR, we demonstrated that phenotype ascertainment is, in large part, driven by the clinical examinations and studies prompted by clinical suspicion of a genetic disease, a process we term diagnostic convergence. Algorithms designed to detect undiagnosed genetic disease should consider censoring EHR data at the first date of clinical suspicion to avoid data leakage.
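The censoring recommendation in the conclusion can be sketched as a filter over dated diagnosis codes, so that a detection model only sees what was recorded before anyone suspected the disease. The abstract does not settle whether a code recorded on the suspicion date itself should be kept. This sketch drops it, and the record type and labels are hypothetical. Patients with no suspicion date, such as individuals being screened, keep their full history.

```python
from datetime import date
from typing import Dict, List, NamedTuple


class CodedEvent(NamedTuple):
    patient_id: str
    code: str
    recorded: date


def censor_at_suspicion(events: List[CodedEvent],
                        suspicion: Dict[str, date]) -> List[CodedEvent]:
    """Drop codes recorded on or after each patient's first date of clinical suspicion.

    Patients absent from `suspicion` keep every event.
    """
    kept = []
    for event in events:
        cutoff = suspicion.get(event.patient_id)
        if cutoff is None or event.recorded < cutoff:
            kept.append(event)
    return kept


# Hypothetical records and suspicion dates
events = [
    CodedEvent("p1", "joint_hypermobility", date(2015, 3, 1)),
    CodedEvent("p1", "aortic_dilation", date(2019, 6, 1)),
    CodedEvent("p2", "joint_hypermobility", date(2020, 1, 1)),
]
suspicion = {"p1": date(2018, 1, 1)}
print([(e.patient_id, e.code) for e in censor_at_suspicion(events, suspicion)])
```

Without this step, codes generated by the diagnostic workup itself (the "diagnostic convergence" described above) would leak into training features. A model could then look accurate on diagnosed patients while having little signal for undiagnosed ones.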
Affiliation(s)
- Rory J Tinker
- Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, TN
- Josh Peterson
- Vanderbilt University Medical Center, Department of Medicine, Nashville, TN; Vanderbilt University Medical Center, Department of Biomedical Informatics, Nashville, TN
- Lisa Bastarache
- Vanderbilt University Medical Center, Department of Biomedical Informatics, Nashville, TN
35
Lam V, Sharma S, Gupta S, Spouge JL, Jordan IK, Mariño-Ramírez L. Ancestry-attenuated effects of socioeconomic deprivation on type 2 diabetes disparities in the All of Us cohort. Research Square 2023:rs.3.rs-2976764. [PMID: 37790565 PMCID: PMC10543018 DOI: 10.21203/rs.3.rs-2976764/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Background Diabetes is a common disease with a major burden on morbidity, mortality, and productivity. Type 2 diabetes (T2D) accounts for roughly 90% of all diabetes cases in the United States and has greater observed prevalence among those who identify as Black or Hispanic. Methods The aims of this study were to determine whether T2D racial and ethnic disparities can be observed in data from the All of Us Research Program and to measure associations of genetic ancestry (GA) and socioeconomic deprivation with T2D. The All of Us Researcher Workbench was used to calculate T2D prevalence and to model T2D associations with GA, individual-level (iSDI) and zip code-based (zSDI) socioeconomic deprivation indices within and between participant self-identified race and ethnicity (SIRE) groups. Results The study cohort comprised 86,488 participants from the four largest SIRE groups in All of Us: Asian (n=2,311), Black (n=16,282), Hispanic (n=16,966), and White (n=50,292). SIRE groups show characteristic genetic ancestry patterns, consistent with their diverse origins, together with a continuum of ancestry fractions within and between groups. The Black and Hispanic groups show the highest median SDI values, followed by the Asian and White groups. Black participants show the highest age- and sex-adjusted T2D prevalence (21.9%), followed by the Hispanic (19.9%), Asian (15.1%), and White (14.8%) groups. Minority SIRE groups and socioeconomic deprivation are positively associated with T2D when the entire cohort is analyzed together. However, SIRE and GA both show negative interaction effects with SDI on T2D. Higher levels of SDI are negatively associated with T2D in the Black and Hispanic groups and at high levels of African and Native American ancestry.
Conclusion Socioeconomic deprivation is positively associated with the SIRE group T2D disparities observed here but negatively associated with T2D within the Black and Hispanic groups that show the highest T2D prevalence. These results are paradoxical and have not been reported elsewhere. We discuss possible explanations for this paradox related to the nature of the All of Us data along with SIRE group differences in access to healthcare, diet, and lifestyle.
36
Bastarache L, Delozier S, Pandit A, He J, Lewis A, Annis AC, LeFaive J, Denny JC, Carroll RJ, Altman RB, Hughey JJ, Zawistowski M, Peterson JF. The phenotype-genotype reference map: Improving biobank data science through replication. Am J Hum Genet 2023; 110:1522-1533. [PMID: 37607538 PMCID: PMC10502848 DOI: 10.1016/j.ajhg.2023.07.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 07/26/2023] [Accepted: 07/27/2023] [Indexed: 08/24/2023]
Abstract
Population-scale biobanks linked to electronic health record data provide vast opportunities to extend our knowledge of human genetics and discover new phenotype-genotype associations. Given their dense phenotype data, biobanks can also facilitate replication studies on a phenome-wide scale. Here, we introduce the phenotype-genotype reference map (PGRM), a set of 5,879 genetic associations from 523 GWAS publications that can be used for high-throughput replication experiments. PGRM phenotypes are standardized as phecodes, ensuring interoperability between biobanks. We applied the PGRM to five ancestry-specific cohorts from four independent biobanks and found evidence of robust replications across a wide array of phenotypes. We show how the PGRM can be used to detect data corruption and to empirically assess parameters for phenome-wide studies. Finally, we use the PGRM to explore factors associated with replicability of GWAS results.
Affiliation(s)
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Sarah Delozier
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Anita Pandit
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Jing He
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Adam Lewis
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Aubrey C Annis
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Jonathon LeFaive
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Joshua C Denny
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Robert J Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Russ B Altman
- Department of Genetics, Stanford University, Stanford, CA, USA
- Jacob J Hughey
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Matthew Zawistowski
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Josh F Peterson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
37
Wizenty J, Koop PH, Clusmann J, Tacke F, Trautwein C, Schneider KM, Sigal M, Schneider CV. Association of Helicobacter pylori Positivity With Risk of Disease and Mortality. Clin Transl Gastroenterol 2023; 14:e00610. [PMID: 37367296 PMCID: PMC10522101 DOI: 10.14309/ctg.0000000000000610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 05/17/2023] [Indexed: 06/28/2023]
Abstract
INTRODUCTION Helicobacter pylori colonizes the human stomach. Infection causes chronic gastritis and increases the risk of gastroduodenal ulcer and gastric cancer. Its chronic colonization of the stomach triggers aberrant epithelial and inflammatory signals that are also associated with systemic alterations. METHODS Using a phenome-wide association study (PheWAS) in more than 8,000 participants of the community-based UK Biobank, we explored the association of H. pylori positivity with gastric and extragastric disease and mortality in a European country. RESULTS Along with well-established gastric diseases, we predominantly found overrepresented cardiovascular, respiratory, and metabolic disorders. In multivariate analysis, the overall mortality of H. pylori-positive participants was not altered, whereas respiratory and coronavirus disease 2019 (COVID-19)-associated mortality increased. Lipidomic analysis of H. pylori-positive participants revealed a dyslipidemic profile with reduced high-density lipoprotein cholesterol and omega-3 fatty acids, which may represent a causative link between infection, systemic inflammation, and disease. DISCUSSION Our study of H. pylori positivity demonstrates that it plays an organ- and disease entity-specific role in the development of human disease and highlights the importance of further research into the systemic effects of H. pylori infection.
Affiliation(s)
- Jonas Wizenty
- Department of Hepatology and Gastroenterology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Paul-Henry Koop
- Department for Gastroenterology, Metabolic Diseases and Intensive Care, University Hospital RWTH Aachen, Aachen, Germany
- Jan Clusmann
- Department for Gastroenterology, Metabolic Diseases and Intensive Care, University Hospital RWTH Aachen, Aachen, Germany
- Frank Tacke
- Department of Hepatology and Gastroenterology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Christian Trautwein
- Department for Gastroenterology, Metabolic Diseases and Intensive Care, University Hospital RWTH Aachen, Aachen, Germany
- Kai Markus Schneider
- Department for Gastroenterology, Metabolic Diseases and Intensive Care, University Hospital RWTH Aachen, Aachen, Germany
- Michael Sigal
- Department of Hepatology and Gastroenterology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Carolin V. Schneider
- Department for Gastroenterology, Metabolic Diseases and Intensive Care, University Hospital RWTH Aachen, Aachen, Germany
38
Barnado A, Wheless L, Camai A, Green S, Han B, Katta A, Denny JC, Sawalha AH. Phenotype Risk Score but Not Genetic Risk Score Aids in Identifying Individuals With Systemic Lupus Erythematosus in the Electronic Health Record. Arthritis Rheumatol 2023; 75:1532-1541. [PMID: 37096581 PMCID: PMC10501317 DOI: 10.1002/art.42544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 03/23/2023] [Accepted: 04/17/2023] [Indexed: 04/26/2023]
Abstract
OBJECTIVE Systemic lupus erythematosus (SLE) poses diagnostic challenges. We undertook this study to evaluate the utility of a phenotype risk score (PheRS) and a genetic risk score (GRS) to identify individuals with SLE in a real-world setting. METHODS Using a de-identified electronic health record (EHR) database with an associated DNA biobank, we identified 789 SLE cases and 2,261 controls with available MEGAEX genotyping. A PheRS for SLE was developed using billing codes that captured American College of Rheumatology SLE criteria. We developed a GRS with 58 SLE risk single-nucleotide polymorphisms (SNPs). RESULTS SLE cases had a significantly higher PheRS (mean ± SD 7.7 ± 8.0 versus 0.8 ± 2.0 in controls; P < 0.001) and GRS (mean ± SD 12.2 ± 2.3 versus 11.0 ± 2.0 in controls; P < 0.001). Black individuals with SLE had a higher PheRS compared to White individuals (mean ± SD 10.0 ± 10.1 versus 7.1 ± 7.2, respectively; P = 0.002) but a lower GRS (mean ± SD 9.0 ± 1.4 versus 12.3 ± 1.7, respectively; P < 0.001). Models predicting SLE that used only the PheRS had an area under the curve (AUC) of 0.87. Adding the GRS to the PheRS resulted in only a minimal difference, with an AUC of 0.89. On chart review, controls with the highest PheRS and GRS had undiagnosed SLE. CONCLUSION We developed an SLE PheRS to identify individuals with established and undiagnosed SLE. An SLE GRS using known risk SNPs did not add value beyond the PheRS and was of limited utility in Black individuals with SLE. More work is needed to understand the genetic risks of SLE in diverse populations.
Affiliation(s)
- April Barnado
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Lee Wheless
- Department of Dermatology, Division of Epidemiology, Vanderbilt University Medical Center, Nashville, TN
- Alex Camai
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Sarah Green
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Bryan Han
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Anish Katta
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Joshua C. Denny
- All of Us Research Program, National Institutes of Health, Bethesda, MD
- Amr H. Sawalha
- Departments of Pediatrics, Medicine, and Immunology & Lupus Center of Excellence, University of Pittsburgh School of Medicine, Pittsburgh, PA
39
Lee IH, Walker DI, Lin Y, Smith MR, Mandl KD, Jones DP, Kong SW. Association between Neuroligin-1 polymorphism and plasma glutamine levels in individuals with autism spectrum disorder. EBioMedicine 2023; 95:104746. [PMID: 37544204 PMCID: PMC10427990 DOI: 10.1016/j.ebiom.2023.104746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 07/21/2023] [Accepted: 07/24/2023] [Indexed: 08/08/2023]
Abstract
BACKGROUND Unravelling the relationships between candidate genes and autism spectrum disorder (ASD) phenotypes remains an outstanding challenge. Endophenotypes, defined as inheritable, measurable quantitative traits, might provide intermediary links between genetic risk factors and multifaceted ASD phenotypes. In this study, we sought to determine whether plasma metabolite levels could serve as endophenotypes in individuals with ASD and their family members. METHODS We employed an untargeted, high-resolution metabolomics platform to analyse 14,342 features across 1099 plasma samples. These samples were collected from probands and their family members participating in the Autism Genetic Resource Exchange (AGRE) (N = 658), compared with neurotypical individuals enrolled in the PrecisionLink Health Discovery (PLHD) program at Boston Children's Hospital (N = 441). We conducted a metabolite quantitative trait loci (mQTL) analysis using whole-genome genotyping data from each cohort in AGRE and PLHD, aiming to prioritize significant mQTL and metabolite pairs that were exclusively observed in AGRE. FINDINGS Within the AGRE group, we identified 54 significant associations between genotypes and metabolite levels (P < 5.27 × 10⁻¹¹), 44 of which were not observed in the PLHD group. Plasma glutamine levels were found to be associated with variants in the NLGN1 gene, a gene that encodes post-synaptic cell-adhesion molecules in excitatory neurons. This association was not detected in the PLHD group. Notably, a significant negative correlation between plasma glutamine and glutamate levels was observed in the AGRE group, but not in the PLHD group. Furthermore, plasma glutamine levels showed a negative correlation with the severity of restrictive and repetitive behaviours (RRB) in ASD, although no direct association was observed between RRB severity and the NLGN1 genotype.
INTERPRETATION Our findings suggest that plasma glutamine levels could potentially serve as an endophenotype, thus establishing a link between the genetic risk associated with NLGN1 and the severity of RRB in ASD. This identified association could facilitate the development of novel therapeutic targets, assist in selecting specific cohorts for clinical trials, and provide insights into target symptoms for future ASD treatment strategies. FUNDING This work was supported by the National Institutes of Health (grant numbers: R01MH107205, U01TR002623, R24OD024622, OT2OD032720, and R01NS129188) and the PrecisionLink Biobank for Health Discovery at Boston Children's Hospital.
Affiliation(s)
- In-Hee Lee
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02215, USA
- Douglas I Walker
- Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA, 30322, USA
- Yufei Lin
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02215, USA
- Matthew Ryan Smith
- Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Emory University, Atlanta, GA, 30602, USA; Atlanta Department of Veterans Affairs Medical Center, Decatur, GA, 30033, USA
- Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02215, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Dean P Jones
- Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Emory University, Atlanta, GA, 30602, USA
- Sek Won Kong
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02215, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
40
Zhang Y, Jiang X, Mentzer AJ, McVean G, Lunter G. Topic modeling identifies novel genetic loci associated with multimorbidities in UK Biobank. Cell Genomics 2023; 3:100371. [PMID: 37601973 PMCID: PMC10435382 DOI: 10.1016/j.xgen.2023.100371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Revised: 05/04/2023] [Accepted: 07/07/2023] [Indexed: 08/22/2023]
Abstract
Many diseases show patterns of co-occurrence, possibly driven by systemic dysregulation of underlying processes affecting multiple traits. We have developed a method (treeLFA) for identifying such multimorbidities from routine health-care data, which combines topic modeling with an informative prior derived from medical ontology. We apply treeLFA to UK Biobank data and identify a variety of topics representing multimorbidity clusters, including a healthy topic. We find that loci identified using topic weights as traits in a genome-wide association study (GWAS) analysis, which we validated with a range of approaches, only partially overlap with loci from GWASs on constituent single diseases. We also show that treeLFA improves upon existing methods like latent Dirichlet allocation in various ways. Overall, our findings indicate that topic models can characterize multimorbidity patterns and that genetic analysis of these patterns can provide insight into the etiology of complex traits that cannot be determined from the analysis of constituent traits alone.
Affiliation(s)
- Yidong Zhang
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Chinese Academy of Medical Sciences Oxford Institute, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Department of Radiation Oncology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100006, China
- Xilin Jiang
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0SR, UK
- Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0BB, UK
- Alexander J. Mentzer
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Gerton Lunter
- MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DS, UK
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9700 RB, the Netherlands
41
Gu S, Rajendiran G, Forest K, Tran TC, Denny JC, Larson EA, Wilke RA. Drug-Induced Liver Injury with Commonly Used Antibiotics in the All of Us Research Program. Clin Pharmacol Ther 2023; 114:404-412. [PMID: 37150941 PMCID: PMC10484299 DOI: 10.1002/cpt.2930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Accepted: 04/30/2023] [Indexed: 05/09/2023]
Abstract
Antibiotics are a known cause of idiosyncratic drug-induced liver injury (DILI). According to the Centers for Disease Control and Prevention, the five most commonly prescribed antibiotics in the United States are azithromycin, ciprofloxacin, cephalexin, amoxicillin, and amoxicillin-clavulanate. We quantified the frequency of acute DILI for these common antibiotics in the All of Us Research Program, one of the largest electronic health record (EHR)-linked research cohorts in the United States. Retrospective analyses were conducted applying a standardized phenotyping algorithm to de-identified clinical data available in the All of Us database for 318,598 study participants. Between February 1984 and December 2022, more than 30% of All of Us participants (n = 119,812 individuals) had been exposed to at least 1 of our 5 study drugs. Initial screening identified 591 potential case patients that met our preselected laboratory-based phenotyping criteria. Because DILI is a diagnosis of exclusion, we then used phenome scanning to narrow the case counts by (i) scanning all EHRs to identify all alternative diagnostic explanations for the laboratory abnormalities, and (ii) leveraging International Classification of Diseases, Ninth Revision (ICD-9) and Tenth Revision (ICD-10) codes as exclusion criteria to eliminate misclassification. Our final case counts were 30 DILI cases with amoxicillin-clavulanate, 24 cases with azithromycin, 24 cases with ciprofloxacin, 22 cases with amoxicillin alone, and < 20 cases with cephalexin. These findings demonstrate that data from EHR-linked research cohorts can be efficiently mined to identify DILI cases related to the use of common antibiotics.
Affiliation(s)
- Shaopeng Gu
- Department of Internal Medicine, Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA
- Sanford Imagenetics, Sioux Falls, SD, USA
- Govarthanan Rajendiran
- Department of Internal Medicine, Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA
- Sanford Medical Center, Section of Gastroenterology/Hepatology, Sioux Falls, SD, USA
- Kennedy Forest
- Department of Internal Medicine, Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA
- Tam C Tran
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Joshua C Denny
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Eric A Larson
- Department of Internal Medicine, Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA
- Sanford Imagenetics, Sioux Falls, SD, USA
- Russell A Wilke
- Department of Internal Medicine, Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA
42
Zaidi AA, Verma A, Morse C, Ritchie MD, Mathieson I. The genetic and phenotypic correlates of mtDNA copy number in a multi-ancestry cohort. HGG Advances 2023; 4:100202. [PMID: 37255673 PMCID: PMC10225932 DOI: 10.1016/j.xhgg.2023.100202] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/02/2023] [Accepted: 04/25/2023] [Indexed: 06/01/2023]
Abstract
Mitochondrial DNA copy number (mtCN) is often treated as a proxy for mitochondrial (dys)function and disease risk. Pathological changes in mtCN are common symptoms of rare mitochondrial disorders, but reported associations between mtCN and common diseases vary across studies. To understand the biology of mtCN, we carried out genome- and phenome-wide association studies of mtCN in 30,666 individuals from the Penn Medicine BioBank (PMBB), a diverse cohort of largely African and European ancestry. We estimated mtCN in peripheral blood using exome sequence data, taking cell composition into account. We replicated known genetic associations of mtCN in the PMBB and found that their effects are highly correlated between individuals of European and African ancestry. However, the heritability of mtCN was much higher among individuals of largely African ancestry (h² = 0.3) than among individuals of European ancestry (h² = 0.1). Admixture mapping suggests that there are undiscovered variants underlying mtCN that are differentiated in frequency between individuals with African and European ancestry. We show that mtCN is associated with many health-related phenotypes. We discovered robust associations between mtDNA copy number and diseases of metabolically active tissues, such as cardiovascular disease and liver damage, that were consistent across individuals of African and European ancestry. Other associations, such as those with epilepsy and prostate cancer, were discovered only in individuals of European or of African ancestry, but not in both. We show that mtCN-phenotype associations can be sensitive to blood cell composition and environmental modifiers, explaining why such associations are inconsistent across studies. Thus, mtCN-phenotype associations must be interpreted with care.
Affiliation(s)
- Arslan A. Zaidi
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anurag Verma
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Colleen Morse
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Medicine BioBank
- Center for Translational Bioinformatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Marylyn D. Ritchie
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
43
Verma A, Huffman JE, Rodriguez A, Conery M, Liu M, Ho YL, Kim Y, Heise DA, Guare L, Panickan VA, Garcon H, Linares F, Costa L, Goethert I, Tipton R, Honerlaw J, Davies L, Whitbourne S, Cohen J, Posner DC, Sangar R, Murray M, Wang X, Dochtermann DR, Devineni P, Shi Y, Nandi TN, Assimes TL, Brunette CA, Carroll RJ, Clifford R, Duvall S, Gelernter J, Hung A, Iyengar SK, Joseph J, Kember R, Kranzler H, Levey D, Luoh SW, Merritt VC, Overstreet C, Deak JD, Grant SFA, Polimanti R, Roussos P, Sun YV, Venkatesh S, Voloudakis G, Justice A, Begoli E, Ramoni R, Tourassi G, Pyarajan S, Tsao PS, O’Donnell CJ, Muralidhar S, Moser J, Casas JP, Bick AG, Zhou W, Cai T, Voight BF, Cho K, Gaziano MJ, Madduri RK, Damrauer SM, Liao KP. Diversity and Scale: Genetic Architecture of 2,068 Traits in the VA Million Veteran Program. medRxiv: The Preprint Server for Health Sciences 2023:2023.06.28.23291975. [PMID: 37425708 PMCID: PMC10327290 DOI: 10.1101/2023.06.28.23291975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
Genome-wide association studies (GWAS) have underrepresented individuals from non-European populations, impeding progress in characterizing the genetic architecture and consequences of health and disease traits. To address this, we present a population-stratified phenome-wide GWAS followed by a multi-population meta-analysis for 2,068 traits derived from electronic health records of 635,969 participants in the Million Veteran Program (MVP), a longitudinal cohort study of diverse U.S. Veterans genetically similar to the respective African (121,177), Admixed American (59,048), East Asian (6,702), and European (449,042) superpopulations defined by the 1000 Genomes Project. We identified 38,270 independent variants associated with one or more traits at experiment-wide significance (P < 4.6 × 10⁻¹¹) and fine-mapped 6,318 signals identified from 613 traits to single-variant resolution. Among these, a third (2,069) of the associations were found only among participants genetically similar to non-European reference populations, demonstrating the importance of expanding diversity in genetic studies. Our work provides a comprehensive atlas of phenome-wide genetic associations for future studies dissecting the architecture of complex traits in diverse populations.
Affiliation(s)
- Anurag Verma
- Corporal Michael Crescenz VA Medical Center, Philadelphia, PA, 19104, USA
- Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Institute for Biomedical Informatics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Jennifer E Huffman
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, 02130, USA
- Palo Alto Veterans Institute for Research (PAVIR), Palo Alto Health Care System, Palo Alto, CA, 94304, USA
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Alex Rodriguez
- Data Science and Learning, Argonne National Laboratory, Lemont, IL, 60439, USA
- Mitchell Conery
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Molei Liu
- Department of Biostatistics, Columbia University’s Mailman School of Public Health, New York, NY, 10032, USA
- Yuk-Lam Ho
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, 02130, USA
- Youngdae Kim
- Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, 60439, USA
- David A Heise
- National Security Sciences Directorate, Cyber Resilience and Intelligence Division, Oak Ridge National Laboratory, Dept of Energy, Oak Ridge, TN, 37831, USA
- Lindsay Guare
- Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Helene Garcon
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, 02130, USA
- Franciel Linares
- R&D Systems Engineering, Information Technology Services Directorate, Oak Ridge National Laboratory, Dept of Energy, Oak Ridge, TN, 37831, USA
- Lauren Costa
- MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, 02111, USA
- Ian Goethert
- Data Management and Engineering, Information Technology Services Division, Oak Ridge National Laboratory, Dept of Energy, Oak Ridge, TN, 37831, USA
- Ryan Tipton
- Knowledge Discovery Infrastructure, Information Technology Services Division, Oak Ridge National Laboratory, Dept of Energy, Oak Ridge, TN, 37831, USA
- Jacqueline Honerlaw
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, 02130, USA
- Laura Davies
- Computing and Computational Sciences Dir PMO, PMO, Oak Ridge National Laboratory, Dept of Energy, Oak Ridge, TN, 37831, USA
- Stacey Whitbourne
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, 02111, USA
- Department of Medicine, Division of Aging, Brigham and Women’s Hospital, Boston, MA, 02115, USA
- Jeremy Cohen
- National Security Sciences Directorate, Cyber Resilience and Intelligence Division, Oak Ridge National Laboratory, Dept of Energy, Oak Ridge, TN, 37831, USA
- Daniel C Posner
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, 02130, USA
- Rahul Sangar
- MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, 02111, USA
- Michael Murray
- MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, 02111, USA
- Xuan Wang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Daniel R Dochtermann
- VA Cooperative Studies Program, VA Boston Healthcare System, Boston, MA, 02130, USA
- Poornima Devineni
- VA Cooperative Studies Program, VA Boston Healthcare System, Boston, MA, 02130, USA
- Yunling Shi
- VA Cooperative Studies Program, VA Boston Healthcare System, Boston, MA, 02130, USA
| | - Tarak Nath Nandi
- Data Science and Learning, Argonne National Laboratory, Lemont, IL, 60439, USA
| | | | - Charles A Brunette
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Research Service, VA Boston Healthcare System, Boston, MA, 02130, USA
| | - Robert J Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37211, USA
| | - Royce Clifford
- Research Department, VA San Diego Healthcare System, San Diego, CA, 92161, USA
- Surgery, Otolaryngology, UCSD San Diego, La Jolla, California, 92093, USA
| | - Scott Duvall
- VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT, 84148, USA
- Internal Medicine, Epidemiology, University of Utah School of Medicine, Salt Lake City, UT, 84132, USA
| | - Joel Gelernter
- Psychiatry, Human Genetics, Yale University, New Haven, CT, 06520, USA
- VA Connecticut Healthcare System West Haven, West Haven, CT, 06516, USA
| | - Adriana Hung
- Medicine, Nephrology & Hypertension, VA Tennessee Valley Healthcare System & Vanderbilt University, Nashville, TN, 37232, USA
| | - Sudha K Iyengar
- Population and Quantitative Health Sciences, Case Western Reserve University, School of Medicine, Cleveland, OH, 44106, USA
| | - Jacob Joseph
- Medicine, Cardiology Section, VA Providence Healthcare System, Providence, RI, 02908, USA
- Department of Medicine, Brown University, Providence, RI, 02908, USA
| | - Rachel Kember
- Mental Illness Research, Education and Clinical Center, Corporal Michael Crescenz VA Medical Center, Philadelphia, PA, 19104, USA
- Department of Psychiatry, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
| | - Henry Kranzler
- Mental Illness Research, Education and Clinical Center, Corporal Michael Crescenz VA Medical Center, Philadelphia, PA, 19104, USA
- Department of Psychiatry, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
| | - Daniel Levey
- Psychiatry, Human Genetics, Yale University, New Haven, CT, 06520, USA
- Medicine, VA Connecticut Healthcare System West Haven, West Haven, CT, 06516, USA
| | - Shiuh-Wen Luoh
- VA Portland Health Care System, Portland, OR, 97239, USA
- Division of Hematology and Medical Oncology, Knight Cancer Institute, Oregon Health and Science University, Portland, OR, 97239, USA
| | - Victoria C Merritt
- Research Department, VA San Diego Healthcare System, San Diego, CA, 92161, USA
| | - Cassie Overstreet
- Psychiatry, Human Genetics, Yale University, New Haven, CT, 06520, USA
| | - Joseph D Deak
- Psychiatry, Yale University, New Haven, CT, 06520, USA
- Psychiatry, VA Connecticut Healthcare System West Haven, West Haven, CT, 06516, USA
| | - Struan F A Grant
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pediatrics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Divisions of Human Genetics and Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Genetics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
| | | | - Panos Roussos
- Psychiatry, Mental Illness Research, Education and Clinical Center, James J. Peters VA Medical Center; Icahn School of Medicine at Mount Sinai, Bronx, NY, 10468, USA
| | - Yan V Sun
- Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, 30322, USA
| | - Sanan Venkatesh
- Psychiatry, Mental Illness Research, Education and Clinical Center, James J. Peters VA Medical Center; Icahn School of Medicine at Mount Sinai, Bronx, NY, 10468, USA
| | - Georgios Voloudakis
- Psychiatry, Mental Illness Research, Education and Clinical Center, James J. Peters VA Medical Center; Icahn School of Medicine at Mount Sinai, Bronx, NY, 10468, USA
| | - Amy Justice
- Medicine, VA Connecticut Healthcare System West Haven, West Haven, CT, 06516, USA
- Internal Medicine, General Medicine, Yale University, New Haven, CT, 06520, USA
- Health Policy, Yale School of Public Health, New Haven, CT, 06520, USA
| | - Edmon Begoli
- Oak Ridge National Laboratory, Dept of Energy, Oak Ridge, TN, 37831, USA
| | - Rachel Ramoni
- Office of Research and Development, Department of Veterans Affairs, Washington, DC, 20420, USA
| | - Georgia Tourassi
- National Center for Computational Sciences, Oak Ridge National Laboratory, Dept of Energy, Oak Ridge, TN, 37831, USA
| | - Saiju Pyarajan
- VA Cooperative Studies Program, VA Boston Healthcare System, Boston, MA, 02130, USA
| | - Philip S Tsao
- Medicine, Cardiology, VA Palo Alto Healthcare System, Palo Alto, CA, 94304, USA
- Department of Medicine, Stanford University, Palo Alto, CA, 94304, USA
| | | | - Sumitra Muralidhar
- Office of Research and Development, Department of Veterans Affairs, Washington, DC, 20420, USA
| | - Jennifer Moser
- Office of Research and Development, Department of Veterans Affairs, Washington, DC, 20420, USA
| | - Juan P Casas
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, 02130, USA
| | - Alexander G Bick
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University, Nashville, TN, 37325, USA
| | - Wei Zhou
- Department of Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Stanley Center for Psychiatric Research, Cambridge, MA, 02142, USA
- Program in Medical and Population Genetics, Cambridge, MA, 02142, USA
| | - Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
| | - Benjamin F Voight
- Corporal Michael Crescenz VA Medical Center, Philadelphia, PA, 19104, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Department of Genetics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Institute of Translational Medicine and Therapeutics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
| | - Kelly Cho
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, 02111, USA
- Department of Medicine, Division of Aging, Brigham and Women’s Hospital, Boston, MA, 02115, USA
| | - Michael J Gaziano
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, 02111, USA
- Department of Medicine, Division of Aging, Brigham and Women’s Hospital, Boston, MA, 02115, USA
| | - Ravi K Madduri
- Data Science and Learning, Argonne National Laboratory, Lemont, IL, 60439, USA
| | - Scott M Damrauer
- Corporal Michael Crescenz VA Medical Center, Philadelphia, PA, 19104, USA
- Department of Genetics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Department of Surgery, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Cardiovascular Institute, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA
| | - Katherine P Liao
- Medicine, Rheumatology, VA Boston Healthcare System, Boston, MA, 02130, USA
- Department of Medicine, Division of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA, 02115, USA
44
Stead WW, Lewis A, Giuse NB, Koonce TY, Bastarache L. Knowledgebase strategies to aid interpretation of clinical correlation research. J Am Med Inform Assoc 2023; 30:1257-1265. [PMID: 37164621 PMCID: PMC10280353 DOI: 10.1093/jamia/ocad078] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/09/2023] [Revised: 04/09/2023] [Accepted: 04/25/2023] [Indexed: 05/12/2023]
Abstract
OBJECTIVE Knowledgebases are needed to clarify correlations observed in real-world electronic health record (EHR) data. We posit design principles, present a unifying framework, and report a test of concept. MATERIALS AND METHODS We structured a knowledge framework along 3 axes: condition of interest, knowledge source, and taxonomy. In our test of concept, we used hypertension as our condition of interest, the literature and the VanderbiltDDx knowledgebase as sources, and phecodes as our taxonomy. In a cohort of 832 566 deidentified EHRs, we modeled blood pressure and heart rate by sex and age, classified individuals by hypertensive status, and ran a Phenome-wide Association Study (PheWAS) for hypertension. We compared the correlations from PheWAS to the associations in our knowledgebase. RESULTS We produced PhecodeKbHtn: a knowledgebase comprising 167 hypertension-associated diseases, 15 of which were also negatively associated with blood pressure (pos+neg). Our hypertension PheWAS included 1914 phecodes, 129 of which were in PhecodeKbHtn. Among the PheWAS association results, phecodes that were in PhecodeKbHtn had larger effect sizes than phecodes not in the knowledgebase. DISCUSSION Each source contributed unique and additive associations. Models of blood pressure and heart rate by age and sex were consistent with prior cohort studies. All but 4 PheWAS positive and negative correlations for phecodes in PhecodeKbHtn may be explained by knowledgebase associations, hypertensive cardiac complications, or causes of hypertension independently associated with hypotension. CONCLUSION It is feasible to assemble a knowledgebase that is compatible with EHR data to aid interpretation of clinical correlation research.
Affiliation(s)
- William W Stead
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Adam Lewis
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Nunzia B Giuse
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Center for Knowledge Management, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Taneya Y Koonce
- Center for Knowledge Management, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
45
Chatham AH, Bradley ED, Schirle L, Sanchez-Roige S, Samuels DC, Jeffery AD. Detecting Problematic Opioid Use in the Electronic Health Record: Automation of the Addiction Behaviors Checklist in a Chronic Pain Population. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.06.08.23290894. [PMID: 37398208 PMCID: PMC10312835 DOI: 10.1101/2023.06.08.23290894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Importance Individuals whose chronic pain is managed with opioids are at high risk of developing an opioid use disorder. Large data sets, such as electronic health records, are required for studies that help identify and manage problematic opioid use. Objective To determine whether regular expressions, a highly interpretable natural language processing technique, could automate a validated clinical tool (the Addiction Behaviors Checklist) to expedite the identification of problematic opioid use in the electronic health record. Design This cross-sectional study reports on a retrospective cohort with data analyzed from 2021 through 2023. The approach was evaluated against a blinded, manually reviewed holdout test set of 100 patients. Setting The study used data from Vanderbilt University Medical Center's Synthetic Derivative, a de-identified version of the electronic health record for research use. Participants This cohort comprised 8,063 individuals with chronic pain. Chronic pain was defined by International Classification of Diseases codes occurring on at least two different days. We collected demographic data, billing codes, and free-text notes from patients' electronic health records. Main Outcomes and Measures The primary outcome was the performance of the automated method in identifying patients demonstrating problematic opioid use, compared with opioid use disorder diagnostic codes. We evaluated the methods with F1 scores and areas under the curve, which together reflect sensitivity, specificity, and positive and negative predictive value. Results The cohort comprised 8,063 individuals with chronic pain (mean [SD] age at earliest chronic pain diagnosis, 56.2 [16.3] years; 5081 [63.0%] female and 2982 [37.0%] male patients; 76 [1.0%] Asian, 1336 [16.6%] Black, 56 [1.0%] other, 30 [0.4%] unknown race, and 6499 [80.6%] White patients; 135 [1.7%] Hispanic/Latino, 7898 [98.0%] Non-Hispanic/Latino, and 30 [0.4%] unknown ethnicity patients).
The automated approach identified individuals with problematic opioid use who were missed by diagnostic codes, and it outperformed diagnostic codes in F1 score (0.74 vs. 0.08) and area under the curve (0.82 vs. 0.52). Conclusions and Relevance This automated data extraction technique can facilitate earlier identification of people at risk for, and suffering from, problematic opioid use, and create new opportunities for studying long-term sequelae of opioid pain management.
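The abstract describes automating a checklist with regular expressions over free-text notes and scoring the result with F1 against a manually reviewed test set. The following is a minimal sketch of that general idea, not the study's code: the patterns, the patient-level rule (flag a patient if any note matches any pattern), and the helper names `flag_patient` and `f1_score` are hypothetical assumptions, and the notes and labels are invented.

```python
import re

# Hypothetical patterns for a few checklist-style behaviors. These are
# illustrative assumptions only, NOT the expressions used in the study.
PATTERNS = [
    re.compile(r"\b(?:early|premature)\s+refills?\b", re.IGNORECASE),
    re.compile(r"\blost\s+(?:\w+\s+)?(?:pills|prescriptions?|medications?)\b", re.IGNORECASE),
    re.compile(r"\brequest(?:s|ed)?\s+(?:a\s+)?(?:higher|increased)\s+doses?\b", re.IGNORECASE),
]


def flag_patient(notes: list[str]) -> bool:
    """Flag a patient if any of their notes matches any pattern."""
    return any(p.search(note) for note in notes for p in PATTERNS)


def f1_score(y_true: list[bool], y_pred: list[bool]) -> float:
    """F1: harmonic mean of precision (PPV) and recall (sensitivity)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum(p and not t for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with invented notes and invented manual-review labels.
patients = {
    "p1": ["Patient asked for an early refill again."],
    "p2": ["Pain well controlled on current regimen."],
    "p3": ["Reports she lost her pills while traveling."],
    "p4": ["No concerns raised at this visit."],
}
manual_review = {"p1": True, "p2": False, "p3": True, "p4": True}

predicted = {pid: flag_patient(notes) for pid, notes in patients.items()}
ids = sorted(patients)
score = f1_score([manual_review[i] for i in ids], [predicted[i] for i in ids])
print(predicted, round(score, 3))
```

In this toy example the pattern set misses p4, so precision is 1, recall is 2/3, and F1 is 0.8. Keeping the rules as plain regular expressions is what makes such an approach interpretable: every flag can be traced back to the matching text span.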
Affiliation(s)
- Eli D. Bradley
- Vanderbilt University School of Nursing, Nashville, TN, USA
- Lori Schirle
- Vanderbilt University School of Nursing, Nashville, TN, USA
- Department of Anesthesiology, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Sandra Sanchez-Roige
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- David C. Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Alvin D. Jeffery
- Vanderbilt University School of Nursing, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
46
Vessels T, Strayer N, Choi KW, Lee H, Zhang S, Han L, Morley TJ, Smoller JW, Xu Y, Ruderfer DM. Identifying modifiable comorbidities of schizophrenia by integrating electronic health records and polygenic risk. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.06.01.23290057. [PMID: 37333378 PMCID: PMC10274978 DOI: 10.1101/2023.06.01.23290057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Patients with schizophrenia have substantial comorbidity, contributing to a life expectancy reduced by 10-20 years. Identifying which comorbidities might be modifiable could reduce premature mortality in this population. We hypothesize that conditions that frequently co-occur with schizophrenia but lack shared genetic risk with it are more likely to be products of treatment, behavior, or environmental factors and are therefore potentially modifiable. To test this hypothesis, we calculated phenome-wide comorbidity from electronic health records (EHR) in 250,000 patients at each of two independent health care institutions (Vanderbilt University Medical Center and Mass General Brigham), along with association with schizophrenia polygenic risk scores (PRS) across the same phenotypes (phecodes) in linked biobanks. Comorbidity with schizophrenia was significantly correlated across institutions (r = 0.85) and consistent with prior literature. After multiple-testing correction, 77 phecodes were significantly comorbid with schizophrenia. Overall, comorbidity and PRS association were highly correlated (r = 0.55, p = 1.29 × 10⁻¹¹⁸); however, 36 of the EHR-identified comorbidities had significantly equivalent schizophrenia PRS distributions between cases and controls. Fifteen of these lacked any PRS association and were enriched for phenotypes known to be side effects of antipsychotic medications (e.g., "movement disorders", "convulsions", "tachycardia") or for other schizophrenia-related factors such as smoking ("bronchitis") or reduced hygiene (e.g., "diseases of the nail"), highlighting the validity of this approach. Other phenotypes implicated by this approach, for which the contribution of shared common genetic risk with schizophrenia was minimal, included tobacco use disorder, diabetes, and dementia. This work demonstrates the consistency and robustness of EHR-based schizophrenia comorbidities across independent institutions and with the existing literature.
It also identifies comorbidities that lack shared genetic risk, pointing to other, potentially more modifiable causes for which further study of causal pathways could improve outcomes for patients.
Affiliation(s)
- Tess Vessels
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Nicholas Strayer
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville TN
- Karmel W. Choi
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston MA
- Hyunjoon Lee
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston MA
- Siwei Zhang
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville TN
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Lide Han
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Theodore J. Morley
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Jordan W. Smoller
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston MA
- Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA
- Yaomin Xu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville TN
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Douglas M. Ruderfer
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville TN
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville TN
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN
47
Upadhyaya P, Ling Y, Chen L, Kim Y, Jiang X. Inferring Personalized Treatment Effect of Antihypertensives on Alzheimer's Disease Using Deep Learning. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS 2023; 2023:49-57. [PMID: 38516035 PMCID: PMC10956734 DOI: 10.1109/ichi57859.2023.00018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Alzheimer's disease (AD) is one of the leading causes of death in the United States, especially among the elderly. Recent studies have shown that hypertension is related to cognitive decline in elderly patients, which in turn leads to increased mortality and morbidity. Various studies have examined the effect of antihypertensive drugs in reducing cognitive decline, and their results have proved inconclusive. However, most of these studies assume that the treatment effect is similar for all patients and thus consider only the average treatment effects of antihypertensive drugs. In this paper, we assume that the effect of antihypertensives on the onset of AD depends on patient characteristics. We develop a deep learning method called LASSO-Dragonnet to estimate the individualized treatment effect for each patient. We considered six antihypertensive drugs; each of the six models treated one drug as the treatment and the remaining five as controls. Our studies showed that although many antihypertensives have a positive impact in delaying AD onset on average, the impact varies from individual to individual depending on their characteristics. We also analyzed the importance of various covariates in this estimation. Our results showed that the individualized treatment effect for each patient could be estimated accurately using a deep learning method, and that the importance of various covariates could be determined.
Affiliation(s)
- Yaobin Ling
- School of Biomedical Informatics, UT Health, Houston, USA
- Luyao Chen
- School of Biomedical Informatics, UT Health, Houston, USA
- Yejin Kim
- School of Biomedical Informatics, UT Health, Houston, USA
- Xiaoqian Jiang
- School of Biomedical Informatics, UT Health, Houston, USA
48
Nguyen NH, Sarangi S, McChesney EM, Sheng S, Porter AW, Kleyman TR, Pitluk ZW, Brodsky JL. Genome mining yields new disease-associated ROMK variants with distinct defects. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.05.539609. [PMID: 37214976 PMCID: PMC10197530 DOI: 10.1101/2023.05.05.539609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Bartter syndrome is a group of rare genetic disorders that compromise kidney function by impairing electrolyte reabsorption. Left untreated, the resulting hyponatremia, hypokalemia, and dehydration can be fatal. Although there is no cure for this disease, specific genes that lead to different Bartter syndrome subtypes have been identified. Bartter syndrome type II specifically arises from mutations in the KCNJ1 gene, which encodes the renal outer medullary potassium channel, ROMK. To date, over 40 Bartter syndrome-associated mutations in KCNJ1 have been identified, yet their molecular defects are mostly uncharacterized. Nevertheless, a subset of disease-linked mutations compromises ROMK folding in the endoplasmic reticulum (ER), which in turn results in premature degradation via the ER-associated degradation (ERAD) pathway. To identify uncharacterized human variants that might similarly lead to premature degradation and thus disease, we mined three genomic databases. First, phenotypic data in the UK Biobank were analyzed using a recently developed computational platform to identify individuals carrying KCNJ1 variants with clinical features consistent with Bartter syndrome type II. In parallel, we examined ROMK genomic data in both the NIH TOPMed and ClinVar databases with the aid of a computational algorithm that predicts protein misfolding and disease severity. Subsequent phenotypic studies using a high-throughput yeast screen to assess ROMK function, together with analyses of ROMK biogenesis in yeast and human cells, identified four previously uncharacterized mutations. Among these, one mutation uncovered by both parallel approaches (G228E) destabilized ROMK and targeted it for ERAD, resulting in reduced protein expression at the cell surface. Another ERAD-targeted ROMK mutant (L320P) was found in only one of the screens.
In contrast, another mutation (T300R) was ERAD-resistant, but defects in ROMK activity were apparent after expression and two-electrode voltage clamp measurements in Xenopus oocytes. Together, our results outline a new computational and experimental pipeline that can be applied to identify disease-associated alleles linked to a range of other potassium channels, and they further our understanding of the ROMK structure-function relationship, which may aid future therapeutic strategies. Author Summary Bartter syndrome is a rare genetic disorder characterized by defective renal electrolyte handling, leading to debilitating symptoms and, in some patients, death in infancy. Currently, there is no cure for this disease. Bartter syndrome is divided into five types based on the causative gene. Bartter syndrome type II results from genetic variants in the gene encoding the ROMK protein, which is expressed in the kidney and assists in regulating sodium, potassium, and water homeostasis. Prior work established that some disease-associated ROMK mutants misfold and are destroyed soon after their synthesis in the endoplasmic reticulum (ER). Because a growing number of drugs have been identified that correct defective protein folding, we wished to identify an expanded cohort of similarly misshapen and unstable disease-associated ROMK variants. To this end, we developed a pipeline that combines computational analyses of human genome databases with genetic and biochemical assays. Next, we both confirmed the identity of known variants and uncovered previously uncharacterized ROMK variants associated with Bartter syndrome type II. Further analyses indicated that select mutants are targeted for ER-associated degradation, while another mutant compromises ROMK function. This work sets the stage for continued mining for loss-of-function alleles in ROMK as well as in other potassium channels, and positions select Bartter syndrome mutations for correction using emerging pharmaceuticals.
49
Roger J, Xie F, Costello J, Tang A, Liu J, Oskotsky T, Woldemariam S, Kosti I, Le B, Snyder MP, Giudice LC, Torgerson D, Shaw GM, Stevenson DK, Rajkovic A, Glymour MM, Aghaeepour N, Cakmak H, Lathi RB, Sirota M. Leveraging electronic health records to identify risk factors for recurrent pregnancy loss across two medical centers: a case-control study. RESEARCH SQUARE 2023:rs.3.rs-2631220. [PMID: 36993325 PMCID: PMC10055527 DOI: 10.21203/rs.3.rs-2631220/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]
Abstract
Recurrent pregnancy loss (RPL), defined as 2 or more pregnancy losses, affects 5-6% of ever-pregnant individuals. Approximately half of these cases have no identifiable explanation. To generate hypotheses about RPL etiologies, we implemented a case-control study comparing the history of over 1,600 diagnoses between RPL and live-birth patients, leveraging the University of California San Francisco (UCSF) and Stanford University electronic health record databases. In total, our study included 8,496 RPL (UCSF: 3,840, Stanford: 4,656) and 53,278 control (UCSF: 17,259, Stanford: 36,019) patients. Menstrual abnormalities and infertility-associated diagnoses were significantly positively associated with RPL at both medical centers. Age-stratified analysis revealed that most RPL-associated diagnoses had higher odds ratios for patients younger than 35 than for patients aged 35 or older. While Stanford results were sensitive to controlling for healthcare utilization, UCSF results were stable across analyses with and without adjustment for utilization. Intersecting significant results between medical centers was an effective filter to identify associations that are robust across center-specific utilization patterns.
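The abstract summarizes diagnosis-level associations as odds ratios, including an age-stratified comparison. The following minimal sketch computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2×2 case-control table. It is not the authors' model (their analyses also considered adjustment for healthcare utilization, which this sketch omits); the function name `odds_ratio_wald` and all counts are invented for illustration.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table.

    a: cases with the diagnosis       b: cases without it
    c: controls with the diagnosis    d: controls without it
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell so the log OR is finite.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf/Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts for one hypothetical diagnosis, split into two age strata.
strata = {
    "<35": (60, 340, 120, 2380),
    "35+": (60, 540, 180, 2320),
}
for label, (a, b, c, d) in strata.items():
    or_, lo, hi = odds_ratio_wald(a, b, c, d)
    print(f"{label}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Computing the table separately within each stratum is the simplest form of age stratification; a regression model with covariates would be needed to adjust for factors such as utilization.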
Affiliation(s)
- Jacquelyn Roger
- Bakar Computational Health Sciences Institute, University of California San Francisco
- Feng Xie
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University
- Department of Pediatrics, Stanford University
- Department of Biomedical Data Science, Stanford University
- Jean Costello
- Bakar Computational Health Sciences Institute, University of California San Francisco
- Alice Tang
- Bakar Computational Health Sciences Institute, University of California San Francisco
- Jay Liu
- Bakar Computational Health Sciences Institute, University of California San Francisco
- Tomiko Oskotsky
- Bakar Computational Health Sciences Institute, University of California San Francisco
- Sarah Woldemariam
- Bakar Computational Health Sciences Institute, University of California San Francisco
- Idit Kosti
- Bakar Computational Health Sciences Institute, University of California San Francisco
- Brian Le
- Bakar Computational Health Sciences Institute, University of California San Francisco
- Linda C. Giudice
- Department of Obstetrics and Gynecology, University of California San Francisco
- Dara Torgerson
- Department of Epidemiology and Biostatistics, University of California San Francisco
- Aleksandar Rajkovic
- Department of Pathology, University of California San Francisco
- Institute of Human Genetics, University of California San Francisco
- M. Maria Glymour
- Department of Epidemiology and Biostatistics, University of California San Francisco
- Nima Aghaeepour
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University
- Department of Pediatrics, Stanford University
- Department of Biomedical Data Science, Stanford University
- Hakan Cakmak
- Department of Obstetrics and Gynecology, University of California San Francisco
- Ruth B. Lathi
- Department of Obstetrics and Gynecology, Stanford University
- Marina Sirota
- Bakar Computational Health Sciences Institute, University of California San Francisco
50
Yang L, Sadler MC, Altman RB. Genetic association studies using disease liabilities from deep neural networks. medRxiv: The Preprint Server for Health Sciences 2023:2023.01.18.23284383. [PMID: 36712099 PMCID: PMC9882423 DOI: 10.1101/2023.01.18.23284383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Abstract
The case-control study is a widely used method for investigating the genetic landscape of binary traits. However, the health-related outcome or disease status of participants in long-term, prospective cohort studies such as the UK Biobank is subject to change. Here, we develop an approach for genetic association studies that leverages disease liabilities computed from a deep patient phenotyping framework (AI-based liability). Analyzing 44 common traits in 261,807 participants from the UK Biobank, we identified novel loci relative to conventional case-control (CC) association studies. Our results showed that combining liability scores with CC status was more powerful than CC-GWAS in detecting independent genetic loci across different diseases. This boost in statistical power was further reflected in increased SNP-based heritability estimates. Moreover, polygenic risk scores calculated from AI-based liabilities better identified newly diagnosed cases in the 2022 release of the UK Biobank who had served as controls in the 2019 version (6.2% percentile rank increase on average). These findings demonstrate the utility of deep neural networks that can model disease liabilities from high-dimensional phenotypic data in large-scale population cohorts. Our pipeline of genome-wide association studies with disease liabilities can be applied to other biobanks with rich phenotype and genotype data.
Affiliation(s)
- Lu Yang
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
- Marie C. Sadler
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- University Center for Primary Care and Public Health, Lausanne, 1010, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Russ B. Altman
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA