1
Merritt VC, Chen AW, Bonzel CL, Hong C, Sangar R, Morini Sweet S, Sorg SF, Chanfreau-Coffinier C. Development and validation of an electronic health record-based algorithm for identifying TBI in the VA: A VA Million Veteran Program study. Brain Inj 2024:1-9. [PMID: 39004925 DOI: 10.1080/02699052.2024.2373920] [Received: 12/08/2023] [Accepted: 06/24/2024] [Indexed: 07/16/2024]
Abstract
The purpose of this study was to develop and validate an algorithm for identifying Veterans with a history of traumatic brain injury (TBI) in the Veterans Affairs (VA) electronic health record using VA Million Veteran Program (MVP) data. Manual chart review (n = 200) was first used to establish 'gold standard' diagnosis labels for TBI ('Yes TBI' vs. 'No TBI'). To develop our algorithm, we used PheCAP, a semi-supervised pipeline that relied on the chart review diagnosis labels to train and create a prediction model for TBI. Cross-validation was used to train and evaluate the proposed algorithm, 'TBI-PheCAP.' TBI-PheCAP performance was compared to existing TBI algorithms and phenotyping methods, and the final algorithm was run on all MVP participants (n = 702,740) to assign a predicted probability for TBI and a binary classification status choosing specificity = 90%. The TBI-PheCAP algorithm had an area under the receiver operating characteristic curve of 0.92, sensitivity of 84%, and positive predictive value (PPV) of 98% at specificity = 90%. TBI-PheCAP generally performed better than other classification methods, with equivalent or higher sensitivity and PPV than existing rules-based TBI algorithms and MVP TBI-related survey data. Given its strong classification metrics, the TBI-PheCAP algorithm is recommended for use in future population-based TBI research.
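The abstract reports sensitivity and PPV "at specificity = 90%", which implies a cutoff on the predicted probability was chosen to hit that specificity. The following is a minimal sketch of that idea, not the PheCAP implementation: the function names and the toy scores and labels are hypothetical, and it simply picks the lowest cutoff whose specificity reaches the target.

```python
from typing import Sequence, Tuple


def confusion_counts(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def threshold_at_specificity(scores: Sequence[float], labels: Sequence[int],
                             target: float = 0.90) -> float:
    """Lowest observed score usable as a cutoff with specificity >= target.

    Specificity never decreases as the cutoff rises, so the first cutoff
    that reaches the target also keeps sensitivity as high as possible.
    """
    for candidate in sorted(set(scores)):
        _, fp, tn, _ = confusion_counts(scores, labels, candidate)
        if tn + fp > 0 and tn / (tn + fp) >= target:
            return candidate
    raise ValueError("no observed score reaches the target specificity")


# Hypothetical predicted probabilities and chart-review labels (1 = 'Yes TBI').
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.80,
          0.42, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.85, 0.90, 0.95]
labels = [0] * 10 + [1] * 10

threshold = threshold_at_specificity(scores, labels, target=0.90)
tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
print(f"cutoff={threshold:.2f} sens={sensitivity:.2f} "
      f"spec={specificity:.2f} ppv={ppv:.2f}")
```

Note that PPV, unlike sensitivity and specificity, also depends on how common the condition is in the labelled sample, so a PPV estimated on a chart-review sample need not carry over to a population with a different prevalence.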
Affiliation(s)
- Victoria C Merritt
- VA San Diego Healthcare System (VASDHS), San Diego, CA, USA
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Center of Excellence for Stress and Mental Health, VASDHS, San Diego, CA, USA
- Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Scott F Sorg
- Home Base, A Red Sox Foundation and Massachusetts General Hospital Program, Boston, MA, USA
2
Chen JS, Copado IA, Vallejos C, Kalaw FGP, Soe P, Cai CX, Toy BC, Borkar D, Sun CQ, Shantha JG, Baxter SL. Variations in Electronic Health Record-Based Definitions of Diabetic Retinopathy Cohorts: A Literature Review and Quantitative Analysis. OPHTHALMOLOGY SCIENCE 2024; 4:100468. [PMID: 38560278 PMCID: PMC10973665 DOI: 10.1016/j.xops.2024.100468] [Received: 07/11/2023] [Revised: 01/04/2024] [Accepted: 01/11/2024] [Indexed: 04/04/2024]
Abstract
Purpose: Use of the electronic health record (EHR) has motivated the need for data standardization. A gap in knowledge exists regarding variations in existing terminologies for defining diabetic retinopathy (DR) cohorts. This study aimed to review the literature and analyze variations regarding codified definitions of DR. Design: Literature review and quantitative analysis. Subjects: Published manuscripts. Methods: Four graders reviewed PubMed and Google Scholar for peer-reviewed studies. Studies were included if they used codified definitions of DR (e.g., billing codes). Data elements such as author names, publication year, purpose, data set type, and DR definitions were manually extracted. Each study was reviewed by ≥ 2 authors to validate inclusion eligibility. Quantitative analyses of the codified definitions were then performed to characterize the variation between DR cohort definitions. Main Outcome Measures: Number of studies included and numeric counts of billing codes used to define codified cohorts. Results: In total, 43 studies met the inclusion criteria. Half of the included studies used datasets based on structured EHR data (i.e., data registries, institutional EHR review), and half used claims data. All but 1 of the studies used billing codes such as the International Classification of Diseases 9th or 10th edition (ICD-9 or ICD-10), either alone or in addition to another terminology for defining disease. Of the 27 included studies that used ICD-9 and the 20 studies that used ICD-10 codes, the most common codes used pertained to the full spectrum of DR severity. Diabetic retinopathy complications (e.g., vitreous hemorrhage) were also used to define some DR cohorts. Conclusions: Substantial variations exist among codified definitions for DR cohorts within retrospective studies. Variable definitions may limit generalizability and reproducibility of retrospective studies. More work is needed to standardize disease cohorts.
Affiliation(s)
- Jimmy S Chen
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
- Ivan A Copado
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
- Cecilia Vallejos
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
- Fritz Gerald P Kalaw
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
- Priyanka Soe
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
- Cindy X Cai
- Wilmer Eye Institute, Johns Hopkins School of Medicine, Baltimore, Maryland
- Brian C Toy
- Department of Ophthalmology, Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, California
- Durga Borkar
- Department of Ophthalmology, Duke Eye Center, Duke University, Durham, North Carolina
- Catherine Q Sun
- F.I. Proctor Foundation, University of California San Francisco, San Francisco, California
- Department of Ophthalmology, University of California San Francisco, San Francisco, California
- Jessica G Shantha
- F.I. Proctor Foundation, University of California San Francisco, San Francisco, California
- Department of Ophthalmology, University of California San Francisco, San Francisco, California
- Sally L Baxter
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
3
Jafari E, Blackman MH, Karnes JH, Van Driest SL, Crawford DC, Choi L, McDonough CW. Using electronic health records for clinical pharmacology research: Challenges and considerations. Clin Transl Sci 2024; 17:e13871. [PMID: 38943244 PMCID: PMC11213823 DOI: 10.1111/cts.13871] [Received: 03/19/2024] [Revised: 05/21/2024] [Accepted: 05/24/2024] [Indexed: 07/01/2024]
Abstract
Electronic health records (EHRs) contain a vast array of phenotypic data on large numbers of individuals, often collected over decades. Due to the wealth of information, EHR data have emerged as a powerful resource to make first discoveries and identify disparities in our healthcare system. While the number of EHR-based studies has exploded in recent years, most of these studies are directed at associations with disease rather than pharmacotherapeutic outcomes, such as drug response or adverse drug reactions. This is largely due to challenges specific to deriving drug-related phenotypes from the EHR. There is great potential for EHR-based discovery in clinical pharmacology research, and there is a critical need to address specific challenges related to accurate and reproducible derivation of drug-related phenotypes from the EHR. This review provides a detailed evaluation of challenges and considerations for deriving drug-related data from EHRs. We provide an examination of EHR-based computable phenotypes and discuss cutting-edge approaches to map medication information for clinical pharmacology research, including medication-based computable phenotypes and natural language processing. We also discuss additional considerations such as data structure, heterogeneity and missing data, rare phenotypes, and diversity within the EHR. By further understanding the complexities associated with conducting clinical pharmacology research using EHR-based data, investigators will be better equipped to design thoughtful studies with more reproducible results. Progress in utilizing EHRs for clinical pharmacology research should lead to significant advances in our ability to understand differential drug response and predict adverse drug reactions.
Affiliation(s)
- Eissa Jafari
- Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, College of Pharmacy, University of Florida, Gainesville, Florida, USA
- Department of Pharmacy Practice, College of Pharmacy, Jazan University, Jazan, Saudi Arabia
- Marisa H. Blackman
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jason H. Karnes
- Department of Pharmacy Practice and Science, University of Arizona R. Ken Coit College of Pharmacy, Tucson, Arizona, USA
- Sara L. Van Driest
- Department of Pediatrics, Vanderbilt University Medical Center (VUMC), Nashville, Tennessee, USA
- Present address: All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA
- Dana C. Crawford
- Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
- Department of Genetics and Genome Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
- Leena Choi
- Department of Biostatistics and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Caitrin W. McDonough
- Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, College of Pharmacy, University of Florida, Gainesville, Florida, USA
4
Miller M, Jorm L, Partyka C, Burns B, Habig K, Oh C, Immens S, Ballard N, Gallego B. Identifying prehospital trauma patients from ambulance patient care records; comparing two methods using linked data in New South Wales, Australia. Injury 2024; 55:111570. [PMID: 38664086 DOI: 10.1016/j.injury.2024.111570] [Received: 01/18/2024] [Revised: 04/11/2024] [Accepted: 04/14/2024] [Indexed: 06/16/2024]
Abstract
BACKGROUND: Linked datasets for trauma system monitoring should ideally follow patients from the prehospital scene to hospital admission and post-discharge. Having a well-defined cohort when using administrative datasets is essential because they must capture the representative population. Unlike hospital electronic health records (EHR), ambulance patient-care records lack access to sources beyond immediate clinical notes. Relying on a limited set of variables to define a study population might result in missed patient inclusion. We aimed to compare two methods of identifying prehospital trauma patients: one using only those documented under a trauma protocol and another incorporating additional data elements from ambulance patient care records. METHODS: We analyzed data from six routinely collected administrative datasets from 2015 to 2018, including ambulance patient-care records, aeromedical data, emergency department visits, hospitalizations, rehabilitation outcomes, and death records. Three prehospital trauma cohorts were created: an Extended-T-protocol cohort (patients transported under a trauma protocol and/or patients with prespecified criteria from structured data fields), T-protocol cohort (only patients documented as transported under a trauma protocol) and non-T-protocol (extended-T-protocol population not in the T-protocol cohort). Patient-encounter characteristics, mortality, clinical and post-hospital discharge outcomes were compared. A conservative p-value of 0.01 was considered significant. RESULTS: Of 1 038 263 patient-encounters included in the extended-T-population, 814 729 (78.5%) were transported, with 438 893 (53.9%) documented as a T-protocol patient. Half (49.6%) of the non-T-protocol sub-cohort had an International Classification of Diseases, 10th edition injury or external cause code, indicating 79 644 missed patients when a T-protocol-only definition was used. The non-T-protocol sub-cohort also identified additional patients with intubation, prehospital blood transfusion and positive eFAST. A higher proportion of non-T-protocol patients than T-protocol patients were admitted to the ICU (4.6% vs 3.6%), ventilated (1.8% vs 1.3%), received in-hospital transfusion (7.9% vs 6.8%) or died (1.8% vs 1.3%). Urgent trauma surgery was similar between groups (1.3% vs 1.4%). CONCLUSION: The extended-T-population definition identified 50% more admitted patients with an ICD-10-AM code consistent with an injury, including patients with severe trauma. Developing an EHR phenotype incorporating multiple data fields of ambulance-transported trauma patients for use with linked data may avoid missing these patients.
Affiliation(s)
- Matthew Miller
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia; Department of Anesthesia, St George Hospital, Kogarah, NSW 2217 Australia; Centre for Big Data Research in Health at UNSW Sydney, Kensington, NSW 2052, Australia.
- Louisa Jorm
- Foundation Director of the Centre for Big Data Research in Health at UNSW Sydney, Kensington 2052, Australia
- Chris Partyka
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia; Department of Emergency Medicine, Royal North Shore Hospital, St Leonards, NSW 2065, Australia
- Brian Burns
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia; Royal North Shore Hospital, St Leonards, NSW 2065, Australia; Faculty of Medicine & Health, University of Sydney, Camperdown, NSW 2050, Australia
- Karel Habig
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia
- Carissa Oh
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia; Department of Emergency Medicine, St George Hospital, Kogarah, NSW 2217 Australia
- Sam Immens
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia
- Neil Ballard
- Aeromedical Operations, New South Wales Ambulance, Rozelle, NSW 2039, Australia; Department of Paediatric Emergency Medicine, Sydney Children's Hospital, Randwick, NSW 2031, Australia; Department of Emergency Medicine, Royal Prince Alfred Hospital, Camperdown, NSW 2050, Australia
- Blanca Gallego
- Clinical analytics and machine learning unit, Centre for Big Data Research in Health at UNSW Sydney, Kensington 2052, Australia
5
Newby D, Taylor N, Joyce DW, Winchester LM. Optimising the use of electronic medical records for large scale research in psychiatry. Transl Psychiatry 2024; 14:232. [PMID: 38824136 PMCID: PMC11144247 DOI: 10.1038/s41398-024-02911-1] [Received: 02/17/2023] [Revised: 04/13/2024] [Accepted: 04/15/2024] [Indexed: 06/03/2024]
Abstract
The explosion and abundance of digital data could facilitate large-scale research for psychiatry and mental health. Research using so-called "real world data"-such as electronic medical/health records-can be resource-efficient, facilitate rapid hypothesis generation and testing, complement existing evidence (e.g. from trials and evidence-synthesis) and may enable a route to translate evidence into clinically effective, outcomes-driven care for patient populations that may be under-represented. However, the interpretation and processing of real-world data sources is complex because the clinically important 'signal' is often contained in both structured and unstructured (narrative or "free-text") data. Techniques for extracting meaningful information (signal) from unstructured text exist and have advanced the re-use of routinely collected clinical data, but these techniques require cautious evaluation. In this paper, we survey the opportunities, risks and progress made in the use of electronic medical record (real-world) data for psychiatric research.
Affiliation(s)
- Danielle Newby
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, UK
- Niall Taylor
- Department of Psychiatry, University of Oxford, Oxford, UK
- Dan W Joyce
- Department of Primary Care and Mental Health and Civic Health, Innovation Labs, Institute of Population Health, University of Liverpool, Liverpool, UK
6
Bazemore K, Joo J, Hwang WT, Himes BE. Clarifying Chronic Obstructive Pulmonary Disease Genetic Associations Observed in Biobanks via Mediation Analysis of Smoking. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2024; 2024:499-508. [PMID: 38827081 PMCID: PMC11141825] [Indexed: 06/04/2024]
Abstract
Varying case definitions of COPD have heterogeneous genetic risk profiles, potentially reflective of disease subtypes or classification bias (e.g., smokers more likely to be diagnosed with COPD). To better understand differences in genetic loci associated with ICD-defined versus spirometry-defined COPD, we contrasted their GWAS results with those for heavy smoking among 337,138 UK Biobank participants. Overlapping risk loci were found in/near the genes ZEB2, FAM136B, CHRNA3, and CHRNA4, with the CHRNA3 locus shared across all three traits. Mediation analysis to estimate the effects of lead genotyped variants mediated by smoking found significant indirect effects for the FAM136B, CHRNA3, and CHRNA4 loci for both COPD definitions. Adjustment for mediator-outcome confounders modestly attenuated indirect effects, though in the CHRNA4 locus for spirometry-defined COPD the proportion mediated increased an additional 8.47%. Our results suggest that differences between ICD-defined and spirometry-defined COPD-associated genetic loci are not a result of smoking biasing classification.
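The terms "indirect effect" and "proportion mediated" can be made concrete with a small worked example. The sketch below uses the textbook product-of-coefficients decomposition for linear models; this is an assumption for illustration only, since the abstract does not state which estimator the authors used, and binary outcomes such as COPD case status typically call for other effect scales. The function name and the coefficient values are hypothetical.

```python
def proportion_mediated(a: float, b: float, direct: float) -> float:
    """Share of the total effect that passes through the mediator.

    a      : effect of the exposure (e.g., a variant) on the mediator (smoking)
    b      : effect of the mediator on the outcome, holding the exposure fixed
    direct : effect of the exposure on the outcome, holding the mediator fixed

    In linear models the indirect effect is a * b and the total effect is
    direct + a * b, so the proportion mediated is their ratio.
    """
    indirect = a * b
    total = direct + indirect
    if total == 0:
        raise ValueError("total effect is zero; proportion mediated is undefined")
    return indirect / total


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
pm = proportion_mediated(a=0.5, b=0.4, direct=0.3)
print(f"proportion mediated = {pm:.2f}")
```

With these toy values the indirect effect is 0.2 out of a total effect of 0.5. When the direct and indirect effects have opposite signs, the ratio can fall outside the 0 to 1 range and is harder to interpret.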
Affiliation(s)
- Katrina Bazemore
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jaehyun Joo
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Wei-Ting Hwang
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Blanca E Himes
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
7
Mathis M, Steffner KR, Subramanian H, Gill GP, Girardi NI, Bansal S, Bartels K, Khanna AK, Huang J. Overview and Clinical Applications of Artificial Intelligence and Machine Learning in Cardiac Anesthesiology. J Cardiothorac Vasc Anesth 2024; 38:1211-1220. [PMID: 38453558 PMCID: PMC10999327 DOI: 10.1053/j.jvca.2024.02.004] [Received: 12/25/2023] [Revised: 01/30/2024] [Accepted: 02/05/2024] [Indexed: 03/09/2024]
Abstract
Artificial intelligence- (AI) and machine learning (ML)-based applications are becoming increasingly pervasive in the healthcare setting. This has in turn challenged clinicians, hospital administrators, and health policymakers to understand such technologies and develop frameworks for safe and sustained clinical implementation. Within cardiac anesthesiology, challenges and opportunities for AI/ML to support patient care are presented by the vast amounts of electronic health data, which are collected rapidly, interpreted, and acted upon within the periprocedural area. To address such challenges and opportunities, in this article, the authors review 3 recent applications relevant to cardiac anesthesiology, including depth of anesthesia monitoring, operating room resource optimization, and transthoracic/transesophageal echocardiography, as conceptual examples to explore strengths and limitations of AI/ML within healthcare, and characterize this evolving landscape. Through reviewing such applications, the authors introduce basic AI/ML concepts and methodologies, as well as practical considerations and ethical concerns for initiating and maintaining safe clinical implementation of AI/ML-based algorithms for cardiac anesthesia patient care.
Affiliation(s)
- Michael Mathis
- Department of Anesthesiology, University of Michigan Medicine, Ann Arbor, MI
- Kirsten R Steffner
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA
- Harikesh Subramanian
- Department of Anesthesiology and Perioperative Medicine, University of Pittsburgh, Pittsburgh, PA
- George P Gill
- Department of Anesthesiology, Cedars Sinai, Los Angeles, CA
- Sagar Bansal
- Department of Anesthesiology and Perioperative Medicine, University of Missouri School of Medicine, Columbia, MO
- Karsten Bartels
- Department of Anesthesiology, University of Nebraska Medical Center, Omaha, NE
- Ashish K Khanna
- Department of Anesthesiology, Section on Critical Care Medicine, School of Medicine, Wake Forest University, Atrium Health Wake Forest Baptist Medical Center, Winston-Salem, NC
- Jiapeng Huang
- Department of Anesthesiology and Perioperative Medicine, University of Louisville, Louisville, KY.
8
Cao X, Zhang S, Sha Q. A novel method for multiple phenotype association studies based on genotype and phenotype network. PLoS Genet 2024; 20:e1011245. [PMID: 38728360 PMCID: PMC11111089 DOI: 10.1371/journal.pgen.1011245] [Received: 08/18/2023] [Revised: 05/22/2024] [Accepted: 03/29/2024] [Indexed: 05/12/2024]
Abstract
Joint analysis of multiple correlated phenotypes for genome-wide association studies (GWAS) can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits. Meanwhile, constructing a network based on associations between phenotypes and genotypes provides a new insight to analyze multiple phenotypes, which can explore whether phenotypes and genotypes might be related to each other at a higher level of cellular and organismal organization. In this paper, we first develop a bipartite signed network by linking phenotypes and genotypes into a Genotype and Phenotype Network (GPN). The GPN can be constructed by a mixture of quantitative and qualitative phenotypes and is applicable to binary phenotypes with extremely unbalanced case-control ratios in large-scale biobank datasets. We then apply a powerful community detection method to partition phenotypes into disjoint network modules based on GPN. Finally, we jointly test the association between multiple phenotypes in a network module and a single nucleotide polymorphism (SNP). Simulations and analyses of 72 complex traits in the UK Biobank show that multiple phenotype association tests based on network modules detected by GPN are much more powerful than those without considering network modules. The newly proposed GPN provides a new insight to investigate the genetic architecture among different types of phenotypes. Multiple phenotypes association studies based on GPN are improved by incorporating the genetic information into the phenotype clustering. Notably, it might broaden the understanding of genetic architecture that exists between diagnoses, genes, and pleiotropy.
Affiliation(s)
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
9
Choudhary T, Upadhyaya P, Davis CM, Yang P, Tallowin S, Lisboa FA, Schobel SA, Coopersmith CM, Elster EA, Buchman TG, Dente CJ, Kamaleswaran R. Derivation and Validation of Generalized Sepsis-induced Acute Respiratory Failure Phenotypes Among Critically Ill Patients: A Retrospective Study. RESEARCH SQUARE 2024:rs.3.rs-4307475. [PMID: 38746442 PMCID: PMC11092838 DOI: 10.21203/rs.3.rs-4307475/v1] [Indexed: 05/16/2024]
Abstract
Background: Septic patients who develop acute respiratory failure (ARF) requiring mechanical ventilation represent a heterogeneous subgroup of critically ill patients with widely variable clinical characteristics. Identifying distinct phenotypes of these patients may reveal insights about the broader heterogeneity in the clinical course of sepsis. We aimed to derive novel phenotypes of sepsis-induced ARF using observational clinical data and investigate their generalizability across multi-ICU specialties, considering multi-organ dynamics. Methods: We performed a multi-center retrospective study of ICU patients with sepsis who required mechanical ventilation for ≥24 hours. Data from two different high-volume academic hospital systems were used as a derivation set with N=3,225 medical ICU (MICU) patients and a validation set with N=848 MICU patients. For the multi-ICU validation, we utilized retrospective data from two surgical ICUs (SICUs) at the same hospitals (N=1,577). Clinical data from the 24 hours preceding intubation were used to derive distinct phenotypes using an explainable machine learning-based clustering model interpreted by clinical experts. Results: Four distinct ARF phenotypes were identified: A (severe multi-organ dysfunction (MOD) with a high likelihood of kidney injury and heart failure), B (severe hypoxemic respiratory failure [median P/F=123]), C (mild hypoxia [median P/F=240]), and D (severe MOD with a high likelihood of hepatic injury, coagulopathy, and lactic acidosis). Patients in each phenotype showed differences in clinical course and mortality rates despite similarities in demographics and admission co-morbidities. The phenotypes were reproduced in external validation utilizing an external MICU from the second hospital and SICUs from both centers. Kaplan-Meier analysis showed a significant difference in 28-day mortality across the phenotypes (p<0.01) that was consistent across both centers. The phenotypes demonstrated differences in treatment effects associated with a high positive end-expiratory pressure (PEEP) strategy. Conclusion: The phenotypes demonstrated unique patterns of organ injury and differences in clinical outcomes, which may help inform future research and clinical trial design for tailored management strategies.
Affiliation(s)
- Eric A Elster
- Uniformed Services University of the Health Sciences
10
Lemas DJ, Du X, Rouhizadeh M, Lewis B, Frank S, Wright L, Spirache A, Gonzalez L, Cheves R, Magalhães M, Zapata R, Reddy R, Xu K, Parker L, Harle C, Young B, Louis-Jaques A, Zhang B, Thompson L, Hogan WR, Modave F. Classifying early infant feeding status from clinical notes using natural language processing and machine learning. Sci Rep 2024; 14:7831. [PMID: 38570569 PMCID: PMC10991582 DOI: 10.1038/s41598-024-58299-x] [Received: 09/20/2023] [Accepted: 03/27/2024] [Indexed: 04/05/2024]
Abstract
The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Model comparison was evaluated based on overall accuracy, precision, recall, and F1 score. Our modeling corpus included an even number of clinical notes that was a balanced sample across each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification present for this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after performing balancing and down sampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in the electronic health records to classify infant feeding status. 
Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
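The macro-averaged scores reported in this abstract can be computed from a confusion matrix. The following is a minimal sketch, not the authors' code: the function name `macro_metrics` and the example counts are hypothetical, chosen only to illustrate the arithmetic for three classes (breast, formula/bottle, missing).

```python
def macro_metrics(confusion):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] is the number of notes whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column total
        actual_k = sum(confusion[k])                          # row total
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,  # unweighted mean over classes
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical balanced sample: 10 notes per class.
example = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 1, 9],
]
scores = macro_metrics(example)
```

When every class has the same number of notes, macro-averaged recall equals overall accuracy, which is consistent with the identical accuracy and recall figures reported for the balanced sample.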
Affiliation(s)
- Dominick J Lemas
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA.
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, 32610, USA.
- Xinsong Du
- Division of General Internal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Masoud Rouhizadeh
- Department of Pharmaceutical Outcomes and Policy, University of Florida College of Medicine, Gainesville, FL, 32610, USA
- Biomedical Informatics and Data Science Section, Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Braeden Lewis
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Simon Frank
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Lauren Wright
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Alex Spirache
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Lisa Gonzalez
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Ryan Cheves
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Marina Magalhães
- Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, 94305, USA
- Ruben Zapata
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Rahul Reddy
- Department of Computer and Information Science, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, 32611, USA
- Ke Xu
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA
- Leslie Parker
- Department of Biobehavioral Nursing Science, University of Florida College of Nursing, Gainesville, FL, 32603, USA
- Chris Harle
- Health Policy and Management Department, Richard M. Fairbanks School of Public Health, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202, USA
- Bridget Young
- Division of Breastfeeding and Lactation Medicine, University of Rochester Medical Center, Rochester, NY, 14642, USA
- Adetola Louis-Jaques
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, 32610, USA
- Bouri Zhang
- Health Science Center Libraries, University of Florida, Gainesville, FL, 32610, USA
- Lindsay Thompson
- Department of Pediatrics, Wake Forest School of Medicine, Winston-Salem, NC, 27101, USA
- William R Hogan
- Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
- François Modave
- Department of Anesthesiology, University of Florida College of Medicine, Gainesville, FL, 32610, USA
11
Levites Strekalova YA, Wang X, Sanchez O, Midence S. Trends in publication and levels of social determinants of health reporting in Journal of Clinical and Translational Science from 2017 to 2023. J Clin Transl Sci 2024; 8:e58. [PMID: 38655458 PMCID: PMC11036436 DOI: 10.1017/cts.2024.508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 03/13/2024] [Accepted: 03/19/2024] [Indexed: 04/26/2024]
Abstract
Social determinants of health affect clinical and translational research processes and outcomes but remain underreported in empirical studies. This scoping review examined the rate and types of social determinants of health (SDoH) variables included in the JCTS translational research studies published between 2017 and 2023 and included 129 studies. Most papers (91.7%) reported at least one SDoH variable with age, race and ethnicity, and sex included most often. Future studies to inform the role of SDoH data in translational research and science are recommended, and a draft SDoH data checklist is provided.
Affiliation(s)
- Yulia A. Levites Strekalova
- Department of Health Services Research, Management and Policy, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA
- Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
- Xiangren Wang
- Department of Health Services Research, Management and Policy, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA
- Orlando Sanchez
- Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
- Sara Midence
- Department of Health Services Research, Management and Policy, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA
12
Clarke H, Fitzcharles MA. Are Electronic Health Records Sufficiently Accurate to Phenotype Rheumatology Patients With Chronic Pain? J Rheumatol 2024; 51:218-220. [PMID: 38224990 DOI: 10.3899/jrheum.2023-1227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]
Affiliation(s)
- Hance Clarke
- H. Clarke, MD, PhD, Department of Anesthesiology and Pain Medicine, University of Toronto, Department of Anesthesia and Pain Management, Pain Research Unit, Toronto General Hospital, and Transitional Pain Service, Toronto General Hospital, Toronto, Ontario
- Mary-Ann Fitzcharles
- M.A. Fitzcharles, MB ChB, Department of Rheumatology, McGill University, Montreal, and Alan Edwards Pain Management Unit, McGill University, Montreal, Canada.
13
Al-Sahab B, Leviton A, Loddenkemper T, Paneth N, Zhang B. Biases in Electronic Health Records Data for Generating Real-World Evidence: An Overview. J Healthc Inform Res 2024; 8:121-139. [PMID: 38273982 PMCID: PMC10805748 DOI: 10.1007/s41666-023-00153-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 09/05/2023] [Accepted: 11/07/2023] [Indexed: 01/27/2024]
Abstract
Electronic Health Records (EHR) are increasingly being perceived as a unique source of data for clinical research as they provide unprecedentedly large volumes of real-time data from real-world settings. In this review of the secondary uses of EHR, we identify the anticipated breadth of opportunities, pointing out the data deficiencies and potential biases that are likely to limit the search for true causal relationships. This paper provides a comprehensive overview of the types of biases that arise along the pathways that generate real-world evidence and the sources of these biases. We distinguish between two levels in the production of EHR data where biases are likely to arise: (i) at the healthcare system level, where the principal source of bias resides in access to, and provision of, medical care, and in the acquisition and documentation of medical and administrative data; and (ii) at the research level, where biases arise from the processes of extracting, analyzing, and interpreting these data. Due to the plethora of biases, mainly in the form of selection and information bias, we conclude with advising extreme caution about making causal inferences based on secondary uses of EHRs.
Affiliation(s)
- Ban Al-Sahab
- Department of Family Medicine, College of Human Medicine, Michigan State University, B100 Clinical Center, 788 Service Road, East Lansing, MI USA
- Alan Leviton
- Department of Neurology, Harvard Medical School, Boston, MA USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA USA
- Tobias Loddenkemper
- Department of Neurology, Harvard Medical School, Boston, MA USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA USA
- Nigel Paneth
- Department of Epidemiology and Biostatistics, College of Human Medicine, Michigan State University, East Lansing, MI USA
- Department of Pediatrics and Human Development, College of Human Medicine, Michigan State University, East Lansing, MI USA
- Bo Zhang
- Department of Neurology, Boston Children’s Hospital, Boston, MA USA
- Biostatistics and Research Design, Institutional Centers of Clinical and Translational Research, Boston Children’s Hospital, Boston, MA USA
- Harvard Medical School, Boston, MA USA
14
Acharya A, Shrestha S, Chen A, Conte J, Avramovic S, Sikdar S, Anastasopoulos A, Das S. Clinical risk prediction using language models: benefits and considerations. J Am Med Inform Assoc 2024:ocae030. [PMID: 38412328 DOI: 10.1093/jamia/ocae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 01/11/2024] [Accepted: 02/03/2024] [Indexed: 02/29/2024]
Abstract
OBJECTIVE The use of electronic health records (EHRs) for clinical risk prediction is on the rise. However, in many practical settings, the limited availability of task-specific EHR data can restrict the application of standard machine learning pipelines. In this study, we investigate the potential of leveraging language models (LMs) as a means to incorporate supplementary domain knowledge for improving the performance of various EHR-based risk prediction tasks. METHODS We propose two novel LM-based methods, namely "LLaMA2-EHR" and "Sent-e-Med." Our focus is on utilizing the textual descriptions within structured EHRs to make risk predictions about future diagnoses. We conduct a comprehensive comparison with previous approaches across various data types and sizes. RESULTS Experiments across 6 different methods and 3 separate risk prediction tasks reveal that employing LMs to represent structured EHRs, such as diagnostic histories, results in significant performance improvements when evaluated using standard metrics such as area under the receiver operating characteristic (ROC) curve and precision-recall (PR) curve. Additionally, they offer benefits such as few-shot learning, the ability to handle previously unseen medical concepts, and adaptability to various medical vocabularies. However, it is noteworthy that outcomes may exhibit sensitivity to a specific prompt. CONCLUSION LMs encompass extensive embedded knowledge, making them valuable for the analysis of EHRs in the context of risk prediction. Nevertheless, it is important to exercise caution in their application, as ongoing safety concerns related to LMs persist and require continuous consideration.
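The area under the ROC curve named in this abstract has a simple rank-based reading: it is the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, with ties counted as one half. The sketch below illustrates that definition only; the function name `auroc` and the scores are hypothetical and unrelated to the study's models or data.

```python
def auroc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case
    has the higher score; tied pairs contribute 0.5.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks for patients with and without the outcome.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
auc = auroc(cases, controls)  # 8 of 9 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of patients, so it suits small validation sets; a sort-based rank computation gives the same value for large cohorts.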
Affiliation(s)
- Anyi Chen
- Staten Island Performing Provider System, Staten Island, NY, United States
- Joseph Conte
- Staten Island Performing Provider System, Staten Island, NY, United States
- Sanmay Das
- George Mason University, Fairfax, VA, United States
15
Kashkoush J, Gupta M, Meissner MA, Nielsen ME, Kirchner HL, Garg T. Performance Characteristics of a Rule-Based Electronic Health Record Algorithm to Identify Patients with Gross and Microscopic Hematuria. Methods Inf Med 2023; 62:183-192. [PMID: 37666279 DOI: 10.1055/a-2165-5552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2023]
Abstract
BACKGROUND Two million patients per year are referred to urologists for hematuria, or blood in the urine. The American Urological Association recently adopted a risk-stratified hematuria evaluation guideline to limit multi-phase computed tomography to individuals at highest risk of occult malignancy. OBJECTIVES To understand population-level hematuria evaluations, we developed an algorithm to accurately identify hematuria cases from electronic health records (EHRs). METHODS We used International Classification of Diseases (ICD)-9/ICD-10 diagnosis codes, urine color, and urine microscopy values to identify hematuria cases and to differentiate between gross and microscopic hematuria. Using an iterative process, we refined the ICD-9 algorithm on a gold standard, chart-reviewed cohort of 3,094 hematuria cases, and the ICD-10 algorithm on a 300 patient cohort. We applied the algorithm to Geisinger patients ≥35 years (n = 539,516) and determined performance by conducting chart review (n = 500). RESULTS After applying the hematuria algorithm, we identified 51,500 hematuria cases and 488,016 clean controls. Of the hematuria cases, 11,435 were categorized as gross, 26,658 as microscopic, 12,562 as indeterminate, and 845 were uncategorized. The positive predictive value (PPV) of identifying hematuria cases using the algorithm was 100% and the negative predictive value (NPV) was 99%. The gross hematuria algorithm had a PPV of 100% and NPV of 99%. The microscopic hematuria algorithm had lower PPV of 78% and NPV of 100%. CONCLUSION We developed an algorithm utilizing diagnosis codes and urine laboratory values to accurately identify hematuria and categorize as gross or microscopic in EHRs. Applying the algorithm will help researchers to understand patterns of care for this common condition.
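Positive and negative predictive values such as those reported here come directly from chart-review counts. The following is a minimal sketch under stated assumptions: the function name `predictive_values` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
def predictive_values(true_pos, false_pos, true_neg, false_neg):
    """Return (PPV, NPV) from chart-review counts.

    PPV = confirmed cases / all algorithm-flagged cases.
    NPV = confirmed controls / all algorithm-flagged controls.
    """
    flagged_cases = true_pos + false_pos
    flagged_controls = true_neg + false_neg
    ppv = true_pos / flagged_cases if flagged_cases else float("nan")
    npv = true_neg / flagged_controls if flagged_controls else float("nan")
    return ppv, npv


# Hypothetical review of 100 flagged cases and 100 flagged controls.
ppv, npv = predictive_values(true_pos=78, false_pos=22, true_neg=100, false_neg=0)
```

Both values depend on how common the condition is in the reviewed sample, so they describe the algorithm in that population rather than in general.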
Affiliation(s)
- Jasmine Kashkoush
- Department of Urology, Geisinger, Danville, Pennsylvania, United States
- Mudit Gupta
- Phenomic Analytics and Clinical Data Core, Geisinger, Danville, Pennsylvania, United States
- Matthew E Nielsen
- Department of Urology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, North Carolina, United States
- Department of Epidemiology, University of North Carolina at Chapel Hill, Gillings School of Global Public Health, Chapel Hill, North Carolina, United States
- Department of Health Policy & Management, University of North Carolina at Chapel Hill, Gillings School of Global Public Health, Chapel Hill, North Carolina, United States
- H Lester Kirchner
- Department of Population Health Sciences, Geisinger, Danville, Pennsylvania, United States
- Tullika Garg
- Department of Population Health Sciences, Geisinger, Danville, Pennsylvania, United States
- Department of Urology, Penn State Health Milton S. Hershey Medical Center, Hershey, Pennsylvania, United States
16
Chen Q, Dwaraka VB, Carreras-Gallo N, Mendez K, Chen Y, Begum S, Kachroo P, Prince N, Went H, Mendez T, Lin A, Turner L, Moqri M, Chu SH, Kelly RS, Weiss ST, Rattray NJ, Gladyshev VN, Karlson E, Wheelock C, Mathé EA, Dahlin A, McGeachie MJ, Smith R, Lasky-Su JA. OMICmAge: An integrative multi-omics approach to quantify biological age with electronic medical records. bioRxiv 2023:2023.10.16.562114. [PMID: 37904959 PMCID: PMC10614756 DOI: 10.1101/2023.10.16.562114] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/02/2023]
Abstract
Biological aging is a multifactorial process involving complex interactions of cellular and biochemical processes that is reflected in omic profiles. Using common clinical laboratory measures in ~30,000 individuals from the MGB-Biobank, we developed a robust, predictive biological aging phenotype, EMRAge, that balances clinical biomarkers with overall mortality risk and can be broadly recapitulated across EMRs. We then applied elastic-net regression to model EMRAge with DNA-methylation (DNAm) and multiple omics, generating DNAmEMRAge and OMICmAge, respectively. Both biomarkers demonstrated strong associations with chronic diseases and mortality that outperform current biomarkers across our discovery (MGB-ABC, n=3,451) and validation (TruDiagnostic, n=12,666) cohorts. Through the use of epigenetic biomarker proxies, OMICmAge has the unique advantage of expanding the predictive search space to include epigenomic, proteomic, metabolomic, and clinical data while distilling this in a measure with DNAm alone, providing opportunities to identify clinically-relevant interconnections central to the aging process.
Affiliation(s)
- Qingwen Chen
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Kevin Mendez
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Yulu Chen
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Sofina Begum
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Priyadarshini Kachroo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Nicole Prince
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Aaron Lin
- TruDiagnostic, Inc., Lexington, KY USA
- Mahdi Moqri
- Division of Genetics, Dept. of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Su H. Chu
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Rachel S. Kelly
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Scott T. Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Nicholas J.W. Rattray
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- Strathclyde Centre for Molecular Bioscience, University of Strathclyde, Glasgow, UK
- Vadim N. Gladyshev
- Division of Genetics, Dept. of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Elizabeth Karlson
- Department of Personalized Medicine, Mass General Brigham and Harvard Medical School, Boston, MA, USA
- Craig Wheelock
- Division of Physiological Chemistry 2, Dept of Medical Biochemistry and Biophysics, Karolinska Institute, Stockholm, Sweden
- Ewy A. Mathé
- Division of Preclinical Innovation, National Center for Advancing Translational Science, National Institutes of Health, Rockville, MD, USA
- Amber Dahlin
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Michael J. McGeachie
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Jessica A. Lasky-Su
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
17
Nealon CL, Halladay CW, Gorman BR, Simpson P, Roncone DP, Canania RL, Anthony SA, Rogers LRS, Leber JN, Dougherty JM, Bailey JNC, Crawford DC, Sullivan JM, Galor A, Wu WC, Greenberg PB, Lass JH, Iyengar SK, Peachey NS. Association Between Fuchs Endothelial Corneal Dystrophy, Diabetes Mellitus, and Multimorbidity. Cornea 2023; 42:1140-1149. [PMID: 37170406 PMCID: PMC10523841 DOI: 10.1097/ico.0000000000003311] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/09/2022] [Accepted: 04/11/2023] [Indexed: 05/13/2023]
Abstract
PURPOSE The aim of this study was to assess risk for demographic variables and other health conditions that are associated with Fuchs endothelial corneal dystrophy (FECD). METHODS We developed a FECD case-control algorithm based on structured electronic health record data and confirmed accuracy by individual review of charts at 3 Veterans Affairs (VA) Medical Centers. This algorithm was applied to the Department of VA Million Veteran Program cohort from whom sex, genetic ancestry, comorbidities, diagnostic phecodes, and laboratory values were extracted. Single-variable and multiple variable logistic regression models were used to determine the association of these risk factors with FECD diagnosis. RESULTS Being a FECD case was associated with female sex, European genetic ancestry, and a greater number of comorbidities. Of 1417 diagnostic phecodes evaluated, 213 had a significant association with FECD, falling in both ocular and nonocular conditions, including diabetes mellitus (DM). Five of 69 laboratory values were associated with FECD, with the direction of change for 4 being consistent with DM. Insulin dependency and type 1 DM raised risk to a greater degree than type 2 DM, like other microvascular diabetic complications. CONCLUSIONS Female sex, European ancestry, and multimorbidity increased FECD risk. Endocrine/metabolic clinic encounter codes and altered patterns of laboratory values support DM increasing FECD risk. Our results evoke a threshold model in which the FECD phenotype is intensified by DM and potentially other health conditions that alter corneal physiology. Further studies to better understand the relationship between FECD and DM are indicated and may help identify opportunities for slowing FECD progression.
Affiliation(s)
- Cari L. Nealon
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
- Christopher W. Halladay
- Center of Innovation in Long Term Services and Supports, Providence VA Medical Center, Providence, Rhode Island, USA
- Bryan R. Gorman
- VA Cooperative Studies Program, VA Boston Healthcare System, Boston, Massachusetts
- Booz Allen Hamilton, McLean, Virginia, USA
- Piana Simpson
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
- David P. Roncone
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
- Scott A. Anthony
- Eye Clinic, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
- Jenna N. Leber
- Ophthalmology Section, VA Western NY Health Care System, Buffalo, New York, USA
- Jessica N. Cooke Bailey
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Department of Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
- Dana C. Crawford
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Department of Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
- Jack M. Sullivan
- Ophthalmology Section, VA Western NY Health Care System, Buffalo, New York, USA
- Research Service, VA Western NY Health Care System, Buffalo, New York, USA
- Department of Ophthalmology (Ross Eye Institute), University at Buffalo-SUNY, Buffalo, New York, USA
- Anat Galor
- Miami Veterans Affairs Medical Center, Miami, Florida, USA
- Bascom Palmer Eye Institute, University of Miami, Miami, Florida, USA
- Wen-Chih Wu
- Cardiology Section, Medical Service, Providence VA Medical Center, Providence, Rhode Island, USA
- Paul B. Greenberg
- Ophthalmology Section, Providence VA Medical Center, Providence, Rhode Island, USA
- Division of Ophthalmology, Alpert Medical School, Brown University, Providence, Rhode Island, USA
- Jonathan H. Lass
- Department of Ophthalmology & Visual Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- University Hospitals Eye Institute, Cleveland, Ohio, USA
- Sudha K. Iyengar
- Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Department of Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
- Neal S. Peachey
- Research Service, VA Northeast Ohio Healthcare System, Cleveland, Ohio, USA
- Cole Eye Institute, Cleveland Clinic Foundation, Cleveland, Ohio, USA
- Department of Ophthalmology, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, Ohio, USA
18
Sathe NA, Xian S, Mabrey FL, Crosslin DR, Mooney SD, Morrell ED, Lybarger K, Yetisgen M, Jarvik GP, Bhatraju PK, Wurfel MM. Evaluating construct validity of computable acute respiratory distress syndrome definitions in adults hospitalized with COVID-19: an electronic health records based approach. BMC Pulm Med 2023; 23:292. [PMID: 37559024 PMCID: PMC10413524 DOI: 10.1186/s12890-023-02560-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 07/11/2023] [Indexed: 08/11/2023]
Abstract
BACKGROUND Evolving ARDS epidemiology and management during COVID-19 have prompted calls to reexamine the construct validity of Berlin criteria, which have been rarely evaluated in real-world data. We developed a Berlin ARDS definition (EHR-Berlin) computable in electronic health records (EHR) to (1) assess its construct validity, and (2) assess how expanding its criteria affected validity. METHODS We performed a retrospective cohort study at two tertiary care hospitals with one EHR, among adults hospitalized with COVID-19 February 2020-March 2021. We assessed five candidate definitions for ARDS: the EHR-Berlin definition modeled on Berlin criteria, and four alternatives informed by recent proposals to expand criteria and include patients on high-flow oxygen (EHR-Alternative 1), relax imaging criteria (EHR-Alternatives 2-3), and extend timing windows (EHR-Alternative 4). We evaluated two aspects of construct validity for the EHR-Berlin definition: (1) criterion validity: agreement with manual ARDS classification by experts, available in 175 patients; (2) predictive validity: relationships with hospital mortality, assessed by Pearson r and by area under the receiver operating curve (AUROC). We assessed predictive validity and timing of identification of EHR-Berlin definition compared to alternative definitions. RESULTS Among 765 patients, mean (SD) age was 57 (18) years and 471 (62%) were male. The EHR-Berlin definition classified 171 (22%) patients as ARDS, which had high agreement with manual classification (kappa 0.85), and was associated with mortality (Pearson r = 0.39; AUROC 0.72, 95% CI 0.68, 0.77). In comparison, EHR-Alternative 1 classified 219 (29%) patients as ARDS, maintained similar relationships to mortality (r = 0.40; AUROC 0.74, 95% CI 0.70, 0.79, Delong test P = 0.14), and identified patients earlier in their hospitalization (median 13 vs. 15 h from admission, Wilcoxon signed-rank test P < 0.001). 
EHR-Alternative 3, which removed imaging criteria, had similar correlation (r = 0.41) but better discrimination for mortality (AUROC 0.76, 95% CI 0.72, 0.80; P = 0.036), and identified patients median 2 h (P < 0.001) from admission. CONCLUSIONS The EHR-Berlin definition can enable ARDS identification with high criterion validity, supporting large-scale study and surveillance. There are opportunities to expand the Berlin criteria that preserve predictive validity and facilitate earlier identification.
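Agreement between a computable definition and manual expert classification, reported in this abstract as kappa, corrects raw agreement for the agreement expected by chance. The sketch below shows the standard Cohen's kappa for a 2x2 table; the function name `cohen_kappa` and the counts are hypothetical and do not reproduce the study's 175-patient comparison.

```python
def cohen_kappa(both_yes, ehr_only, manual_only, both_no):
    """Cohen's kappa for two binary raters (EHR definition vs. manual review).

    both_yes:    both label the patient as ARDS
    ehr_only:    EHR definition says ARDS, manual review says no
    manual_only: manual review says ARDS, EHR definition says no
    both_no:     both label the patient as not ARDS
    """
    n = both_yes + ehr_only + manual_only + both_no
    observed = (both_yes + both_no) / n
    ehr_yes = (both_yes + ehr_only) / n
    manual_yes = (both_yes + manual_only) / n
    # Chance agreement: both say yes by chance plus both say no by chance.
    expected = ehr_yes * manual_yes + (1 - ehr_yes) * (1 - manual_yes)
    if expected == 1.0:
        return 1.0  # degenerate case: every label identical
    return (observed - expected) / (1 - expected)


# Hypothetical 100-patient comparison.
kappa = cohen_kappa(both_yes=40, ehr_only=10, manual_only=5, both_no=45)
```

Here raw agreement is 0.85 and chance agreement is 0.50, so kappa is 0.70, which shows how kappa can sit well below the raw agreement rate.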
Affiliation(s)
- Neha A Sathe
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA.
- Su Xian
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- F Linzee Mabrey
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
- David R Crosslin
- Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, USA
- Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Eric D Morrell
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
- Kevin Lybarger
- Department of Information Sciences and Technology, George Mason University, Fairfax, VA, USA
- Meliha Yetisgen
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Gail P Jarvik
- Department of Genome Sciences and Division of Medical Genetics, Department of Medicine, University of Washington Medical Center, Seattle, WA, USA
- Pavan K Bhatraju
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
- Mark M Wurfel
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
19
Penrod N, Okeh C, Velez Edwards DR, Barnhart K, Senapati S, Verma SS. Leveraging electronic health record data for endometriosis research. Front Digit Health 2023; 5:1150687. [PMID: 37342866 PMCID: PMC10278662 DOI: 10.3389/fdgth.2023.1150687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 05/10/2023] [Indexed: 06/23/2023]
Abstract
Endometriosis is a chronic, complex disease for which there are vast disparities in diagnosis and treatment between sociodemographic groups. Clinical presentation of endometriosis can vary from asymptomatic disease, often identified during (in)fertility consultations, to dysmenorrhea and debilitating pelvic pain. Because of this complexity, delayed diagnosis (mean time to diagnosis is 1.7-3.6 years) and misdiagnosis are common. Early and accurate diagnosis of endometriosis remains a research priority for patient advocates and healthcare providers. Electronic health records (EHRs) have been widely adopted as a data source in biomedical research. However, they remain a largely untapped source of data for endometriosis research. EHRs capture diverse, real-world patient populations and care trajectories and can be used to learn patterns of underlying risk factors for endometriosis which, in turn, can be used to inform screening guidelines to help clinicians efficiently and effectively recognize and diagnose the disease in all patient populations, reducing inequities in care. Here, we provide an overview of the advantages and limitations of using EHR data to study endometriosis. We describe the prevalence of endometriosis observed in diverse populations from multiple healthcare institutions, examples of variables that can be extracted from EHRs to enhance the accuracy of endometriosis prediction, and opportunities to leverage longitudinal EHR data to improve our understanding of long-term health consequences for all patients.
Affiliation(s)
- Nadia Penrod
  - College of Agriculture and Life Sciences, Texas A&M University, College Station, TX, United States
- Chelsea Okeh
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, Philadelphia, PA, United States
- Digna R. Velez Edwards
  - Department of Obstetrics and Gynecology, Vanderbilt University, Nashville, TN, United States
- Kurt Barnhart
  - Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Suneeta Senapati
  - Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Shefali S. Verma
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, Philadelphia, PA, United States
|
20
|
Deutsch AJ, Stalbow L, Majarian TD, Mercader JM, Manning AK, Florez JC, Loos RJ, Udler MS. Polygenic Scores Help Reduce Racial Disparities in Predictive Accuracy of Automated Type 1 Diabetes Classification Algorithms. Diabetes Care 2023; 46:794-800. [PMID: 36745605 PMCID: PMC10090893 DOI: 10.2337/dc22-1833] [Received: 09/19/2022] [Accepted: 01/10/2023] [Indexed: 02/07/2023]
Abstract
OBJECTIVE: Automated algorithms to identify individuals with type 1 diabetes using electronic health records are increasingly used in biomedical research. It is not known whether the accuracy of these algorithms differs by self-reported race. We investigated whether polygenic scores improve identification of individuals with type 1 diabetes.
RESEARCH DESIGN AND METHODS: We investigated two large hospital-based biobanks (Mass General Brigham [MGB] and BioMe) and identified individuals with type 1 diabetes using an established automated algorithm. We performed medical record reviews to validate the diagnosis of type 1 diabetes. We implemented two published polygenic scores for type 1 diabetes (developed in individuals of European or African ancestry). We assessed the classification algorithm before and after incorporating polygenic scores.
RESULTS: The automated algorithm was more likely to incorrectly assign a diagnosis of type 1 diabetes in self-reported non-White individuals than in self-reported White individuals (odds ratio 3.45; 95% CI 1.54-7.69; P = 0.0026). After incorporating polygenic scores into the MGB Biobank, the positive predictive value of the type 1 diabetes algorithm increased from 70 to 97% for self-reported White individuals (meaning that 97% of those predicted to have type 1 diabetes indeed had type 1 diabetes) and from 53 to 100% for self-reported non-White individuals. Similar results were found in BioMe.
CONCLUSIONS: Automated phenotyping algorithms may exacerbate health disparities because of an increased risk of misclassification of individuals from underrepresented populations. Polygenic scores may be used to improve the performance of phenotyping algorithms and potentially reduce this disparity.
Affiliation(s)
- Aaron J. Deutsch
  - Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Department of Medicine, Harvard Medical School, Boston, MA
- Lauren Stalbow
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Timothy D. Majarian
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Josep M. Mercader
  - Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Department of Medicine, Harvard Medical School, Boston, MA
- Alisa K. Manning
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Department of Medicine, Harvard Medical School, Boston, MA
  - Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital, Boston, MA
- Jose C. Florez
  - Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Department of Medicine, Harvard Medical School, Boston, MA
- Ruth J.F. Loos
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Miriam S. Udler
  - Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Department of Medicine, Harvard Medical School, Boston, MA
|
21
|
Roy S, Bruehl S, Feng X, Shotwell MS, Van De Ven T, Shaw AD, Kertai MD. Developing a risk stratification tool for predicting opioid-related respiratory depression after non-cardiac surgery: a retrospective study. BMJ Open 2022; 12:e064089. [PMID: 36219738 PMCID: PMC9445779 DOI: 10.1136/bmjopen-2022-064089] [Indexed: 11/09/2022]
Abstract
OBJECTIVES: Accurately assessing the probability of significant respiratory depression following opioid administration can potentially enhance perioperative risk assessment and pain management. We developed and validated a risk prediction tool to estimate the probability of significant respiratory depression (indexed by naloxone administration) in patients undergoing noncardiac surgery.
DESIGN: Retrospective cohort study.
SETTING: Single academic centre.
PARTICIPANTS: We studied n=63 084 patients (mean age 47.1±18.2 years; 50% men) who underwent emergency or elective non-cardiac surgery between 1 January 2007 and 30 October 2017.
INTERVENTIONS: A derivation subsample reflecting two-thirds of available patients (n=42 082) was randomly selected for model development, and associations were identified between predictor variables and naloxone administration occurring within 5 days following surgery. The resulting probability model for predicting naloxone administration was then cross-validated in a separate validation cohort reflecting the remaining one-third of patients (n=21 002).
RESULTS: The rate of naloxone administration was identical in the derivation (n=2720 (6.5%)) and validation (n=1360 (6.5%)) cohorts. The risk prediction model identified female sex (OR: 3.01; 95% CI: 2.73 to 3.32), high-risk surgical procedures (OR: 4.16; 95% CI: 3.78 to 4.58), history of drug abuse (OR: 1.81; 95% CI: 1.52 to 2.16) and any opioids being administered on a scheduled rather than as-needed basis (OR: 8.31; 95% CI: 7.26 to 9.51) as risk factors for naloxone administration. Advanced age (OR: 0.971; 95% CI: 0.968 to 0.973), opioids administered via patient-controlled analgesia pump (OR: 0.55; 95% CI: 0.49 to 0.62) and any scheduled non-opioids (OR: 0.63; 95% CI: 0.58 to 0.69) were associated with decreased risk of naloxone administration. An overall risk prediction model incorporating the common clinically available variables above displayed excellent discriminative ability in both the derivation and validation cohorts (c-index=0.820 and 0.814, respectively).
CONCLUSION: Our cross-validated clinical predictive model accurately estimates the risk of serious opioid-related respiratory depression requiring naloxone administration in postoperative patients.
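The c-index cited in this abstract summarizes discrimination. For a binary outcome such as naloxone administration, it is the fraction of (event, non-event) patient pairs in which the patient with the event received the higher predicted risk, with ties counted as half. The following is a minimal sketch of that calculation only; the function name `c_index` and the toy risks and outcomes are hypothetical and are not taken from the study.

```python
from itertools import product


def c_index(risks, outcomes):
    """Concordance for a binary outcome.

    risks    -- predicted probabilities, one per patient
    outcomes -- 1 if the event occurred, 0 otherwise
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for event_risk, non_event_risk in product(events, non_events):
        if event_risk > non_event_risk:
            concordant += 1.0   # model ranked the pair correctly
        elif event_risk == non_event_risk:
            concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data (not from the study)
toy_risks = [0.9, 0.4, 0.35, 0.8, 0.1]
toy_outcomes = [1, 0, 1, 0, 0]
print(round(c_index(toy_risks, toy_outcomes), 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The pairwise loop is quadratic, so for cohorts of the size described above a rank-based implementation would be the practical choice.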
Affiliation(s)
- Sounak Roy
  - Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Stephen Bruehl
  - Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Xiaoke Feng
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Matthew S Shotwell
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Thomas Van De Ven
  - Department of Anesthesiology, Duke University Medical Center, Durham, North Carolina, USA
- Andrew D Shaw
  - Department of Intensive Care and Resuscitation, Cleveland Clinic, Cleveland, Ohio, USA
- Miklos D Kertai
  - Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
|
22
|
Avery CL, Howard AG, Ballou AF, Buchanan VL, Collins JM, Downie CG, Engel SM, Graff M, Highland HM, Lee MP, Lilly AG, Lu K, Rager JE, Staley BS, North KE, Gordon-Larsen P. Strengthening Causal Inference in Exposomics Research: Application of Genetic Data and Methods. Environmental Health Perspectives 2022; 130:55001. [PMID: 35533073 PMCID: PMC9084332 DOI: 10.1289/ehp9098] [Indexed: 05/11/2023]
Abstract
Advances in technologies to measure a broad set of exposures have led to a range of exposome research efforts. Yet, these efforts have insufficiently integrated methods that incorporate genetic data to strengthen causal inference, despite evidence that many exposome-associated phenotypes are heritable. Objective: We demonstrate how integration of methods and study designs that incorporate genetic data can strengthen causal inference in exposomics research by helping address six challenges: reverse causation and unmeasured confounding, comprehensive examination of phenotypic effects, low efficiency, replication, multilevel data integration, and characterization of tissue-specific effects. Examples are drawn from studies of biomarkers and health behaviors, exposure domains where the causal inference methods we describe are most often applied. Discussion: Technological, computational, and statistical advances in genotyping, imputation, and analysis, combined with broad data sharing and cross-study collaborations, offer multiple opportunities to strengthen causal inference in exposomics research. Full application of these opportunities will require an expanded understanding of genetic variants that predict exposome phenotypes as well as an appreciation that the utility of genetic variants for causal inference will vary by exposure and may depend on large sample sizes. However, several of these challenges can be addressed through international scientific collaborations that prioritize data sharing. Ultimately, we anticipate that efforts to better integrate methods that incorporate genetic data will extend the reach of exposomics research by helping address the challenges of comprehensively measuring the exposome and its health effects across studies, the life course, and in varied contexts and diverse populations. https://doi.org/10.1289/EHP9098.
Affiliation(s)
- Christy L Avery
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Annie Green Howard
  - Department of Biostatistics, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Anna F Ballou
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Victoria L Buchanan
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Jason M Collins
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina G Downie
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Stephanie M Engel
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Mariaelisa Graff
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Heather M Highland
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Moa P Lee
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Adam G Lilly
  - Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Sociology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Kun Lu
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Julia E Rager
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Brooke S Staley
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Kari E North
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Penny Gordon-Larsen
  - Department of Nutrition, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
|
23
|
Almowil Z, Zhou SM, Brophy S, Croxall J. Concept Libraries for Repeatable and Reusable Research: Qualitative Study Exploring the Needs of Users. JMIR Hum Factors 2022; 9:e31021. [PMID: 35289755 PMCID: PMC8965669 DOI: 10.2196/31021] [Received: 06/07/2021] [Revised: 11/17/2021] [Accepted: 12/05/2021] [Indexed: 12/05/2022]
Abstract
Background: Big data research in the field of health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. This means that researchers and health professionals often have different phenotype definitions for the same condition. This lack of agreement makes it difficult to compare different study findings and hinders the ability to conduct repeatable and reusable research.
Objective: This study aims to examine the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, in the development of a data portal for phenotypes (a concept library).
Methods: This was a qualitative study using interviews and focus group discussion. One-to-one interviews were conducted with researchers, clinicians, machine learning experts, and senior research managers in health data science (N=6) to explore their specific needs in the development of a concept library. In addition, a focus group discussion with researchers (N=14) working with the Secured Anonymized Information Linkage databank, a national eHealth data linkage infrastructure, was held to perform a SWOT (strengths, weaknesses, opportunities, and threats) analysis for the phenotyping system and the proposed concept library. The interviews and focus group discussion were transcribed verbatim, and 2 thematic analyses were performed.
Results: Most of the participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified that many requirements are needed before its development. Although all the participants stated that they were aware of some existing concept libraries, most of them expressed negative perceptions about them. The participants mentioned several facilitators that would stimulate them to share their work and reuse the work of others, and they pointed out several barriers that could inhibit them from sharing their work and reusing the work of others. The participants suggested some developments that they would like to see to improve reproducible research output using routine data.
Conclusions: The study indicated that most interviewees valued a concept library for phenotypes. However, only half of the participants felt that they would contribute by providing definitions for the concept library, and they reported many barriers regarding sharing their work on a publicly accessible platform. Analysis of interviews and the focus group discussion revealed that different stakeholders have different requirements, facilitators, barriers, and concerns about a prototype concept library.
Affiliation(s)
- Zahra Almowil
  - Data Science Building, Medical School, Swansea University, Swansea, Wales, United Kingdom
- Shang-Ming Zhou
  - Centre For Health Technology, Faculty of Health, University of Plymouth, Plymouth, United Kingdom
- Sinead Brophy
  - Data Science Building, Medical School, Swansea University, Swansea, Wales, United Kingdom
- Jodie Croxall
  - Data Science Building, Medical School, Swansea University, Swansea, Wales, United Kingdom
|
24
|
Cereceda K, Jorquera R, Villarroel-Espíndola F. Advances in mass cytometry and its applicability to digital pathology in clinical-translational cancer research. Advances in Laboratory Medicine 2022; 3:5-29. [PMID: 37359436 PMCID: PMC10197474 DOI: 10.1515/almed-2021-0075] [Received: 06/03/2021] [Accepted: 07/16/2021] [Indexed: 06/28/2023]
Abstract
The development and subsequent adaptation of mass cytometry for the histological analysis of tissue sections has allowed the simultaneous spatial characterization of multiple components. This is useful to find the correlation between the genotypic and phenotypic profile of tumor cells and their environment in clinical-translational studies. In this review, we provide an overview of the most relevant hallmarks in the development, implementation and application of multiplexed imaging in the study of cancer and other conditions. A special focus is placed on studies based on imaging mass cytometry (IMC) and multiplexed ion beam imaging (MIBI). The purpose of this review is to help our readers become familiar with the verification techniques employed on this tool and outline the multiple applications reported in the literature. This review will also provide guidance on the use of IMC or MIBI in any field of biomedical research.
Affiliation(s)
- Karina Cereceda
  - Laboratorio de Medicina Traslacional, Instituto Oncológico Fundación Arturo López Pérez, Santiago, Chile
- Roddy Jorquera
  - Laboratorio de Medicina Traslacional, Instituto Oncológico Fundación Arturo López Pérez, Santiago, Chile
- Franz Villarroel-Espíndola
  - Laboratorio de Medicina Traslacional, Instituto Oncológico Fundación Arturo López Pérez, Santiago, Chile
|
25
|
Seedahmed MI, Mogilnicka I, Zeng S, Luo G, Whooley MA, McCulloch CE, Koth L, Arjomandi M. Performance of a Computational Phenotyping Algorithm for Sarcoidosis Using Diagnostic Codes in Electronic Medical Records: A Pilot Study from Two Veterans Affairs Medical Centers. JMIR Form Res 2022; 6:e31615. [PMID: 35081036 PMCID: PMC8928044 DOI: 10.2196/31615] [Received: 06/29/2021] [Revised: 01/24/2022] [Accepted: 01/24/2022] [Indexed: 11/29/2022]
Abstract
Background: Electronic medical records (EMRs) offer the promise of computationally identifying sarcoidosis cases. However, the accuracy of identifying these cases in the EMR is unknown.
Objective: The aim of this study is to determine the statistical performance of using the International Classification of Diseases (ICD) diagnostic codes to identify patients with sarcoidosis in the EMR.
Methods: We used the ICD diagnostic codes to identify sarcoidosis cases by searching the EMRs of the San Francisco and Palo Alto Veterans Affairs medical centers and randomly selecting 200 patients. To improve the diagnostic accuracy of the computational algorithm in cases where histopathological data are unavailable, we developed an index of suspicion to identify cases with a high index of suspicion for sarcoidosis (confirmed and probable) based on clinical and radiographic features alone using the American Thoracic Society practice guideline. Through medical record review, we determined the positive predictive value (PPV) of diagnosing sarcoidosis by two computational methods: using ICD codes alone and using ICD codes plus the high index of suspicion.
Results: Among the 200 patients, 158 (79%) had a high index of suspicion for sarcoidosis. Of these 158 patients, 142 (89.9%) had documentation of nonnecrotizing granuloma, confirming biopsy-proven sarcoidosis. The PPV of using ICD codes alone was 79% (95% CI 78.6%-80.5%) for identifying sarcoidosis cases and 71% (95% CI 64.7%-77.3%) for identifying histopathologically confirmed sarcoidosis in the EMRs. The inclusion of the generated high index of suspicion to identify confirmed sarcoidosis cases increased the PPV significantly to 100% (95% CI 96.5%-100%). Histopathology documentation alone was 90% sensitive compared with high index of suspicion.
Conclusions: ICD codes are reasonable classifiers for identifying sarcoidosis cases within EMRs with a PPV of 79%. Using a computational algorithm to capture index of suspicion data elements could significantly improve the case-identification accuracy.
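The point estimates in this abstract follow directly from the reported counts (200 reviewed patients, 158 with a high index of suspicion, 142 with histopathology). The sketch below reproduces that arithmetic. It is illustrative only: the function names `ppv` and `wald_ci` are hypothetical, and the normal-approximation (Wald) interval is an assumption, because the abstract does not say how its confidence intervals were computed.

```python
import math


def ppv(true_positives: int, flagged: int) -> float:
    """Positive predictive value: confirmed cases / cases flagged by the method."""
    if flagged <= 0:
        raise ValueError("flagged must be positive")
    return true_positives / flagged


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval for a proportion, clipped to [0, 1].

    Assumed method; the abstract does not name the interval it used.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Counts as reported in the abstract
reviewed, high_suspicion, biopsy_confirmed = 200, 158, 142

print(f"ICD codes vs. high index of suspicion: {ppv(high_suspicion, reviewed):.1%}")
print(f"ICD codes vs. histopathology:          {ppv(biopsy_confirmed, reviewed):.1%}")
print(f"Histopathology among high suspicion:   {ppv(biopsy_confirmed, high_suspicion):.1%}")

low, high = wald_ci(biopsy_confirmed, reviewed)
print(f"Wald 95% CI for 142/200: {low:.1%} to {high:.1%}")
```

With these counts the three proportions come out at 79.0%, 71.0%, and 89.9%, and the Wald interval for 142/200 is roughly 64.7% to 77.3%. For proportions near 0 or 1 a Wilson or exact interval is usually preferred over the Wald form.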
Affiliation(s)
- Mohamed Ismail Seedahmed
  - Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, 513 Parnassus Ave, HSE 1314, Box 0111, San Francisco, US
  - San Francisco Veterans Affairs Medical Center, 4150 Clement Street, San Francisco, US
- Izabella Mogilnicka
  - Department of Experimental Physiology and Pathophysiology, Laboratory of the Centre for Preclinical Research, Medical University of Warsaw, Warsaw, PL
- Siyang Zeng
  - San Francisco Veterans Affairs Medical Center, 4150 Clement Street, San Francisco, US
  - Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, US
- Gang Luo
  - Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, US
- Mary A Whooley
  - San Francisco Veterans Affairs Medical Center, 4150 Clement Street, San Francisco, US
  - Department of Medicine, University of California San Francisco, San Francisco, US
  - Measurement Science Quality Enhancement Research Initiative, San Francisco Veterans Affairs Healthcare System, San Francisco, US
- Charles E McCulloch
  - Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, US
- Laura Koth
  - Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, 513 Parnassus Ave, HSE 1314, Box 0111, San Francisco, US
- Mehrdad Arjomandi
  - San Francisco Veterans Affairs Medical Center, 4150 Clement Street, San Francisco, US
  - Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, 513 Parnassus Ave, HSE 1314, Box 0111, San Francisco, US
|
26
|
Sulieman L, Cronin RM, Carroll RJ, Natarajan K, Marginean K, Mapes B, Roden D, Harris P, Ramirez A. OUP accepted manuscript. J Am Med Inform Assoc 2022; 29:1131-1141. [PMID: 35396991 PMCID: PMC9196700 DOI: 10.1093/jamia/ocac046] [Received: 09/14/2021] [Revised: 02/18/2022] [Accepted: 03/23/2022] [Indexed: 11/13/2022]
Abstract
OBJECTIVE: A participant's medical history is important in clinical research and can be captured from electronic health records (EHRs) and self-reported surveys. Both can be incomplete: EHRs because of documentation gaps or lack of interoperability, and surveys because of recall bias or limited health literacy. This analysis compares medical history collected in the All of Us Research Program through both surveys and EHRs.
MATERIALS AND METHODS: The All of Us medical history survey includes a self-report questionnaire that asks about diagnoses of over 150 medical conditions organized into 12 disease categories. In each category, we identified the 3 most and least frequent self-reported diagnoses and retrieved their analogues from EHRs. We calculated agreement scores and extracted participant demographic characteristics for each comparison set.
RESULTS: The 4th All of Us dataset release includes data from 314 994 participants, 28.3% of whom completed medical history surveys and 65.5% of whom had EHR data. The hearing and vision category within the survey had the highest number of responses, but the second lowest positive agreement with the EHR (0.21). The infectious disease category had the lowest positive agreement (0.12). Cancer conditions had the highest positive agreement (0.45) between the 2 data sources.
DISCUSSION AND CONCLUSION: Our study quantified the agreement of medical history between 2 sources: EHRs and self-reported surveys. Conditions that are usually undocumented in EHRs had low agreement scores, demonstrating that survey data can supplement EHR data. Disagreement between EHR and survey can help identify possible missing records and guide researchers to adjust for biases.
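This abstract reports positive agreement between survey and EHR but does not give the formula. One common definition, assumed here and not confirmed by the source, is the proportion of specific positive agreement, 2a / (2a + b + c), where a counts participants with the condition in both sources and b and c count those with it in only one. The sketch below uses a hypothetical function name and made-up flags purely to show the calculation.

```python
import math


def positive_agreement(survey, ehr):
    """Proportion of specific positive agreement between two binary sources.

    survey, ehr -- equal-length sequences of truthy/falsy flags, one per
                   participant (condition reported / condition recorded).
    Assumed formula: 2a / (2a + b + c); the abstract does not state its own.
    """
    if len(survey) != len(ehr):
        raise ValueError("survey and ehr must have the same length")

    both = sum(1 for s, e in zip(survey, ehr) if s and e)
    survey_only = sum(1 for s, e in zip(survey, ehr) if s and not e)
    ehr_only = sum(1 for s, e in zip(survey, ehr) if e and not s)

    denominator = 2 * both + survey_only + ehr_only
    if denominator == 0:
        return math.nan  # nobody has the condition in either source
    return 2 * both / denominator


# Hypothetical flags for six participants (not data from the study)
survey_flags = [1, 1, 1, 0, 0, 1]
ehr_flags = [1, 0, 0, 0, 1, 1]
print(round(positive_agreement(survey_flags, ehr_flags), 3))
```

Participants who are negative in both sources do not enter this measure, which is why it can be low for conditions that are rarely recorded in the EHR even when overall agreement looks high.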
Affiliation(s)
- Lina Sulieman
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN 37202, USA (corresponding author)
- Robert M Cronin
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, The Ohio State University, Columbus, Ohio, USA
- Robert J Carroll
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Karthik Natarajan
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Kayla Marginean
  - Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Brandy Mapes
  - Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Dan Roden
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Paul Harris
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Andrea Ramirez
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Office of Data and Analytics, All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA
|
27
|
Barajas R, Hair B, Lai G, Rotunno M, Shams-White MM, Gillanders EM, Mechanic LE. Facilitating cancer systems epidemiology research. PLoS One 2022; 16:e0255328. [PMID: 34972102 PMCID: PMC8719747 DOI: 10.1371/journal.pone.0255328] [Indexed: 11/19/2022]
Abstract
Systems epidemiology offers a more comprehensive and holistic approach to studies of cancer in populations by considering high dimensionality measures from multiple domains, assessing the inter-relationships among risk factors, and considering changes over time. These approaches offer a framework to account for the complexity of cancer and contribute to a broader understanding of the disease. Therefore, NCI sponsored a workshop in February 2019 to facilitate discussion about the opportunities and challenges of the application of systems epidemiology approaches for cancer research. Eight key themes emerged from the discussion: transdisciplinary collaboration and a problem-based approach; methods and modeling considerations; interpretation, validation, and evaluation of models; data needs and opportunities; sharing of data and models; enhanced training practices; dissemination of systems models; and building a systems epidemiology community. This manuscript summarizes these themes, highlights opportunities for cancer systems epidemiology research, outlines ways to foster this research area, and introduces a collection of papers, "Cancer System Epidemiology Insights and Future Opportunities" that highlight findings based on systems epidemiology approaches.
Affiliation(s)
- Rolando Barajas
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Brionna Hair
- DCCPS, NCI, NIH, Bethesda, Maryland, United States of America
| | - Gabriel Lai
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Melissa Rotunno
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Marissa M. Shams-White
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Elizabeth M. Gillanders
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Leah E. Mechanic
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
|
28
|
Association of step counts over time with the risk of chronic disease in the All of Us Research Program. Nat Med 2022; 28:2301-2308. [PMID: 36216933 PMCID: PMC9671804 DOI: 10.1038/s41591-022-02012-w] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 03/28/2022] [Accepted: 08/15/2022] [Indexed: 01/14/2023]
Abstract
The association between physical activity and human disease has not been examined using commercial devices linked to electronic health records. Using the electronic health records data from the All of Us Research Program, we show that step count volumes as captured by participants' own Fitbit devices were associated with risk of chronic disease across the entire human phenome. Of the 6,042 participants included in the study, 73% were female, 84% were white and 71% had a college degree, and participants had a median age of 56.7 (interquartile range 41.5-67.6) years and body mass index of 28.1 (24.3-32.9) kg/m². Participants walked a median of 7,731.3 (5,866.8-9,826.8) steps per day over the median activity monitoring period of 4.0 (2.2-5.6) years with a total of 5.9 million person-days of monitoring. The relationship between steps per day and incident disease was inverse and linear for obesity (n = 368), sleep apnea (n = 348), gastroesophageal reflux disease (n = 432) and major depressive disorder (n = 467), with values above 8,200 daily steps associated with protection from incident disease. The relationships with incident diabetes (n = 156) and hypertension (n = 482) were nonlinear with no further risk reduction above 8,000-9,000 steps. Although validation in a more diverse sample is needed, these findings provide a real-world evidence-base for clinical guidance regarding activity levels that are necessary to reduce disease risk.
|
29
|
Greer ML, Davis K, Stack BC. Machine learning can identify patients at risk of hyperparathyroidism without known calcium and intact parathyroid hormone. Head Neck 2021; 44:817-822. [PMID: 34953008 DOI: 10.1002/hed.26970] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/29/2021] [Revised: 11/01/2021] [Accepted: 12/16/2021] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND To prove the concept of diagnosing primary hyperparathyroidism (pHPT) without calcium and parathyroid hormone (PTH) values and identifying potential risk factors for pHPT. METHODS Data were extracted from the clinical data warehouse (CDW) at the University of Arkansas for Medical Sciences (UAMS) Epic EHR (2014-2019). RESULTS 1737 patients with over 185 000 rows of clinical data were provided in a relational structure and processed/flattened to facilitate modeling. Phenotype elements were identified for pHPT without advance knowledge of calcium and PTH levels. The area under the curve (AUC) for the prediction of pHPT using our model was 0.86 with sensitivity and specificity of 0.8953 and 0.6686, respectively, using a 0.45 probability threshold. CONCLUSION Primary hyperparathyroidism was predicted from a dataset excluding calcium and PTH data with 86% accuracy. This approach needs to be validated/refined on larger samples of data and plans are in place to do this with other regional/national datasets.
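To illustrate how sensitivity and specificity follow from a chosen probability threshold, here is a minimal Python sketch. The labels and probabilities are made-up toy values, not data from the study, and `confusion_metrics` is a hypothetical helper name; only the 0.45 threshold is taken from the abstract.

```python
from typing import Dict, Sequence


def confusion_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Return sensitivity, specificity, and accuracy at a probability threshold.

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


# Toy data (hypothetical): 1 = pHPT, 0 = no pHPT.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.90, 0.70, 0.50, 0.30, 0.60, 0.46, 0.40, 0.20, 0.10, 0.05]

metrics = confusion_metrics(labels, probs, threshold=0.45)
```

AUC, sensitivity, specificity, and accuracy are distinct quantities: an AUC of 0.86 summarizes ranking performance across all thresholds and does not by itself imply 86% accuracy at any particular threshold.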
Affiliation(s)
- Melody L Greer
- Department of Health Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
| | - Kyle Davis
- Department of Otolaryngology - Head and Neck Surgery, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
| | - Brendan C Stack
- Department of Otolaryngology - Head and Neck Surgery, Southern Illinois University School of Medicine, Springfield, Illinois, USA
|
30
|
Wyatt B, Perumalswami PV, Mageras A, Miller M, Harty A, Ma N, Bowman CA, Collado F, Jeon J, Paulino L, Dinani A, Dieterich D, Li L, Vandromme M, Branch AD. A Digital Case-Finding Algorithm for Diagnosed but Untreated Hepatitis C: A Tool for Increasing Linkage to Treatment and Cure. Hepatology 2021; 74:2974-2987. [PMID: 34333777 PMCID: PMC9299620 DOI: 10.1002/hep.32086] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/03/2021] [Revised: 06/29/2021] [Accepted: 07/22/2021] [Indexed: 12/20/2022]
Abstract
BACKGROUND AND AIMS Although chronic HCV infection increases mortality, thousands of patients remain diagnosed-but-untreated (DBU). We aimed to (1) develop a DBU phenotyping algorithm, (2) use it to facilitate case finding and linkage to care, and (3) identify barriers to successful treatment. APPROACH AND RESULTS We developed a phenotyping algorithm using Java and SQL and applied it to ~2.5 million EPIC electronic medical records (EMRs; data entered January 2003 to December 2017). Approximately 72,000 EMRs contained an HCV International Classification of Diseases code and/or diagnostic test. The algorithm classified 10,614 cases as DBU (HCV-RNA positive and alive). Its positive and negative predictive values were 88% and 97%, respectively, as determined by manual review of 500 EMRs randomly selected from the ~72,000. Navigators reviewed the charts of 6,187 algorithm-defined DBUs and they attempted to contact potential treatment candidates by phone. By June 2020, 30% (n = 1,862) had completed an HCV-related appointment. Outcomes analysis revealed that DBU patients enrolled in our care coordination program were more likely to complete treatment (72% [n = 219] vs. 54% [n = 256]; P < 0.001) and to have a verified sustained virological response (67% vs. 46%; P < 0.001) than other patients. Forty-eight percent (n = 2,992) of DBU patients could not be reached by phone, which was a major barrier to engagement. Nearly half of these patients had Fibrosis-4 scores ≥ 2.67, indicating significant fibrosis. Multivariable logistic regression showed that DBUs who could not be contacted were less likely to have private insurance than those who could (18% vs. 50%; P < 0.001). CONCLUSIONS The digital DBU case-finding algorithm efficiently identified potential HCV treatment candidates, freeing resources for navigation and coordination. The algorithm is portable and accelerated HCV elimination when incorporated in our comprehensive program.
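The validation step described above (comparing algorithm output with manual chart review) can be sketched as follows. This is only an illustration: the `Record` fields, the `is_dbu` rule, and the toy records are hypothetical, and the rule uses just the two conditions named in the abstract (HCV-RNA positive and alive), whereas the published algorithm was written in Java and SQL and presumably applies further criteria.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    hcv_rna_positive: bool   # hypothetical field: latest HCV-RNA result is positive
    alive: bool              # hypothetical field: patient not recorded as deceased
    chart_review_dbu: bool   # reference label from manual chart review


def is_dbu(record: Record) -> bool:
    """Simplified rule: flag as diagnosed-but-untreated (DBU)."""
    return record.hcv_rna_positive and record.alive


def predictive_values(records: List[Record]) -> Tuple[float, float]:
    """Return (PPV, NPV) of the rule against the chart-review labels."""
    tp = sum(1 for r in records if is_dbu(r) and r.chart_review_dbu)
    fp = sum(1 for r in records if is_dbu(r) and not r.chart_review_dbu)
    tn = sum(1 for r in records if not is_dbu(r) and not r.chart_review_dbu)
    fn = sum(1 for r in records if not is_dbu(r) and r.chart_review_dbu)
    return tp / (tp + fp), tn / (tn + fn)


# Toy reviewed sample (hypothetical values, not study data).
sample = [
    Record(True, True, True),
    Record(True, True, True),
    Record(True, True, False),   # flagged, but review disagrees (false positive)
    Record(False, True, False),
    Record(True, False, False),
    Record(False, True, True),   # missed by the rule (false negative)
]

ppv, npv = predictive_values(sample)
```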
Affiliation(s)
- Brooke Wyatt
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Ponni V. Perumalswami
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY; Division of Gastroenterology and Hepatology, University of Michigan, Ann Arbor, MI; Gastroenterology Section, Veterans Affairs Ann Arbor Healthcare System, Ann Arbor, MI
| | - Anna Mageras
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Mark Miller
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Alyson Harty
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Ning Ma
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Chip A. Bowman
- Department of Medicine, Icahn School of Medicine Mount Sinai, New York, NY
| | - Francina Collado
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Jihae Jeon
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Lismeiry Paulino
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Amreen Dinani
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Douglas Dieterich
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Li Li
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Maxence Vandromme
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
| | - Andrea D. Branch
- Division of Liver Diseases, Icahn School of Medicine Mount Sinai, New York, NY
|
31
|
Daniels H, Jones KH, Heys S, Ford DV. Exploring the Use of Genomic and Routinely Collected Data: Narrative Literature Review and Interview Study. J Med Internet Res 2021; 23:e15739. [PMID: 34559060 PMCID: PMC8501405 DOI: 10.2196/15739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2019] [Revised: 10/01/2020] [Accepted: 07/15/2021] [Indexed: 11/13/2022] Open
Abstract
Background Advancing the use of genomic data with routinely collected health data holds great promise for health care and research. Increasing the use of these data is a high priority to understand and address the causes of disease. Objective This study aims to provide an outline of the use of genomic data alongside routinely collected data in health research to date. As this field prepares to move forward, it is important to take stock of the current state of play in order to highlight new avenues for development, identify challenges, and ensure that adequate data governance models are in place for safe and socially acceptable progress. Methods We conducted a literature review to draw information from past studies that have used genomic and routinely collected data and conducted interviews with individuals who use these data for health research. We collected data on the following: the rationale of using genomic data in conjunction with routinely collected data, types of genomic and routinely collected data used, data sources, project approvals, governance and access models, and challenges encountered. Results The main purpose of using genomic and routinely collected data was to conduct genome-wide and phenome-wide association studies. Routine data sources included electronic health records, disease and death registries, health insurance systems, and deprivation indices. The types of genomic data included polygenic risk scores, single nucleotide polymorphisms, and measures of genetic activity, and biobanks generally provided these data. Although the literature search showed that biobanks released data to researchers, the case studies revealed a growing tendency for use within a data safe haven. Challenges of working with these data revolved around data collection, data storage, technical, and data privacy issues. Conclusions Using genomic and routinely collected data holds great promise for progressing health research. 
Several challenges are involved, particularly in terms of privacy. Overcoming these barriers will ensure that the use of these data to progress health research can be exploited to its full potential.
Affiliation(s)
- Helen Daniels
- Population Data Science, Swansea University, Swansea, United Kingdom
| | | | - Sharon Heys
- Population Data Science, Swansea University, Swansea, United Kingdom
|
32
|
Almowil ZA, Zhou SM, Brophy S. Concept libraries for automatic electronic health record based phenotyping: A review. Int J Popul Data Sci 2021; 6:1362. [PMID: 34189274 PMCID: PMC8210840 DOI: 10.23889/ijpds.v5i1.1362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022] Open
Abstract
Introduction Electronic health records (EHR) are linked together to examine disease history and to undertake research into the causes and outcomes of disease. However, the process of constructing algorithms for phenotyping (e.g., identifying disease characteristics) or health characteristics (e.g., smoker) is very time consuming and resource costly. In addition, results can vary greatly between researchers. Reusing or building on algorithms that others have created is a compelling solution to these problems. However, sharing algorithms is not a common practice and many published studies do not detail the clinical code lists used by the researchers in the disease/characteristic definition. To address these challenges, a number of centres across the world have developed health data portals which contain concept libraries (e.g., algorithms for defining concepts such as disease and characteristics) in order to facilitate disease phenotyping and health studies. Objectives This study aims to review the literature of existing concept libraries, examine their utilities, identify the current gaps, and suggest future developments. Methods The five-stage framework of Arksey and O'Malley was used for the literature search. This approach included defining the research questions, identifying relevant studies through literature review, selecting eligible studies, charting and extracting data, and summarising and reporting the findings. Results This review identified seven publicly accessible Electronic Health data concept libraries which were developed in different countries including UK, USA, and Canada. The concept libraries (n = 7) investigated were either general libraries that hold phenotypes of multiple specialties (n = 4) or specialized libraries that manage only certain specialities such as rare diseases (n = 3). 
There were some clear differences between the general libraries such as archiving data from different electronic sources, and using a range of different types of coding systems. However, they share some clear similarities such as enabling users to upload their own code lists, and allowing users to use/download the publicly accessible code. In addition, there were some differences between the specialized libraries such as difference in ability to search, and if it was possible to use different searching queries such as simple or complex searches. Conversely, there were some similarities between the specialized libraries such as enabling users to upload their own concepts into the libraries and to show where they were published, which facilitates assessing the validity of the concepts. All the specialized libraries aimed to encourage the reuse of research methods such as lists of clinical code and/or metadata. Conclusion The seven libraries identified have been developed independently and appear to replicate similar concepts but in different ways. Collaboration between similar libraries would greatly facilitate the use of these libraries for the user. The process of building code lists takes time and effort. Access to existing code lists increases consistency and accuracy of definitions across studies. Concept library developers should collaborate with each other to raise awareness of their existence and of their various functions, which could increase users’ contributions to those libraries and promote their wide-ranging adoption.
Affiliation(s)
| | - Shang-Ming Zhou
- Centre for Health Technology, Faculty of Health, University of Plymouth, Plymouth, PL4 8AA, UK
|
33
|
Tam CS, Gullick J, Saavedra A, Vernon ST, Figtree GA, Chow CK, Cretikos M, Morris RW, William M, Morris J, Brieger D. Combining structured and unstructured data in EMRs to create clinically-defined EMR-derived cohorts. BMC Med Inform Decis Mak 2021; 21:91. [PMID: 33685456 PMCID: PMC7938556 DOI: 10.1186/s12911-021-01441-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/08/2020] [Accepted: 02/15/2021] [Indexed: 11/29/2022] Open
Abstract
Background There have been few studies describing how production EMR systems can be systematically queried to identify clinically-defined populations and limited studies utilising free-text in this process. The aim of this study is to provide a generalisable methodology for constructing clinically-defined EMR-derived patient cohorts using structured and unstructured data in EMRs. Methods Patients with possible acute coronary syndrome (ACS) were used as an exemplar. Cardiologists defined clinical criteria for patients presenting with possible ACS. These were mapped to data tables within the production EMR system creating seven inclusion criteria comprised of structured data fields (orders and investigations, procedures, scanned electrocardiogram (ECG) images, and diagnostic codes) and unstructured clinical documentation. Data were extracted from two local health districts (LHD) in Sydney, Australia. Outcome measures included examination of the relative contribution of individual inclusion criteria to the identification of eligible encounters, comparisons between inclusion criterion and evaluation of consistency of data extracts across years and LHDs. Results Among 802,742 encounters in a 5 year dataset (1/1/13–30/12/17), the presence of an ECG image (54.8% of encounters) and symptoms and keywords in clinical documentation (41.4–64.0%) were used most often to identify presentations of possible ACS. Orders and investigations (27.3%) and procedures (1.4%), were less often present for identified presentations. Relevant ICD-10/SNOMED CT codes were present for 3.7% of identified encounters. Similar trends were seen when the two LHDs were examined separately, and across years. Conclusions Clinically-defined EMR-derived cohorts combining structured and unstructured data during cohort identification is a necessary prerequisite for critical validation work required for development of real-time clinical decision support and learning health systems.
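The idea of combining structured fields with free-text matching, and then measuring how much each inclusion criterion contributes, might be sketched as below. This is a hedged illustration only: the field names, the keyword pattern, and the toy encounters are all hypothetical and far simpler than the seven criteria the authors describe.

```python
import re
from typing import Dict, List

# Hypothetical keyword pattern for free-text clinical documentation.
KEYWORDS = re.compile(r"\b(chest pain|acs)\b", re.IGNORECASE)


def criteria_met(encounter: dict) -> Dict[str, bool]:
    """Evaluate each (simplified) inclusion criterion for one encounter."""
    return {
        "ecg_image": bool(encounter.get("has_ecg_image", False)),
        "diagnostic_code": bool(encounter.get("codes")),
        "free_text": bool(KEYWORDS.search(encounter.get("note", ""))),
    }


def contribution(encounters: List[dict]) -> Dict[str, float]:
    """Among eligible encounters (any criterion met), share meeting each criterion."""
    flags = [criteria_met(e) for e in encounters]
    eligible = [f for f in flags if any(f.values())]
    return {
        name: sum(f[name] for f in eligible) / len(eligible)
        for name in ("ecg_image", "diagnostic_code", "free_text")
    }


# Toy encounters (hypothetical values, not study data).
encounters = [
    {"has_ecg_image": True, "codes": [], "note": "Patient reports chest pain"},
    {"has_ecg_image": False, "codes": ["I21.9"], "note": ""},
    {"has_ecg_image": False, "codes": [], "note": "Presented with ?ACS"},
    {"has_ecg_image": False, "codes": [], "note": "Ankle sprain"},  # not eligible
]

shares = contribution(encounters)
```

Because one encounter can satisfy several criteria, the shares need not sum to 1, which matches the overlapping percentages reported in the abstract.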
Affiliation(s)
- Charmaine S Tam
- Centre for Translational Data Science, The University of Sydney, Sydney, Australia; Northern Clinical School, The University of Sydney, Sydney, Australia
| | - Janice Gullick
- Susan Wakil School of Nursing and Midwifery, The University of Sydney, Sydney, Australia
| | - Aldo Saavedra
- Centre for Translational Data Science, The University of Sydney, Sydney, Australia; Faculty of Health Sciences, The University of Sydney, Sydney, Australia
| | - Stephen T Vernon
- Cardiothoracic and Vascular Health, Kolling Institute of Medical Research and Department of Cardiology, Royal North Shore Hospital, Northern Sydney Local Health District, Sydney, Australia
| | - Gemma A Figtree
- Northern Clinical School, The University of Sydney, Sydney, Australia; Cardiothoracic and Vascular Health, Kolling Institute of Medical Research and Department of Cardiology, Royal North Shore Hospital, Northern Sydney Local Health District, Sydney, Australia
| | - Clara K Chow
- Westmead Applied Research Centre, The University of Sydney, Sydney, Australia; Department of Cardiology, Westmead Hospital, Sydney, Australia
| | - Michelle Cretikos
- Centre for Population Health, NSW Ministry of Health, Sydney, Australia
| | - Richard W Morris
- Centre for Translational Data Science, The University of Sydney, Sydney, Australia; Northern Clinical School, The University of Sydney, Sydney, Australia
| | - Maged William
- Department of Cardiology, Central Coast Local Health District and University of Newcastle, Sydney, Australia
| | - Jonathan Morris
- Northern Clinical School, The University of Sydney, Sydney, Australia; Clinical and Population Perinatal Health, Northern Sydney Local Health District, Sydney, Australia
| | - David Brieger
- Department of Cardiology, Concord Hospital, Sydney, Australia
|
34
|
Walters CE, Nitin R, Margulis K, Boorom O, Gustavson DE, Bush CT, Davis LK, Below JE, Cox NJ, Camarata SM, Gordon RL. Automated Phenotyping Tool for Identifying Developmental Language Disorder Cases in Health Systems Data (APT-DLD): A New Research Algorithm for Deployment in Large-Scale Electronic Health Record Systems. Journal of Speech, Language, and Hearing Research: JSLHR 2020; 63:3019-3035. [PMID: 32791019 PMCID: PMC7890229 DOI: 10.1044/2020_jslhr-19-00397] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/13/2019] [Revised: 04/23/2020] [Accepted: 05/19/2020] [Indexed: 05/13/2023]
Abstract
Purpose Data mining algorithms using electronic health records (EHRs) are useful in large-scale population-wide studies to classify etiology and comorbidities (Casey et al., 2016). Here, we apply this approach to developmental language disorder (DLD), a prevalent communication disorder whose risk factors and epidemiology remain largely undiscovered. Method We first created a reliable system for manually identifying DLD in EHRs based on speech-language pathologist (SLP) diagnostic expertise. We then developed and validated an automated algorithmic procedure, called Automated Phenotyping Tool for identifying DLD cases in health systems data (APT-DLD), that classifies a DLD status for patients within EHRs on the basis of ICD (International Statistical Classification of Diseases and Related Health Problems) codes. APT-DLD was validated in a discovery sample (N = 973) using expert SLP manual phenotype coding as a gold-standard comparison and then applied and further validated in a replication sample of N = 13,652 EHRs. Results In the discovery sample, the APT-DLD algorithm correctly classified 98% of DLD cases in concordance with manually coded records in the training set, indicating that APT-DLD successfully mimics a comprehensive chart review. The output of APT-DLD was also validated in relation to independently conducted SLP clinician coding in a subset of records, with a positive predictive value of 95% of cases correctly classified as DLD. We also applied APT-DLD to the replication sample, where it achieved a positive predictive value of 90% in relation to SLP clinician classification of DLD. Conclusions APT-DLD is a reliable, valid, and scalable tool for identifying DLD cohorts in EHRs. This new method has promising public health implications for future large-scale epidemiological investigations of DLD and may inform EHR data mining algorithms for other communication disorders. Supplemental Material https://doi.org/10.23641/asha.12753578.
Affiliation(s)
- Courtney E. Walters
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Neuroscience Program, College of Arts and Science, Vanderbilt University, Nashville, TN
| | - Rachana Nitin
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN
| | - Katherine Margulis
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Kennedy Krieger Institute, Baltimore, MD
| | - Olivia Boorom
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
| | - Daniel E. Gustavson
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
| | - Catherine T. Bush
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
| | - Lea K. Davis
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Jennifer E. Below
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Nancy J. Cox
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Stephen M. Camarata
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
| | - Reyna L. Gordon
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
|
35
|
Abstract
OBJECTIVES Clinical Research Informatics (CRI) declares its scope in its name, but its content, both in terms of the clinical research it supports (and sometimes initiates) and the methods it has developed over time, reach much further than the name suggests. The goal of this review is to celebrate the extraordinary diversity of activity and of results, not as a prize-giving pageant, but in recognition of the field, the community that both serves and is sustained by it, and of its interdisciplinarity and its international dimension. METHODS Beyond personal awareness of a range of work commensurate with the author's own research, it is clear that, even with a thorough literature search, a comprehensive review is impossible. Moreover, the field has grown and subdivided to an extent that makes it very hard for one individual to be familiar with every branch or with more than a few branches in any depth. A literature survey was conducted that focused on informatics-related terms in the general biomedical and healthcare literature, and specific concerns ("artificial intelligence", "data models", "analytics", etc.) in the biomedical informatics (BMI) literature. In addition to a selection from the results from these searches, suggestive references within them were also considered. RESULTS The substantive sections of the paper (Artificial Intelligence, Machine Learning, and "Big Data" Analytics; Common Data Models, Data Quality, and Standards; Phenotyping and Cohort Discovery; Privacy: Deidentification, Distributed Computation, Blockchain; Causal Inference and Real-World Evidence) provide broad coverage of these active research areas, with, no doubt, a bias towards this reviewer's interests and preferences, landing on a number of papers that stood out in one way or another, or, alternatively, exemplified a particular line of work. CONCLUSIONS CRI is thriving, not only in the familiar major centers of research, but more widely, throughout the world. 
This is not to pretend that the distribution is uniform, but to highlight the potential for this domain to play a prominent role in supporting progress in medicine, healthcare, and wellbeing everywhere. We conclude with the observation that CRI and its practitioners would make apt stewards of the new medical knowledge that their methods will bring forward.
Affiliation(s)
- Anthony Solomonides
- Outcomes Research Network, Research Institute, NorthShore University HealthSystem, Evanston, IL, USA
|
36
|
Khalid SI, Omotosho PA, Spagnoli A, Torquati A. Association of Bariatric Surgery With Risk of Fracture in Patients With Severe Obesity. JAMA Netw Open 2020; 3:e207419. [PMID: 32520360 PMCID: PMC7287567 DOI: 10.1001/jamanetworkopen.2020.7419] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/14/2022] Open
Abstract
IMPORTANCE Given the complex relationship between body mass index, body composition, and bone density and the correlative nature of the studies that have established the prevailing notion that higher body mass indices may be protective against osteopenia and osteoporosis and, therefore, fracture, the absolute risk of fracture in patients with severe obesity who undergo either Roux-en-Y gastric bypass (RYGB) or sleeve gastrectomy (SG) compared with those who do not undergo bariatric surgery is unknown. OBJECTIVE To assess the rates of fractures associated with obesity and compare rates between those who do not undergo bariatric surgery, those who undergo RYGB, and those who undergo SG. DESIGN, SETTING, AND PARTICIPANTS In this retrospective multicenter cohort study of Medicare Standard Analytic Files derived from Medicare parts A and B records from January 2004 to December 2014, patients classified as eligible for bariatric surgery using the US Centers of Medicare & Medicaid criteria who either did not undergo bariatric surgery or underwent RYGB or SG were exactly matched in a 1:1 fashion based on their age, sex, Elixhauser Comorbidity Index, hypertension, smoking status, nonalcoholic fatty liver disease, hyperlipidemia, type 2 diabetes, osteoporosis, osteoarthritis, and obstructive sleep apnea status. Data were analyzed from November to December 2019. EXPOSURES RYGB or SG. MAIN OUTCOMES AND MEASURES The primary outcome measured in this study was the odds of fracture overall based on exposure to bariatric surgery. Secondary outcomes included the odds of type of fracture (humerus, radius or ulna, pelvis, hip, vertebrae, and total fractures) based on exposure to bariatric surgery. RESULTS A total of 49 113 patients were included and were equally made up of 16 371 bariatric surgery-eligible patients who did not undergo weight loss surgery, 16 371 patients who had undergone RYGB, and 16 371 patients who had undergone SG. 
Each group consisted of an equal number of 4109 men (25.1%) and 12 262 women (74.9%) and had an equal distribution of ages, with 11 780 patients (72.0%) 64 years or younger, 4230 (25.8%) aged 65 to 69 years, 346 (2.1%) aged 70 to 74 years, and 15 (0.1%) aged 75 to 79 years. Patients undergoing RYGB were found to have no significant difference in odds of fractures compared with bariatric surgery-eligible patients who did not undergo surgery. Patients who had undergone SG were found to have decreased odds of fractures of the humerus (odds ratio [OR], 0.57; 95% CI, 0.45-0.73), radius or ulna (OR, 0.38; 95% CI, 0.25-0.58), hip (OR, 0.49; 95% CI, 0.33-0.74), pelvis (OR, 0.34; 95% CI, 0.18-0.64), vertebrae (OR, 0.60; 95% CI, 0.48-0.74), or fractures in general (OR, 0.53; 95% CI, 0.46-0.62). Compared with patients undergoing SG, patients undergoing RYGB had a significantly greater risk of total fractures (OR, 1.79; 95% CI, 1.55-2.06) and humeral fractures (OR, 1.60; 95% CI, 1.24-2.07). CONCLUSIONS AND RELEVANCE In this cohort study, bariatric surgery was associated with a reduced risk of fracture in bariatric surgery-eligible patients. Sleeve gastrectomy might be the best option for weight loss in patients in which fractures could be a concern, as RYGB may be associated with an increased fracture risk compared with SG.
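For readers unfamiliar with how an odds ratio and its 95% CI are obtained from a 2x2 table, a minimal Python sketch follows. It uses the standard Wald (log-scale) interval with made-up counts; the study's matched design may well have used a different estimation method, and `odds_ratio_ci` is a hypothetical helper name.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    All four cells are assumed to be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Toy counts (hypothetical, not study data): fractures among surgery vs. no surgery.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=40, d=60)
```

An interval that lies entirely below 1, as in this toy example, corresponds to significantly decreased odds in the exposed group at the 5% level.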
Affiliation(s)
- Syed I. Khalid
- Department of Surgery, Rush University Medical Center, Chicago, Illinois
| | - Philip A. Omotosho
- Department of Surgery, Rush University Medical Center, Chicago, Illinois
| | - Anna Spagnoli
- Department of Surgery, Rush University Medical Center, Chicago, Illinois
| | - Alfonso Torquati
- Department of Surgery, Rush University Medical Center, Chicago, Illinois
|
37
|
Wu CS, Luedtke AR, Sadikova E, Tsai HJ, Liao SC, Liu CC, Gau SSF, VanderWeele TJ, Kessler RC. Development and Validation of a Machine Learning Individualized Treatment Rule in First-Episode Schizophrenia. JAMA Netw Open 2020; 3:e1921660. [PMID: 32083693 PMCID: PMC7043195 DOI: 10.1001/jamanetworkopen.2019.21660] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 10/07/2019] [Accepted: 12/23/2019] [Indexed: 12/31/2022] Open
Abstract
Importance: Little guidance exists to date on how to select antipsychotic medications for patients with first-episode schizophrenia. Objective: To develop a preliminary individualized treatment rule (ITR) for patients with first-episode schizophrenia. Design, Setting, and Participants: This prognostic study obtained data from Taiwan's National Health Insurance Research Database on patients with prescribed antipsychotic medications, ambulatory claims, or discharge diagnoses of a schizophrenic disorder between January 1, 2005, and December 31, 2011. An ITR was developed by applying a targeted minimum loss-based ensemble machine learning method to predict treatment success from baseline clinical and demographic data in a 70% training sample. The model was validated in the remaining 30% of the sample. The probability of treatment success was estimated for each medication for each patient under the model. The analysis was conducted between July 16, 2018, and July 15, 2019. Exposures: Fifteen different antipsychotic medications. Main Outcomes and Measures: Treatment success was defined as not switching medication and not being hospitalized for 12 months. Results: Among the 32 277 patients in the analysis, the mean (SD) age was 36.7 (14.3) years, and 15 752 (48.8%) were male. In the validation sample, the treatment success rate (SE) was 51.7% (1.0%) under the ITR and was 44.5% (0.5%) in the observed population (Z = 7.1; P < .001). The estimated treatment success if all patients were given a prescription for 1 medication was significantly lower for each of the 13 medications than under the ITR (Z = 4.2-16.8; all P < .001). Aripiprazole (3088 [31.9%]) and amisulpride (2920 [30.2%]) were the medications most often recommended by the ITR. Only 1054 patients (10.9%) received ITR-recommended medications. Observed treatment success, although lower than the success under the ITR, was nonetheless significantly higher than if medications had been randomized (44.5% [SE, 0.55%] vs 41.3% [SE, 0.4%]; Z = 6.9; P < .001), although only marginally higher than if medications had been randomized in their observed population proportions (44.5% [SE, 0.5%] vs 43.5% [SE, 0.4%]; Z = 2.2; P = .03). Conclusions and Relevance: These results suggest that an ITR may be associated with an increase in the treatment success rate among patients with first-episode schizophrenia, but experimental evaluation is needed to confirm this possibility. If confirmed, model refinement that investigates biomarkers, clinical observations, and patient reports as additional predictors in iterative pragmatic trials would be needed before clinical implementation.
Affiliation(s)
- Chi-Shin Wu
  - Department of Psychiatry, National Taiwan University Hospital & College of Medicine, National Taiwan University, Taipei City, Taiwan
- Alex R. Luedtke
  - Department of Statistics, University of Washington, Seattle
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Ekaterina Sadikova
  - Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Hui-Ju Tsai
  - Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan
- Shih-Cheng Liao
  - Department of Psychiatry, National Taiwan University Hospital & College of Medicine, National Taiwan University, Taipei City, Taiwan
- Chen-Chung Liu
  - Department of Psychiatry, National Taiwan University Hospital & College of Medicine, National Taiwan University, Taipei City, Taiwan
- Susan Shur-Fen Gau
  - Department of Psychiatry, National Taiwan University Hospital & College of Medicine, National Taiwan University, Taipei City, Taiwan
- Tyler J. VanderWeele
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Ronald C. Kessler
  - Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts
38
Crawford DC, Lin J, Bailey JNC, Kinzy T, Sedor JR, O’Toole JF, Bush WS. Frequency of ClinVar Pathogenic Variants in Chronic Kidney Disease Patients Surveyed for Return of Research Results at a Cleveland Public Hospital. Pac Symp Biocomput 2020; 25:575-586. [PMID: 31797629 PMCID: PMC6931908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Return of results is not common in research settings, as standards are not yet in place for what to return, how to return it, and to whom. As a pioneer of large-scale return of research results, the Precision Medicine Initiative Cohort, now known as All of Us, plans to return pharmacogenomic results and variants of clinical significance to its participants starting in late 2019. To better understand the local landscape of possibilities regarding return of research results, we assessed the frequency of pathogenic variants and APOL1 renal risk variants in a small, diverse cohort of chronic kidney disease (CKD) patients ascertained from a public hospital in Cleveland, Ohio, and genotyped on the Illumina Infinium MegaEX. Of the 23,720 ClinVar-designated variants directly assayed by the MegaEX, 8,355 (35%) had at least one alternate allele in the 130 participants genotyped. Of these, 18 ClinVar variants deemed pathogenic by multiple submitters with no conflicts in interpretation were distributed across 27 participants. The majority of these pathogenic ClinVar variants (14/18) were associated with autosomal recessive disorders. Of note were four African American carriers of TTR rs76992529, associated with amyloidogenic transthyretin amyloidosis, otherwise known as familial transthyretin amyloidosis (FTA). FTA, an autosomal dominant disorder with variable penetrance, is more common among African-descent populations than among European-descent populations. Also common in this CKD population were APOL1 renal risk alleles G1 (rs73885319) and G2 (rs71785313), with 60% of the study population carrying at least one renal risk allele. Both pathogenic ClinVar variants and APOL1 renal risk alleles were distributed among participants who wanted actionable genetic results returned, wanted genetic results returned regardless of actionability, and wanted no results returned. Results from this local genetic study highlight challenges in deciding which variants to report, how to interpret them, and the participant's potential for follow-up, which are only some of the challenges in return of research results likely facing larger studies such as All of Us.
Affiliation(s)
- Dana C. Crawford
  - Cleveland Institute for Computational Biology, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
  - Department of Genetics and Genome Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
- John Lin
  - Cleveland Institute for Computational Biology, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
- Jessica N. Cooke Bailey
  - Cleveland Institute for Computational Biology, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
- Tyler Kinzy
  - Cleveland Institute for Computational Biology, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
- John R. Sedor
  - Department of Physiology and Biophysics, Case Western Reserve University
  - Department of Nephrology and Hypertension, Glickman Urology and Kidney and Lerner Research Institutes, Cleveland Clinic, Cleveland, OH 44106, USA
- John F. O’Toole
  - Department of Nephrology and Hypertension, Glickman Urology and Kidney and Lerner Research Institutes, Cleveland Clinic, Cleveland, OH 44106, USA
- William S. Bush
  - Cleveland Institute for Computational Biology, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
  - Department of Genetics and Genome Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
39
Claussnitzer M, Cho JH, Collins R, Cox NJ, Dermitzakis ET, Hurles ME, Kathiresan S, Kenny EE, Lindgren CM, MacArthur DG, North KN, Plon SE, Rehm HL, Risch N, Rotimi CN, Shendure J, Soranzo N, McCarthy MI. A brief history of human disease genetics. Nature 2020; 577:179-189. [PMID: 31915397 PMCID: PMC7405896 DOI: 10.1038/s41586-019-1879-7] [Citation(s) in RCA: 338] [Impact Index Per Article: 84.5] [Received: 07/16/2019] [Accepted: 11/13/2019] [Indexed: 12/16/2022]
Abstract
A primary goal of human genetics is to identify DNA sequence variants that influence biomedical traits, particularly those related to the onset and progression of human disease. Over the past 25 years, progress in realizing this objective has been transformed by advances in technology, foundational genomic resources and analytical tools, and by access to vast amounts of genotype and phenotype data. Genetic discoveries have substantially improved our understanding of the mechanisms responsible for many rare and common diseases and driven development of novel preventative and therapeutic strategies. Medical innovation will increasingly focus on delivering care tailored to individual patterns of genetic predisposition.
Affiliation(s)
- Melina Claussnitzer
  - Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard Cambridge, Cambridge, MA, USA
  - Institute of Nutritional Science, University of Hohenheim, Stuttgart, Germany
- Judy H Cho
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Rory Collins
  - Nuffield Department of Population Health (NDPH), University of Oxford, Oxford, UK
  - UK Biobank, Stockport, UK
- Nancy J Cox
  - Vanderbilt Genetics Institute and Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Emmanouil T Dermitzakis
  - Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
  - Health 2030 Genome Center, Geneva, Switzerland
- Sekar Kathiresan
  - Broad Institute of MIT and Harvard Cambridge, Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Verve Therapeutics, Cambridge, MA, USA
- Eimear E Kenny
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Center for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Cecilia M Lindgren
  - Broad Institute of MIT and Harvard Cambridge, Cambridge, MA, USA
  - Big Data Institute at the Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Daniel G MacArthur
  - Broad Institute of MIT and Harvard Cambridge, Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Kathryn N North
  - Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - University of Melbourne, Parkville, Victoria, Australia
- Sharon E Plon
  - Departments of Pediatrics and Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Texas Children's Cancer Center, Texas Children's Hospital, Houston, TX, USA
- Heidi L Rehm
  - Broad Institute of MIT and Harvard Cambridge, Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
- Neil Risch
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Charles N Rotimi
  - Center for Research on Genomics and Global Health, National Human Genome Research Institute, Bethesda, MD, USA
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Magnuson Health Sciences Building, Seattle, WA, USA
  - Howard Hughes Medical Institute, Seattle, WA, USA
- Nicole Soranzo
  - Wellcome Sanger Institute, Hinxton, UK
  - Department of Haematology, University of Cambridge, Cambridge, UK
- Mark I McCarthy
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - Oxford Centre for Diabetes, Endocrinology and Metabolism, Oxford, UK
  - Oxford NIHR Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, UK
  - Human Genetics, Genentech, South San Francisco, CA, USA
40
Pendergrass SA, Buyske S, Jeff JM, Frase A, Dudek S, Bradford Y, Ambite JL, Avery CL, Buzkova P, Deelman E, Fesinmeyer MD, Haiman C, Heiss G, Hindorff LA, Hsu CN, Jackson RD, Lin Y, Le Marchand L, Matise TC, Monroe KR, Moreland L, North KE, Park SL, Reiner A, Wallace R, Wilkens LR, Kooperberg C, Ritchie MD, Crawford DC. A phenome-wide association study (PheWAS) in the Population Architecture using Genomics and Epidemiology (PAGE) study reveals potential pleiotropy in African Americans. PLoS One 2019; 14:e0226771. [PMID: 31891604 PMCID: PMC6938343 DOI: 10.1371/journal.pone.0226771] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 12/02/2019] [Accepted: 12/03/2019] [Indexed: 12/11/2022]
Abstract
We performed a hypothesis-generating phenome-wide association study (PheWAS) to identify and characterize cross-phenotype associations, where one SNP is associated with two or more phenotypes, between thousands of genetic variants assayed on the Metabochip and hundreds of phenotypes in 5,897 African Americans as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study. The PAGE I study was a National Human Genome Research Institute-funded collaboration of four study sites accessing diverse epidemiologic studies genotyped on the Metabochip, a custom genotyping chip that has dense coverage of regions in the genome previously associated with cardio-metabolic traits and outcomes in mostly European-descent populations. Here we focus on identifying novel phenome-genome relationships, where SNPs are associated with more than one phenotype. To do this, we performed a PheWAS, testing each SNP on the Metabochip for an association with up to 273 phenotypes in the participating PAGE I study sites. We identified 133 putative pleiotropic variants, defined as SNPs associated at an empirically derived p-value threshold of p<0.01 in two or more PAGE study sites for two or more phenotype classes. We further annotated these PheWAS-identified variants using publicly available functional data and local genetic ancestry. Amongst our novel findings is SPARC rs4958487, associated with increased glucose levels and hypertension. SPARC has been implicated in the pathogenesis of diabetes and is also known to have a potential role in fibrosis, a common consequence of multiple conditions including hypertension. The SPARC example and others highlight the potential that PheWAS approaches have in improving our understanding of complex disease architecture by identifying novel relationships between genetic variants and an array of common human phenotypes.
Affiliation(s)
- Steven Buyske
  - Department of Statistics, Rutgers University, Piscataway, New Jersey, United States of America
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
- Janina M. Jeff
  - Illumina, Inc., San Diego, California, United States of America
- Alex Frase
  - Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Scott Dudek
  - Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Yuki Bradford
  - Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jose-Luis Ambite
  - Information Sciences Institute; University of Southern California, Marina del Rey, California, United States of America
- Christy L. Avery
  - Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Petra Buzkova
  - Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Ewa Deelman
  - Information Sciences Institute; University of Southern California, Marina del Rey, California, United States of America
- Christopher Haiman
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
- Gerardo Heiss
  - Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
  - Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Lucia A. Hindorff
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Chun-Nan Hsu
  - Center for Research in Biological Systems, Department of Neurosciences, University of California, San Diego, La Jolla, California, United States of America
- Yi Lin
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Loic Le Marchand
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Tara C. Matise
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
- Kristine R. Monroe
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
- Larry Moreland
  - University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Kari E. North
  - Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
  - Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Sungshim L. Park
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
- Alex Reiner
  - Department of Epidemiology, University of Washington, Seattle, Washington, United States of America
- Robert Wallace
  - Departments of Epidemiology and Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
- Lynne R. Wilkens
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Marylyn D. Ritchie
  - Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Dana C. Crawford
  - Cleveland Institute for Computational Biology, Cleveland, Ohio, United States of America
  - Departments of Population and Quantitative Health Sciences and Genetics and Genome Sciences, Case Western Reserve University, Cleveland, Ohio, United States of America
41
Preo N, Capobianco E. Significant EHR Feature-Driven T2D Inference: Predictive Machine Learning and Networks. Front Big Data 2019; 2:30. [PMID: 33693353 PMCID: PMC7931876 DOI: 10.3389/fdata.2019.00030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/25/2019] [Accepted: 08/16/2019] [Indexed: 01/11/2023]
Abstract
Background: Electronic health records (EHR) play an important role for the redefinition of phenotypes in view of the wealth and heterogeneity of information now available from disparate data sources. A recent cross-sectional retrospective study has described the potential of EHR toward type 2 diabetes mellitus (T2D) screening when ad hoc models are used. About 10,000 US patients have been analyzed through a variety of inference techniques applied to all records with a variable degree of completeness. The analyses conducted in the reference study have indicated that EHR phenotypes significantly improved T2D detection. Methods: With these US patients and the T2D data evidenced in the above study, we propose an integrative inference approach that leverages the prediction power of EHR features selected by two well-known methods, Random Forests and Lasso. The goal is 2-fold: reducing the Big Data redundancies potentially harmful to the predictive learning task and exploiting the interconnectivity of EHR features. A mutual information (MI) network is the inference tool used to identify communities useful to prioritize significant T2D features underlying the similarity between patients. Results: Endowed with a different degree of granularity, the communities detected after the application of both methods were centered especially on T2D comorbidities and risk factors. As such, they appear very relevant for assessment of two main issues, T2D disease burden, and prevention. Conclusions: Our analytical approach offers a solution for managing the EHR scale factor in a complex disease context. EHR are rich sources of phenotypic diversity through which novel stratifications of patients are expected. To enable these results, both pre-screening of variables and calibration of risk prediction methods become necessary steps in EHR analyses. We have presented networks identifying major T2D communities. The specific significance assigned to comorbidities and risk factors in relation to T2D can be inferred with accuracy from just a suitably reduced number of EHR features.
Affiliation(s)
- Enrico Capobianco
  - Center for Computational Science, University of Miami, Miami, FL, United States
42
Abul-Husn NS, Kenny EE. Personalized Medicine and the Power of Electronic Health Records. Cell 2019; 177:58-69. [PMID: 30901549 PMCID: PMC6921466 DOI: 10.1016/j.cell.2019.02.039] [Citation(s) in RCA: 145] [Impact Index Per Article: 29.0] [Received: 01/15/2019] [Revised: 02/13/2019] [Accepted: 02/22/2019] [Indexed: 02/06/2023]
Abstract
Personalized medicine has largely been enabled by the integration of genomic and other data with electronic health records (EHRs) in the United States and elsewhere. Increased EHR adoption across various clinical settings and the establishment of EHR-linked population-based biobanks provide unprecedented opportunities for the types of translational and implementation research that drive personalized medicine. We review advances in the digitization of health information and the proliferation of genomic research in health systems and provide insights into emerging paths for the widespread implementation of personalized medicine.
Affiliation(s)
- Noura S Abul-Husn
  - The Center for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Eimear E Kenny
  - The Center for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA