1. Jordan DM, Vy HMT, Do R. A deep learning transformer model predicts high rates of undiagnosed rare disease in large electronic health systems. medRxiv [Preprint] 2023:2023.12.21.23300393. [PMID: 38196638] [PMCID: PMC10775679] [DOI: 10.1101/2023.12.21.23300393]
Abstract
It is estimated that as many as 1 in 16 people worldwide suffer from rare diseases. Rare disease patients face difficulty finding diagnosis and treatment for their conditions, including long diagnostic odysseys, multiple incorrect diagnoses, and unavailable or prohibitively expensive treatments. As a result, it is likely that large electronic health record (EHR) systems include high numbers of participants suffering from undiagnosed rare disease. While this has been shown in detail for specific diseases, such studies are expensive and time consuming and have been feasible for only a handful of the thousands of known rare diseases. The bulk of these undiagnosed cases are effectively hidden, with no straightforward way to differentiate them from healthy controls. The ability to access them at scale would enormously expand our capacity to study and develop drugs for rare diseases, adding to tools aimed at increasing the availability of study cohorts for rare disease. In this study, we train a deep learning transformer algorithm, RarePT (Rare-Phenotype Prediction Transformer), to impute undiagnosed rare disease from EHR diagnosis codes in 436,407 UK Biobank participants and validate it on an independent cohort of 3,333,560 individuals from the Mount Sinai Health System. We applied our model to 155 rare diagnosis codes with fewer than 250 cases each in the UK Biobank and predicted participants with elevated risk for each diagnosis, with the number of participants predicted to be at risk ranging from 85 to 22,000 across diagnoses. These risk predictions are significantly associated with increased mortality for 65% of diagnoses, with disease burden expressed as disability-adjusted life years (DALY) for 73% of diagnoses, and with 72% of available disease-specific diagnostic tests. They are also highly enriched for known rare diagnoses in patients not included in the training set, with an odds ratio (OR) of 48.0 in cross-validation cohorts of the UK Biobank and an OR of 30.6 in the independent Mount Sinai Health System cohort. Most importantly, RarePT successfully screens for undiagnosed patients in 32 rare diseases with available diagnostic tests in the UK Biobank. Using the trained model to estimate the prevalence of undiagnosed disease in the UK Biobank for these 32 rare phenotypes, we find that at least 50% of patients remain undiagnosed for 20 of the 32 diseases. These estimates provide empirical evidence of a high prevalence of undiagnosed rare disease and demonstrate the enormous potential benefit of using RarePT to screen for undiagnosed rare disease patients in large electronic health systems.
Affiliation(s)
- Daniel M. Jordan
- Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ha My T. Vy
- Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ron Do
- Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
2. Stellmach C, Sass J, Auber B, Boeker M, Wienker T, Heidel AJ, Benary M, Schumacher S, Ossowski S, Klauschen F, Möller Y, Schmutzler R, Ustjanzew A, Werner P, Tomczak A, Hölter T, Thun S. Creation of a structured molecular genomics report for Germany as a local adaption of HL7's Genomic Reporting Implementation Guide. J Am Med Inform Assoc 2023;30:1179-1189. [PMID: 37080557] [DOI: 10.1093/jamia/ocad061]
Abstract
OBJECTIVE The objective was to develop a dataset definition, information model, and FHIR® specification for key data elements contained in a German molecular genomics (MolGen) report to facilitate genomic and phenotype integration in electronic health records. MATERIALS AND METHODS A dedicated expert group participating in the German Medical Informatics Initiative reviewed information contained in MolGen reports, determined the key elements, and formulated a dataset definition. HL7's Genomics Reporting Implementation Guide (IG) was adopted as a basis for the FHIR specification, which was subjected to a public ballot. In addition, elements in the MolGen dataset were mapped to the fields defined in the ISO/TS 20428:2017 standard to evaluate compliance. RESULTS A core dataset of 76 data elements, clustered into 6 categories, was created to represent all key information of German MolGen reports. Based on this, a FHIR specification with 16 profiles was developed: 14 derived from HL7's Genomics Reporting IG and 2 additional profiles (of the FamilyMemberHistory and RiskAssessment resources). Five example resource bundles show how our adaptation of an international standard can be used to model MolGen report data requested following oncological or rare disease indications. Furthermore, mapping the MolGen report data elements to the fields defined by the ISO/TS 20428:2017 standard confirmed the presence of the majority of required fields. CONCLUSIONS Our report serves as a template for other research initiatives attempting to create a standard format for unstructured genomic report data. Use of standard formats facilitates integration of genomic data into electronic health records for clinical decision support.
Affiliation(s)
- Caroline Stellmach
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Julian Sass
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Bernd Auber
- Department of Human Genetics, Hannover Medical School, Hannover, Germany
- Martin Boeker
- Fakultät für Medizin, Technische Universität München, Munich, Germany
- Thomas Wienker
- Emeritus Ropers, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Manuela Benary
- Core Unit Bioinformatics, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Simon Schumacher
- Medical Data Integration Center (MeDIC), Universitätsklinikum Köln, Cologne, Germany
- Stephan Ossowski
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Frederick Klauschen
- Institut für Pathologie, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Pathologisches Institut, Ludwig-Maximilians-Universität München, Munich, Germany
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Yvonne Möller
- Center for Personalized Medicine (ZPM), Universitätsklinikum Tübingen, Tübingen, Germany
- Rita Schmutzler
- Center for Familial Breast and Ovarian Cancer, National Center of Familial Tumor Diseases and Center of Integrated Oncology, Universitätsklinikum Köln, Cologne, Germany
- Arsenij Ustjanzew
- Institut für Medizinische Biometrie, Epidemiologie und Informatik, Universitätsmedizin der Johannes Gutenberg-Universität Mainz, Mainz, Germany
- Aurelie Tomczak
- Liver Cancer Centre Heidelberg, Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Thimo Hölter
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sylvia Thun
- Core Facility Digital Medicine and Interoperability, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, Berlin, Germany
3. Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023;30:367-381. [PMID: 36413056] [PMCID: PMC9846699] [DOI: 10.1093/jamia/ocac216]
Abstract
OBJECTIVE Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used. MATERIALS AND METHODS We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies. RESULTS Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions. DISCUSSION Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released. CONCLUSION Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
Affiliation(s)
- Siyue Yang
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Ellen Stephenson
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Karen Tu
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Jessica Gronsbell
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
4. Hall AG, Davlyatov GK, Orewa GN, Mehta TS, Feldman SS. Multiple Electronic Health Record-Based Measures of Social Determinants of Health to Predict Return to the Emergency Department Following Discharge. Popul Health Manag 2022;25:771-780. [PMID: 36315199] [DOI: 10.1089/pop.2022.0088]
Abstract
Health care systems continue to struggle with preventing 30-day readmissions to their institutions. Social determinants of health (SDOH) are important predictors of repeat visits to the hospital. In many health systems, SDOH data are limited to the variables most relevant to care delivery or payment (eg, race, gender, insurance status). Despite calls for integrating a more robust set of measures (eg, measures of health behaviors and living conditions) into the electronic health record (EHR), these data often have missing values, necessitating the use of imputation to build a comprehensive picture of patients who are likely to return to the health system. Using logistic regression analyses and imputation of missing data from 2017 to 2018, this study uses measures found in the EHR (eg, tobacco use, living situation, problems at home, education) to assess which SDOH might predict a return to the emergency department within 30 days of discharge from a health system. In both imputed and raw data, the total number of recorded health conditions was the most important predictor, and collectively the SDOH variables made a relatively small contribution to determining the likelihood of a return to the hospital. Although SDOH variables might be important in the design of programs aimed at preventing readmissions, they may not be useful in readmission prediction models.
Affiliation(s)
- Allyson G Hall
- Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Ganisher K Davlyatov
- Department of Health Administration and Policy, University of Oklahoma Health Sciences Center, Norman, Oklahoma, USA
- Gregory N Orewa
- Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Tapan S Mehta
- Department of Family and Community Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Sue S Feldman
- Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, Alabama, USA
5. Ruddle RA, Adnan M, Hall M. Using set visualisation to find and explain patterns of missing values: a case study with NHS hospital episode statistics data. BMJ Open 2022;12:e064887. [PMID: 36410820] [PMCID: PMC9680176] [DOI: 10.1136/bmjopen-2022-064887]
Abstract
OBJECTIVES Missing data is the most common data quality issue in electronic health records (EHRs). Missing data checks implemented in common analytical software are typically limited to counting the number of missing values in individual fields, but researchers and organisations also need to understand multifield missing data patterns to better inform advanced missing data strategies, for which counts or numerical summaries are poorly suited. This study shows how set-based visualisation enables multifield missing data patterns to be discovered and investigated. DESIGN Development and evaluation of interactive set visualisation techniques to find patterns of missing data and generate actionable insights. The visualisations comprised easily interpretable bar charts for sets, heatmaps for set intersections and histograms for distributions of both sets and intersections. SETTING AND PARTICIPANTS Anonymised admitted patient care health records for National Health Service (NHS) hospitals and independent sector providers in England. The visualisation and data mining software was run over 16 million records and 86 fields in the dataset. RESULTS The dataset contained 960 million missing values. Set visualisation bar charts showed how those values were distributed across the fields, including several fields that, unexpectedly, were not complete. Set intersection heatmaps revealed unexpected gaps in diagnosis, operation and date fields because diagnosis and operation fields were not filled in sequentially and some operations did not have corresponding dates. Information gain ratio and entropy calculations allowed us to identify the origin of each unexpected pattern in terms of the values of other fields. CONCLUSIONS Our findings show how set visualisation reveals important insights about multifield missing data patterns in large EHR datasets. The study revealed both rare and widespread data quality issues that were previously unknown, and allowed a particular part of a specific hospital to be pinpointed as the origin of rare issues that NHS Digital did not know existed.
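The multifield pattern analysis described in this abstract can be approximated in a few lines of pandas. This is only an illustrative sketch on invented toy data, not the authors' software; the field names (`diag_1`, `diag_2`, `op_date`) are hypothetical:

```python
import pandas as pd
from collections import Counter

# Toy EHR extract; the real HES data has millions of rows and ~86 fields.
df = pd.DataFrame({
    "diag_1": ["A10", None, "B20", "C30"],
    "diag_2": [None, None, "B21", None],
    "op_date": [None, "2020-01-01", None, None],
})

# Each row's missingness "set": which fields are absent together.
patterns = [tuple(df.columns[mask]) for mask in df.isna().to_numpy()]

# Pattern frequencies: the counts behind a set-intersection heatmap
# such as the ones described in the study.
counts = Counter(patterns)
for pattern, n in counts.most_common():
    print(pattern, n)
```

Ranking patterns by frequency is what separates widespread data quality issues from the rare ones that single-field missingness counts cannot surface.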
Affiliation(s)
- Roy A Ruddle
- School of Computing and Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
- Muhammad Adnan
- Computer Science, Higher Colleges of Technology, Sharjah, UAE
- Marlous Hall
- Leeds Institute of Cardiovascular & Metabolic Medicine and Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
6. Sun Y, Liu F, Zhang Y, Lu Y, Su Z, Ji H, Cheng Y, Song W, Hidru TH, Yang X, Jiang Y. The relationship of endothelial function and arterial stiffness with subclinical target organ damage in essential hypertension. J Clin Hypertens (Greenwich) 2022;24:418-429. [PMID: 35238151] [PMCID: PMC8989756] [DOI: 10.1111/jch.14447]
Abstract
This study aimed to explore whether brachial-ankle pulse wave velocity (baPWV) and brachial artery flow-mediated dilation (FMD), or the interaction of both parameters, are associated with subclinical target organ damage (STOD) indices in patients with essential hypertension. A total of 4618 patients registered from January 2015 to October 2020 were included. baPWV and FMD were measured to evaluate arterial stiffness and endothelial dysfunction, while left ventricular hypertrophy (LVH), urine albumin-creatinine ratio (UACR), and carotid intima-media thickness (CIMT) were obtained as STOD indicators. On multivariable logistic regression analysis adjusted for potential confounders, higher quartiles of baPWV and FMD were significantly associated with an increased risk of STOD. In patients <65 years of age, the odds ratios (OR) of LVH, UACR, and CIMT ≥0.9 mm for the fourth versus the first quartile of baPWV were 1.765 (1.390-2.240), 2.832 (2.014-3.813), and 3.075 (2.315-4.084), respectively. In interaction analysis, increasing baPWV showed a progressively higher risk of STOD across the quartiles of FMD. Also, the estimated absolute risks of LVH, UACR, and CIMT ≥0.9 mm for the first to fourth quartile of baPWV increased from 1.88 to 2.75, 2.35 to 4.44, and 3.10 to 6.10, respectively, in patients grouped by FMD quartiles. The addition of baPWV to FMD slightly improved risk prediction for STOD. Both baPWV and FMD were independently associated with an increased risk of STOD in patients with essential hypertension, especially among patients <65 years of age. Patients with elevated baPWV and decreased FMD are at increased risk of STOD.
Affiliation(s)
- Yancui Sun
- Department of Cardiology, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning Province, China
- Fei Liu
- Department of Cardiology, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning Province, China
- Ying Zhang
- Department of Cardiology, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning Province, China
- Yan Lu
- Department of Cardiology, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning Province, China
- Zhuolin Su
- Department of Cardiology, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning Province, China
- Haizhe Ji
- Department of Cardiology, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning Province, China
- Yunpeng Cheng
- Department of Cardiology, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning Province, China
- Wei Song
- Department of Cardiology, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning Province, China
- Tesfaldet H Hidru
- Department of Cardiology, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning Province, China
- Xiaolei Yang
- Department of Cardiology, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning Province, China
- Yinong Jiang
- Department of Cardiology, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning Province, China
7. Gianfrancesco MA, Goldstein ND. A narrative review on the validity of electronic health record-based research in epidemiology. BMC Med Res Methodol 2021;21:234. [PMID: 34706667] [PMCID: PMC8549408] [DOI: 10.1186/s12874-021-01416-5]
Abstract
Electronic health records (EHRs) are widely used in epidemiological research, but the validity of the results depends on the assumptions made about the healthcare system, the patient, and the provider. In this review, we identify four overarching challenges in using EHR-based data for epidemiological analysis, with a particular emphasis on threats to validity: the representativeness of the EHR to a target population, the availability of clinical and non-clinical data, the interpretability of those data, and missing data at both the variable and observation levels. Each challenge reveals layers of assumptions that the epidemiologist is required to make, from the point of patient entry into the healthcare system, to the provider documenting the results of the clinical exam and following the patient longitudinally, all with the potential to bias the results of analyses of these data. Understanding the extent of potential biases, as well as remediating them, requires a variety of methodological approaches, from traditional sensitivity analyses and validation studies to newer techniques such as natural language processing. Beyond methods to address these challenges, it will remain crucial for epidemiologists to engage with clinicians and informaticians at their institutions to ensure data quality and accessibility by forming multidisciplinary teams around specific research projects.
Affiliation(s)
- Milena A Gianfrancesco
- Division of Rheumatology, University of California School of Medicine, San Francisco, CA, USA
- Neal D Goldstein
- Department of Epidemiology and Biostatistics, Drexel University Dornsife School of Public Health, 3215 Market St., Philadelphia, PA, 19104, USA
8. Tan Q, Ye M, Ma AJ, Yip TCF, Wong GLH, Yuen PC. Importance-aware personalized learning for early risk prediction using static and dynamic health data. J Am Med Inform Assoc 2021;28:713-726. [PMID: 33496786] [DOI: 10.1093/jamia/ocaa306]
Abstract
OBJECTIVE Accurate risk prediction is important for evaluating early medical treatment effects and improving health care quality. Existing methods are usually designed for dynamic medical data, which require long-term observations. Meanwhile, important personalized static information is ignored due to its underlying uncertainty and unquantifiable ambiguity. An early risk prediction method that can adaptively integrate both static and dynamic health data is urgently needed. MATERIALS AND METHODS Data were from 6367 patients with peptic ulcer bleeding between 2007 and 2016. This article develops a novel End-to-end Importance-Aware Personalized Deep Learning Approach (eiPDLA) to achieve accurate early clinical risk prediction. Specifically, eiPDLA introduces a long short-term memory network with temporal attention to learn sequential dependencies from time-stamped records, while simultaneously incorporating a residual network with correlation attention to capture their influencing relationship with static medical data. Furthermore, a new multi-residual multi-scale network with an importance-aware mechanism is designed to adaptively fuse the learned multisource features, automatically assigning larger weights to important features while weakening the influence of less important ones. RESULTS Extensive experimental results on a real-world dataset illustrate that our method significantly outperforms the state-of-the-art for early risk prediction under various settings (eg, achieving an AUC score of 0.944 when predicting risk 1 year in advance). Case studies indicate that the achieved prediction results are highly interpretable. CONCLUSION These results reflect the importance of combining static and dynamic health data, mining their influencing relationship, and incorporating an importance-aware mechanism to automatically identify important features. The resulting accurate early risk predictions give doctors more time to design effective treatments and improve clinical outcomes.
Affiliation(s)
- Qingxiong Tan
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, Hong Kong
- Mang Ye
- School of Computer Science, Wuhan University, Wuhan, China
- Andy Jinhua Ma
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
- Terry Cheuk-Fung Yip
- Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, Hong Kong
- Grace Lai-Hung Wong
- Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, Hong Kong
- Pong C Yuen
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, Hong Kong
9. Trinder M, Brunham LR. Polygenic scores for dyslipidemia: the emerging genomic model of plasma lipoprotein trait inheritance. Curr Opin Lipidol 2021;32:103-111. [PMID: 33395106] [DOI: 10.1097/mol.0000000000000737]
Abstract
PURPOSE OF REVIEW Contemporary polygenic scores, which summarize the cumulative contribution of millions of common single-nucleotide variants to a phenotypic trait, can have effects comparable to monogenic mutations. This review focuses on the emerging use of 'genome-wide' polygenic scores for plasma lipoproteins to define the etiology of clinical dyslipidemia, modify the severity of monogenic disease, and inform therapeutic options. RECENT FINDINGS Polygenic scores for low-density lipoprotein cholesterol (LDL-C), triglycerides, and high-density lipoprotein cholesterol are associated with severe hypercholesterolemia, hypertriglyceridemia, and hypoalphalipoproteinemia, respectively. These polygenic scores for LDL-C or triglycerides associate with risk of incident coronary artery disease (CAD) independent of polygenic scores designed specifically for CAD, and may identify individuals who benefit most from lipid-lowering medication. Additionally, the severity of hypercholesterolemia and CAD associated with familial hypercholesterolemia, a common monogenic disorder, is modified by these polygenic factors. The current focus of polygenic scores for dyslipidemia is designing predictive scores for diverse populations and determining how these scores could be implemented and standardized for use in the clinic. SUMMARY Polygenic scores have shown early promise for the management of dyslipidemias, but several challenges need to be addressed before widespread clinical implementation to ensure that potential benefits are robust, reproducible, equitable, and cost-effective.
Affiliation(s)
- Mark Trinder
- Centre for Heart Lung Innovation, University of British Columbia
- Experimental Medicine Program, University of British Columbia
- Liam R Brunham
- Centre for Heart Lung Innovation, University of British Columbia
- Experimental Medicine Program, University of British Columbia
- Department of Medicine, University of British Columbia
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
10. Haneuse S, Arterburn D, Daniels MJ. Assessing Missing Data Assumptions in EHR-Based Studies: A Complex and Underappreciated Task. JAMA Netw Open 2021;4:e210184. [PMID: 33635321] [DOI: 10.1001/jamanetworkopen.2021.0184]
Affiliation(s)
- Sebastien Haneuse
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Statistical Editor, JAMA Network Open
- David Arterburn
- Kaiser Permanente Washington Health Research Institute, Seattle
11. Increasing the Density of Laboratory Measures for Machine Learning Applications. J Clin Med 2020;10(1):103. [PMID: 33396741] [PMCID: PMC7795258] [DOI: 10.3390/jcm10010103]
Abstract
Background. The imputation of missingness is a key step in Electronic Health Records (EHR) mining, as it can significantly affect the conclusions derived from the downstream analysis in translational medicine. The missingness of laboratory values in EHR is not at random, yet imputation techniques tend to disregard this key distinction. Consequently, the development of an adaptive imputation strategy designed specifically for EHR is an important step in improving the data imbalance and enhancing the predictive power of modeling tools for healthcare applications. Methods. We analyzed the laboratory measures derived from Geisinger's EHR on patients in three distinct cohorts: patients tested for Clostridioides difficile (Cdiff) infection, patients with a diagnosis of inflammatory bowel disease (IBD), and patients with a diagnosis of hip or knee osteoarthritis (OA). We extracted Logical Observation Identifiers Names and Codes (LOINC), from which we excluded those with 75% or more missingness. The comorbidities, primary or secondary diagnoses, as well as active problem lists, were also extracted. The adaptive imputation strategy was designed as a hybrid approach: the comorbidity patterns of patients were transformed into latent patterns and then clustered, and imputation was performed on each cluster of patients for each cohort independently to show the generalizability of the method. The results were compared with imputation applied to the complete dataset without incorporating the information from comorbidity patterns. Results. We analyzed a total of 67,445 patients (11,230 IBD patients, 10,000 OA patients, and 46,215 patients tested for C. difficile infection). We extracted 495 LOINC and 11,230 diagnosis codes for the IBD cohort, 8160 diagnosis codes for the Cdiff cohort, and 2042 diagnosis codes for the OA cohort based on the primary/secondary diagnosis and active problem list in the EHR. Overall, the most improvement from this strategy was observed when the laboratory measures had a higher level of missingness. The best root mean square error (RMSE) difference for each dataset was recorded as -35.5 for the Cdiff, -8.3 for the IBD, and -11.3 for the OA dataset. Conclusions. An adaptive imputation strategy designed specifically for EHR that uses complementary information from the clinical profile of the patient can improve the imputation of missing laboratory values, especially when laboratory codes with high levels of missingness are included in the analysis.
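The cluster-then-impute idea in this abstract can be sketched with scikit-learn. This is a rough illustration, not the authors' implementation: the data are synthetic, plain k-means stands in for the latent comorbidity-pattern clustering, and simple mean imputation stands in for whatever imputer is applied within each cluster:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(0)

# Synthetic stand-ins: binary comorbidity indicators and lab values with gaps.
comorbidities = rng.integers(0, 2, size=(200, 10)).astype(float)
labs = rng.normal(size=(200, 5))
labs[rng.random(labs.shape) < 0.3] = np.nan  # ~30% missingness

# Step 1: group patients by comorbidity profile.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(comorbidities)

# Step 2: impute lab values within each cluster independently, so fill-in
# values come from clinically similar patients rather than the whole cohort.
imputed = labs.copy()
for k in np.unique(clusters):
    rows = clusters == k
    imputed[rows] = SimpleImputer(strategy="mean").fit_transform(labs[rows])
```

Per-cluster fill-in values differ from the global ones whenever the clusters capture real clinical structure, which is where RMSE gains like those reported above would come from.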
12. Li R, Chen Y, Ritchie MD, Moore JH. Electronic health records and polygenic risk scores for predicting disease risk. Nat Rev Genet 2020;21:493-502. [PMID: 32235907] [DOI: 10.1038/s41576-020-0224-1]
Abstract
Accurate prediction of disease risk based on the genetic make-up of an individual is essential for effective prevention and personalized treatment. Nevertheless, to date, individual genetic variants from genome-wide association studies have achieved only moderate prediction of disease risk. The aggregation of genetic variants under a polygenic model shows promising improvements in prediction accuracies. Increasingly, electronic health records (EHRs) are being linked to patient genetic data in biobanks, which provides new opportunities for developing and applying polygenic risk scores in the clinic, to systematically examine and evaluate patient susceptibilities to disease. However, the heterogeneous nature of EHR data brings forth many practical challenges along every step of designing and implementing risk prediction strategies. In this Review, we present the unique considerations for using genotype and phenotype data from biobank-linked EHRs for polygenic risk prediction.
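At its core, a polygenic risk score of the kind this review discusses is a weighted sum of risk-allele dosages across variants. The numbers below are invented for illustration; real scores use up to millions of variants:

```python
import numpy as np

# Genotype dosages (0, 1, or 2 copies of the risk allele):
# 3 individuals x 4 variants.
dosages = np.array([
    [0, 1, 2, 1],
    [2, 2, 1, 0],
    [1, 0, 0, 1],
])

# Per-variant effect sizes, e.g. log odds ratios from GWAS summary statistics.
weights = np.array([0.10, -0.05, 0.20, 0.08])

# PRS_i = sum over variants j of dosage[i, j] * weight[j]: one score per person.
prs = dosages @ weights
print(prs)
```

In practice the raw score is then standardized against an ancestry-matched reference distribution before any clinical interpretation, one of the implementation challenges the review highlights.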
Affiliation(s)
- Ruowang Li
- Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Yong Chen
- Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Marylyn D Ritchie
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Jason H Moore
- Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, USA