1
Li Y, Yang AY, Marelli A, Li Y. MixEHR-SurG: A joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records. J Biomed Inform 2024; 153:104638. [PMID: 38631461 DOI: 10.1016/j.jbi.2024.104638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 03/07/2024] [Accepted: 04/03/2024] [Indexed: 04/19/2024]
Abstract
Survival models can help medical practitioners evaluate the prognostic importance of clinical variables for patient outcomes such as mortality or hospital readmission, and subsequently design personalized treatment regimes. Electronic health records (EHRs) hold promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high-dimensional, multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG that simultaneously integrates heterogeneous EHR data and models survival hazard. Our contributions are threefold: (1) integrating EHR topic inference with the Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using PheCode concepts, such that each topic can be identified with exactly one PheCode-associated phenotype; and (3) multi-modal survival topic inference. This yields a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG on a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data, consisting of 8211 subjects with 75,187 outpatient claim records covering 1767 unique ICD codes, and MIMIC-III, consisting of 1458 subjects with multi-modal EHR records. Compared with the baselines, MixEHR-SurG achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC of 0.89 on the simulated dataset and 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among CHD patients after their first heart failure hospitalization, and critical brain injuries with increased mortality among MIMIC-III patients after ICU discharge.
Together, the integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG yields not only competitive mortality prediction but also meaningful phenotype topics for in-depth survival analysis. The software is available on GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.
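The abstract describes coupling topic inference with the Cox proportional hazards likelihood. As a rough, hypothetical illustration (not the MixEHR-SurG implementation), the sketch below evaluates a Cox partial log-likelihood in which each patient's topic proportions act as covariates. The function name, the use of topic proportions as linear covariates, and the assumption of no tied event times are ours.

```python
import numpy as np

def cox_partial_loglik(theta, weights, times, events):
    """Cox partial log-likelihood with topic proportions as covariates.

    Hypothetical sketch; assumes no tied event times.
    theta   : (n_patients, n_topics) topic proportions per patient
    weights : (n_topics,) per-topic log-hazard coefficients
    times   : (n_patients,) follow-up times (numpy array)
    events  : (n_patients,) 1 if death observed, 0 if censored
    """
    eta = theta @ weights                   # linear predictor (log relative hazard)
    order = np.argsort(-times)              # longest follow-up first
    eta_sorted = eta[order]
    events_sorted = events[order]
    # Running log-sum-exp: entry i covers the risk set {j : t_j >= t_i}.
    log_risk = np.logaddexp.accumulate(eta_sorted)
    # Only observed deaths contribute a term to the partial likelihood.
    return float(np.sum(events_sorted * (eta_sorted - log_risk)))
```

Sorting by descending time lets the risk-set denominator be accumulated in a single pass, and working in log space avoids overflow when linear predictors are large.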
Affiliation(s)
- Yixuan Li
- Department of Mathematics and Statistics, McGill University, Montreal, Canada; Mila - Quebec AI Institute, Montreal, Canada
- Archer Y Yang
- Department of Mathematics and Statistics, McGill University, Montreal, Canada; Mila - Quebec AI Institute, Montreal, Canada; School of Computer Science, McGill University, Montreal, Canada.
- Ariane Marelli
- McGill Adult Unit for Congenital Heart Disease (MAUDE Unit), McGill University Health Centre, Montreal, Canada.
- Yue Li
- Mila - Quebec AI Institute, Montreal, Canada; School of Computer Science, McGill University, Montreal, Canada.
2
Seki T, Kawazoe Y, Ohe K. Clinical Feature Vector Generation using Unsupervised Graph Representation Learning from Heterogeneous Medical Records. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2024; 2023:618-623. [PMID: 38222342 PMCID: PMC10785854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
The diversity of patient information recorded in electronic medical records generally presents a challenge for converting it into fixed-length vectors that align with clinical characteristics. To address this issue, this study used an unsupervised graph representation learning method to transform unstructured inpatient information from electronic medical records into a fixed-length vector. Infograph, an unsupervised graph representation learning algorithm, was applied to the inpatient information represented as graphs, yielding embedded vectors of fixed length. The embedded vectors were then evaluated to determine whether clinical information was preserved in them. The results indicated that the embedded representation contained information that could predict readmission within 30 days, demonstrating the feasibility of using unsupervised graph representation learning to transform patient information into fixed-length vectors that retain clinical characteristics.
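The central step here is turning a patient graph of arbitrary size into a vector of fixed length. A minimal sketch, assuming a simple message-passing layer followed by a sum readout that is concatenated across layers (similar in spirit to the encoder readout Infograph builds on), shows why the output length does not depend on the number of nodes. The function name is hypothetical, the weights are arbitrary, and the mutual-information training objective that Infograph uses to learn them is omitted.

```python
import numpy as np

def graph_to_vector(adj, features, layer_weights):
    """Map a graph with any number of nodes to a fixed-length vector.

    Hypothetical sketch, not the study's pipeline.
    adj           : (n, n) adjacency matrix
    features      : (n, d) node features
    layer_weights : list of weight matrices with shapes (d, h1), (h1, h2), ...
    Returns the concatenated sum-pooled node embeddings of every layer,
    so the output length is h1 + h2 + ... regardless of n.
    """
    h = features
    propagate = adj + np.eye(adj.shape[0])      # include each node's own features
    pooled = []
    for w in layer_weights:
        h = np.maximum(propagate @ h @ w, 0.0)  # aggregate neighbours, then ReLU
        pooled.append(h.sum(axis=0))            # sum readout over all nodes
    return np.concatenate(pooled)
```

Because the readout sums over nodes, the result is also invariant to the order in which nodes (for example, individual orders or findings in a record) are listed.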
Affiliation(s)
- Tomohisa Seki
- Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Yoshimasa Kawazoe
- Artificial Intelligence in Healthcare, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Kazuhiko Ohe
- Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
3
Woodman RJ, Mangoni AA. A comprehensive review of machine learning algorithms and their application in geriatric medicine: present and future. Aging Clin Exp Res 2023; 35:2363-2397. [PMID: 37682491 PMCID: PMC10627901 DOI: 10.1007/s40520-023-02552-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 06/13/2023] [Accepted: 08/24/2023] [Indexed: 09/09/2023]
Abstract
Increasing access to health data worldwide is driving a resurgence in machine learning research, including data-hungry deep learning algorithms. More computationally efficient algorithms now offer unique opportunities to enhance diagnosis, risk stratification, and individualised approaches to patient management. Such opportunities are particularly relevant for the management of older patients, a group characterised by complex multimorbidity patterns and significant interindividual variability in homeostatic capacity, organ function, and response to treatment. Clinical tools that utilise machine learning algorithms to determine the optimal choice of treatment are slowly gaining the necessary approval from governing bodies and being implemented into healthcare, with significant implications for virtually all medical disciplines during the next phase of digital medicine. Beyond obtaining regulatory approval, a crucial element in implementing these tools is the trust and support of the people who use them. In this context, a better understanding of artificial intelligence and machine learning algorithms among clinicians provides an appreciation of the possible benefits, risks, and uncertainties, and improves the chances of successful adoption. This review provides a broad taxonomy of machine learning algorithms, followed by a more detailed description of each algorithm class, its purpose and capabilities, and examples of its applications, particularly in geriatric medicine. Additional focus is given to the clinical implications and challenges of relying on devices with reduced interpretability, and to the progress made in counteracting the latter through the development of explainable machine learning.
Affiliation(s)
- Richard J Woodman
- Centre of Epidemiology and Biostatistics, College of Medicine and Public Health, Flinders University, GPO Box 2100, Adelaide, SA, 5001, Australia.
- Arduino A Mangoni
- Discipline of Clinical Pharmacology, College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
- Department of Clinical Pharmacology, Flinders Medical Centre, Southern Adelaide Local Health Network, Adelaide, SA, Australia
4
Zhang Y, Jiang X, Mentzer AJ, McVean G, Lunter G. Topic modeling identifies novel genetic loci associated with multimorbidities in UK Biobank. Cell Genom 2023; 3:100371. [PMID: 37601973 PMCID: PMC10435382 DOI: 10.1016/j.xgen.2023.100371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Revised: 05/04/2023] [Accepted: 07/07/2023] [Indexed: 08/22/2023]
Abstract
Many diseases show patterns of co-occurrence, possibly driven by systemic dysregulation of underlying processes affecting multiple traits. We have developed a method (treeLFA) for identifying such multimorbidities from routine health-care data, which combines topic modeling with an informative prior derived from medical ontology. We apply treeLFA to UK Biobank data and identify a variety of topics representing multimorbidity clusters, including a healthy topic. We find that loci identified using topic weights as traits in a genome-wide association study (GWAS) analysis, which we validated with a range of approaches, only partially overlap with loci from GWASs on constituent single diseases. We also show that treeLFA improves upon existing methods like latent Dirichlet allocation in various ways. Overall, our findings indicate that topic models can characterize multimorbidity patterns and that genetic analysis of these patterns can provide insight into the etiology of complex traits that cannot be determined from the analysis of constituent traits alone.
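Using topic weights as traits in a GWAS means testing each genetic variant for association with a per-individual topic weight. A minimal, hypothetical sketch of the simplest version of that test (an ordinary least-squares regression per variant) is shown below. It is not the study's pipeline: it assumes additive 0/1/2 genotype coding, omits covariates such as age, sex, and ancestry components, and assumes every variant is polymorphic so the genotype variance is non-zero. The function name is ours.

```python
import numpy as np

def topic_weight_association(dosages, trait):
    """Per-variant OLS regression of a topic-weight trait on genotype dosage.

    Hypothetical sketch; no covariates are included.
    dosages : (n_individuals, n_variants) allele counts coded 0, 1, 2
    trait   : (n_individuals,) topic weight per individual
    Returns (beta, t_stat), each of length n_variants.
    """
    n = trait.shape[0]
    g = dosages - dosages.mean(axis=0)      # centre genotypes per variant
    y = trait - trait.mean()                # centring absorbs the intercept
    sxx = (g ** 2).sum(axis=0)
    beta = (g * y[:, None]).sum(axis=0) / sxx
    resid = y[:, None] - g * beta           # residuals of each variant's fit
    sigma2 = (resid ** 2).sum(axis=0) / (n - 2)
    t_stat = beta / np.sqrt(sigma2 / sxx)   # slope divided by its standard error
    return beta, t_stat
```

Vectorising across variants keeps the sketch short; production GWAS tools additionally handle covariates, relatedness, and missing genotypes.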
Affiliation(s)
- Yidong Zhang
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Chinese Academy of Medical Sciences Oxford Institute, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Department of Radiation Oncology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100006, China
- Xilin Jiang
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0SR, UK
- Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0BB, UK
- Alexander J. Mentzer
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Gerton Lunter
- MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DS, UK
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9700 RB, the Netherlands
5
Wang Y, Grant AV, Li Y. Implementation of a graph-embedded topic model for analysis of population-level electronic health records. STAR Protoc 2023; 4:101966. [PMID: 36583962 PMCID: PMC9807818 DOI: 10.1016/j.xpro.2022.101966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 10/03/2022] [Accepted: 12/08/2022] [Indexed: 12/29/2022]
Abstract
To address the need for systematic investigation of the phenome enabled by ever-growing genotype and phenotype data, we describe our step-by-step software implementation of a graph-embedded topic model, including data preprocessing, graph learning, topic inference, and phenotype prediction. As a demonstration, we use simulated data that mimic the UK Biobank data used in our original study. We demonstrate topic analysis to discover disease comorbidities and computational phenotyping via the inferred topic mixture for each subject. For complete details on the use and execution of this protocol, please refer to Wang et al. (2022).
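Phenotyping via an inferred topic mixture can be illustrated with the standard topic-model reconstruction: a subject's expected distribution over EHR codes is the average of the topic-by-code distributions, weighted by that subject's topic proportions. The sketch below is a minimal, hypothetical illustration with toy inputs, not the protocol's own code; both function names are ours.

```python
import numpy as np

def predict_code_probabilities(theta, phi):
    """Expected probability of each code for each subject.

    theta : (n_subjects, K) topic mixture per subject (rows sum to 1)
    phi   : (K, V) topic-by-code distributions (rows sum to 1)
    Returns (n_subjects, V) with p(code v | subject d) = sum_k theta[d, k] * phi[k, v].
    """
    return theta @ phi

def top_codes(theta, phi, k=3):
    """Indices of the k most probable codes per subject, most probable first."""
    probs = predict_code_probabilities(theta, phi)
    return np.argsort(-probs, axis=1)[:, :k]
```

For example, a subject with mixture `[0.25, 0.75]` over two toy topics receives code probabilities that lean towards the codes emphasised by the second topic.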
Affiliation(s)
- Yuening Wang
- School of Computer Science, McGill University, Montreal, QC H3A 0G4, Canada
- Audrey V Grant
- Department of Anesthesia, McGill University, Montreal, QC H2A 0G4, Canada
- Yue Li
- School of Computer Science, McGill University, Montreal, QC H3A 0G4, Canada.
6
Zou Y, Pesaranghader A, Song Z, Verma A, Buckeridge DL, Li Y. Modeling electronic health record data using an end-to-end knowledge-graph-informed topic model. Sci Rep 2022; 12:17868. [PMID: 36284225 PMCID: PMC9596500 DOI: 10.1038/s41598-022-22956-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/16/2022] [Accepted: 10/21/2022] [Indexed: 01/20/2023]
Abstract
The rapid growth of electronic health record (EHR) datasets opens up promising opportunities to understand human diseases in a systematic way. However, effective extraction of clinical knowledge from EHR data has been hindered by sparse and noisy information. We present the Graph ATtention-Embedded Topic Model (GAT-ETM), an end-to-end, taxonomy-knowledge-graph-based multimodal embedded topic model. GAT-ETM distills latent disease topics from EHR data by learning embeddings from a constructed medical knowledge graph. We applied GAT-ETM to a large-scale EHR dataset of over 1 million patients and evaluated its performance on topic quality, drug imputation, and disease diagnosis prediction. GAT-ETM outperformed the alternative methods on all tasks. Moreover, GAT-ETM learned clinically meaningful, graph-informed embeddings of the EHR codes and discovered interpretable and accurate patient representations for patient stratification and drug recommendation. GAT-ETM code is available at https://github.com/li-lab-mcgill/GAT-ETM.
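In the general embedded-topic-model formulation, each topic's distribution over codes is a softmax of inner products between a topic embedding and the code embeddings, so codes whose embeddings lie close to a topic's embedding receive high probability under that topic. A minimal sketch of that step follows, assuming the code embeddings are already given; in GAT-ETM they are learned with graph attention over a medical knowledge graph, which is not shown. The function and argument names are illustrative.

```python
import numpy as np

def topic_code_distributions(code_emb, topic_emb):
    """Embedded-topic-model style topic-by-code distributions.

    code_emb  : (V, L) one embedding per EHR code
    topic_emb : (K, L) one embedding per topic
    Returns (K, V); row k is the softmax over codes of code_emb @ topic_emb[k].
    """
    logits = topic_emb @ code_emb.T                        # (K, V) similarity scores
    logits = logits - logits.max(axis=1, keepdims=True)    # subtract row max for stability
    expd = np.exp(logits)
    return expd / expd.sum(axis=1, keepdims=True)          # normalise each topic row
```

Tying topics to shared code embeddings lets related codes (for example, neighbours in a diagnosis taxonomy) receive similar probabilities even when some of them are rare in the data.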
Affiliation(s)
- Yuesong Zou
- School of Computer Science, McGill University, Montreal, Canada
- Ahmad Pesaranghader
- School of Computer Science, McGill University, Montreal, Canada
- Ziyang Song
- School of Computer Science, McGill University, Montreal, Canada
- Aman Verma
- School of Population and Global Health, McGill University, Montreal, Canada
- David L. Buckeridge
- School of Population and Global Health, McGill University, Montreal, Canada
- Yue Li
- School of Computer Science, McGill University, Montreal, Canada