1
Wen J, Hou J, Bonzel CL, Zhao Y, Castro VM, Gainer VS, Weisenfeld D, Cai T, Ho YL, Panickan VA, Costa L, Hong C, Gaziano JM, Liao KP, Lu J, Cho K, Cai T. LATTE: Label-efficient incident phenotyping from longitudinal electronic health records. PATTERNS (NEW YORK, N.Y.) 2024; 5:100906. [PMID: 38264714 PMCID: PMC10801250 DOI: 10.1016/j.patter.2023.100906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 09/06/2023] [Accepted: 12/01/2023] [Indexed: 01/25/2024]
Abstract
Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.
Affiliation(s)
- Jun Wen
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
- Jue Hou
  - University of Minnesota, Minneapolis, MN, USA
- Clara-Lea Bonzel
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
- Tianrun Cai
  - VA Boston Healthcare System, Boston, MA, USA
  - Mass General Brigham, Boston, MA, USA
- Yuk-Lam Ho
  - VA Boston Healthcare System, Boston, MA, USA
- Vidul A. Panickan
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
- J. Michael Gaziano
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
  - Brigham and Women’s Hospital, Boston, MA, USA
- Katherine P. Liao
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
  - Brigham and Women’s Hospital, Boston, MA, USA
- Junwei Lu
  - VA Boston Healthcare System, Boston, MA, USA
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Kelly Cho
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
  - Brigham and Women’s Hospital, Boston, MA, USA
- Tianxi Cai
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
2
Zhu S, Zheng W, Pang H. CPAE: Contrastive predictive autoencoder for unsupervised pre-training in health status prediction. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 234:107484. [PMID: 37030137 DOI: 10.1016/j.cmpb.2023.107484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 02/20/2023] [Accepted: 03/12/2023] [Indexed: 06/19/2023]
Abstract
BACKGROUND AND OBJECTIVE: Fully-supervised learning approaches have shown promising results in some health status prediction tasks using Electronic Health Records (EHRs). These traditional approaches rely on sufficient labeled data to learn from. However, in practice, acquiring large-scale labeled medical data for various prediction tasks is often not feasible. Thus, it is of great interest to utilize contrastive pre-training to leverage the unlabeled information.
METHODS: In this work, we propose a novel data-efficient framework, contrastive predictive autoencoder (CPAE), to first learn without labels from the EHR data in the pre-training process, and then fine-tune on the downstream tasks. Our framework comprises two parts: (i) a contrastive learning process, inherited from contrastive predictive coding (CPC), which aims to extract global slow-varying features, and (ii) a reconstruction process, which forces the encoder to capture local features. We also introduce the attention mechanism in one variant of our framework to balance the above two processes.
RESULTS: Experiments on a real-world EHR dataset verify the effectiveness of our proposed framework on two downstream tasks (i.e., in-hospital mortality prediction and length-of-stay prediction), compared to their supervised counterparts, the CPC model, and other baseline models.
CONCLUSIONS: By comprising both contrastive learning components and reconstruction components, CPAE aims to extract both global slow-varying information and local transient information. CPAE achieves the best results on both downstream tasks. The variant AtCPAE is particularly superior when fine-tuned on very small training data. Further work may incorporate techniques of multi-task learning to optimize the pre-training process of CPAEs. Moreover, this work is based on the benchmark MIMIC-III dataset, which includes only 17 variables. Future work may extend to a larger number of variables.
Affiliation(s)
- Shuying Zhu
  - Li Ka Shing Faculty of Medicine, the University of Hong Kong, Hong Kong SAR, China.
- Weizhong Zheng
  - Li Ka Shing Faculty of Medicine, the University of Hong Kong, Hong Kong SAR, China.
- Herbert Pang
  - Li Ka Shing Faculty of Medicine, the University of Hong Kong, Hong Kong SAR, China; Department of Biostatistics and Bioinformatics, Duke University School of Medicine, NC, USA.