1
|
Zhang J, Xu Y, Ye B, Zhao Y, Sun X, Meng Q, Zhang Y, Cui L. EAPR: explainable and augmented patient representation learning for disease prediction. Health Inf Sci Syst 2023; 11:53. [PMID: 37974902 PMCID: PMC10645955 DOI: 10.1007/s13755-023-00256-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 10/17/2023] [Indexed: 11/19/2023]
Abstract
Patient representation learning aims to encode meaningful information from a patient's Electronic Health Records (EHR) as a mathematical representation. Recent advances in deep learning have given patient representation learning methods greater representational power, allowing the learned representations to significantly improve the performance of disease prediction models. However, the inherent shortcomings of deep learning models, such as the need for massive amounts of labeled data and their lack of explainability, limit further improvement of deep learning-based patient representation learning methods. In particular, learning robust patient representations is challenging when patient data is missing or insufficient. Although data augmentation techniques can tackle this deficiency, the complex data processing further weakens the explainability of patient representation learning models. To address these challenges, this paper proposes Explainable and Augmented Patient Representation Learning for disease prediction (EAPR). EAPR uses data augmentation controlled by a confidence interval to enhance patient representations when patient data is limited. Moreover, EAPR uses two-stage gradient backpropagation to address the loss of explainability caused by the complex data augmentation process. Experimental results on real clinical data validate the effectiveness and explainability of the proposed approach.
Affiliation(s)
- Jiancheng Zhang
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Yonghui Xu
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Bicui Ye
  - Wuzhou Red Cross Hospital, Wuzhou, China
  - Jinan University, Jinan, China
- Yibowen Zhao
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Xiaofang Sun
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Qi Meng
  - Department of Radiology, Qilu Hospital of Shandong University, Jinan, China
- Yang Zhang
  - Department of Radiology, Qilu Hospital of Shandong University, Jinan, China
- Lizhen Cui
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
|
2
|
Xiang J, Xu H, Pokharel S, Li J, Xue F, Zhang P. Building a knowledge base for colorectal cancer patient care using formal concept analysis. BMC Med Inform Decis Mak 2022; 21:369. [DOI: 10.1186/s12911-021-01728-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2021] [Accepted: 12/17/2021] [Indexed: 11/25/2022]
Abstract
Background
Colorectal cancer (CRC) is a heterogeneous disease whose response to targeted therapies varies with many factors, so the treatment effect differs significantly between individuals. Personalized medical treatment (PMT) is a method that takes individual patient characteristics into consideration, making it the most effective way to deal with this issue. Patient similarity and clustering analysis is an important aspect of PMT. This paper describes how to build a knowledge base using formal concept analysis (FCA), which clusters patients based on their similarity and preserves the relations between clusters in a hierarchical structure.
Methods
Prognostic factors (attributes) of 2442 CRC patients, including patient age, cancer cell differentiation, lymphatic invasion and metastasis stages were used to build a formal context in FCA. A concept was defined as a set of patients with their shared attributes. The formal context was formed based on the similarity scores between each concept identified from the dataset, which can be used as a knowledge base.
Results
A hierarchical knowledge base was constructed along with the clinical records of the diagnosed CRC patients. For each new patient, a similarity score to each existing concept in the knowledge base can be retrieved with different similarity calculations. The ranked similarity scores that are associated with the concepts can offer references for treatment plans.
Conclusions
Patients that share the same concept may respond similarly to the same clinical procedures or treatments. In conjunction with a clinician’s ability to undertake flexible analyses and apply appropriate judgement, the knowledge base allows faster and more effective decisions to be made for patient treatment and care.
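As a rough illustration of the FCA idea summarized in this abstract, the sketch below enumerates the formal concepts (patient set plus shared attribute set) of a tiny hand-made context and ranks them against a new patient. Everything here is an assumption for illustration: the patient IDs, the attribute names, the brute-force enumeration, and the use of Jaccard similarity are not taken from the paper, which only states that "different similarity calculations" can be used.

```python
from itertools import combinations

def derive_attributes(context, patients):
    """Return the attributes shared by every patient in `patients`."""
    shared = set().union(*context.values())
    for patient in patients:
        shared &= context[patient]
    return shared

def derive_patients(context, attrs):
    """Return the patients that have every attribute in `attrs`."""
    return {p for p, owned in context.items() if attrs <= owned}

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by brute force (toy sizes only)."""
    concepts = set()
    patients = sorted(context)
    for size in range(len(patients) + 1):
        for subset in combinations(patients, size):
            intent = derive_attributes(context, subset)
            extent = derive_patients(context, intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

def jaccard(a, b):
    """Jaccard similarity of two attribute sets (an assumed choice)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def rank_concepts(concepts, new_patient_attrs):
    """Rank concepts by similarity of their intent to a new patient."""
    scored = [
        (jaccard(intent, new_patient_attrs), sorted(extent), sorted(intent))
        for extent, intent in concepts
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)

# Hypothetical toy context: patient -> set of prognostic attributes.
context = {
    "p1": {"age>=65", "lymphatic_invasion"},
    "p2": {"age>=65", "poor_differentiation"},
    "p3": {"age>=65", "lymphatic_invasion", "poor_differentiation"},
}
concepts = formal_concepts(context)
ranked = rank_concepts(concepts, {"age>=65", "lymphatic_invasion"})
```

The brute-force enumeration is exponential in the number of patients, so a cohort of the size reported here would need a dedicated FCA algorithm; the sketch only shows what a concept and a ranked similarity lookup look like.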
|
3
|
Oei RW, Hsu W, Lee ML, Tan NC. Using similar patients to predict complication in patients with diabetes, hypertension, and lipid disorder: a domain knowledge-infused convolutional neural network approach. J Am Med Inform Assoc 2022; 30:273-281. [PMID: 36343096 PMCID: PMC9846687 DOI: 10.1093/jamia/ocac212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 09/27/2022] [Accepted: 10/21/2022] [Indexed: 11/09/2022]
Abstract
OBJECTIVE This study aims to develop a convolutional neural network-based learning framework, called domain knowledge-infused convolutional neural network (DK-CNN), to retrieve clinically similar patients and to personalize the prediction of macrovascular complications using the retrieved patients.
MATERIALS AND METHODS We use the electronic health records of 169 434 patients with diabetes, hypertension, and/or lipid disorder. Patients are partitioned into 7 subcohorts based on their comorbidities. DK-CNN integrates both domain knowledge and the disease trajectory of patients over multiple visits to retrieve similar patients. We use normalized discounted cumulative gain (nDCG) and macrovascular complication prediction performance to evaluate the effectiveness of DK-CNN compared to state-of-the-art models. Ablation studies are conducted to compare DK-CNN with reduced models that do not use domain knowledge as well as models that do not consider short-term, medium-term, and long-term trajectory over multiple visits.
RESULTS Key findings from this study are: (1) DK-CNN is able to retrieve clinically similar patients and achieves the highest nDCG values in all 7 subcohorts; (2) DK-CNN outperforms other state-of-the-art approaches in terms of complication prediction performance in all 7 subcohorts; and (3) the ablation studies show that the full model achieves the highest nDCG compared with the 2 reduced models.
DISCUSSION AND CONCLUSIONS DK-CNN is a deep learning-based approach that incorporates domain knowledge and patient trajectory data to retrieve clinically similar patients. It can assist physicians, who may refer to the outcomes and past treatments of similar patients as a guide for choosing an effective treatment.
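For readers unfamiliar with the retrieval metric named in this abstract, the following is a minimal sketch of normalized discounted cumulative gain (nDCG). It assumes the linear-gain form with a log2 rank discount; the abstract does not say which nDCG variant or cutoff the authors used, and the graded relevance values in the example are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 rank discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )

def ndcg(relevances, k=None):
    """nDCG of a ranked list of graded relevance scores, cut off at k."""
    k = len(relevances) if k is None else k
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    # A list with no relevant items has no meaningful ranking quality.
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical relevance grades of retrieved patients, in retrieval order.
perfect_order = ndcg([3, 2, 1])
swapped_order = ndcg([1, 3])
```

A value of 1.0 means the retrieved patients are already in the best possible order; placing a more relevant patient lower in the list pulls the score below 1.0.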
Affiliation(s)
- Ronald Wihal Oei
  - Corresponding Author: Ronald Wihal Oei, MBBS, Institute of Data Science, National University of Singapore, Innovation 4.0, #04-06, 3 Research Link, 117602 Singapore
- Wynne Hsu
  - Institute of Data Science, National University of Singapore, Singapore
  - School of Computing, National University of Singapore, Singapore
- Mong Li Lee
  - Institute of Data Science, National University of Singapore, Singapore
  - School of Computing, National University of Singapore, Singapore
|
4
|
Memarzadeh H, Ghadiri N, Samwald M, Lotfi Shahreza M. A study into patient similarity through representation learning from medical records. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-022-01740-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
5
|
Daniel C, Bellamine A, Kalra D. Key Contributions in Clinical Research Informatics. Yearb Med Inform 2021; 30:233-238. [PMID: 34479395 PMCID: PMC8416193 DOI: 10.1055/s-0041-1726514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Objectives:
To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2020.
Method:
A bibliographic search using a combination of Medical Subject Headings (MeSH) descriptors and free-text terms on CRI was performed using PubMed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. After peer-review ranking, a consensus meeting between two section editors and the editorial team was organized to finally conclude on the selected four best papers.
Results:
Among the 877 papers published in 2020 and returned by the search, four were selected as best papers. The first best paper describes a method for mining temporal sequences from clinical documents to infer disease trajectories and enhance high-throughput phenotyping. The authors of the second best paper demonstrate that the generation of synthetic Electronic Health Record (EHR) data through Generative Adversarial Networks (GANs) could be substantially improved by more appropriate training and evaluation criteria. The third best paper advances methods for detecting adverse drug events by giving expert reviewers computer assistance in the form of annotated candidate mentions in clinical documents. The large-scale data quality assessment study reported by the fourth best paper has clinical research informatics implications, in terms of the trustworthiness of inferences made from analysing electronic health records.
Conclusions:
The most significant research efforts in the CRI field currently focus on data science, with active research in the development and evaluation of Artificial Intelligence/Machine Learning (AI/ML) algorithms based on ever more intensive use of real-world data, especially real or synthetic EHR data. A major lesson that the coronavirus disease 2019 (COVID-19) pandemic has already taught the scientific CRI community is that timely, high-quality international data sharing and collaborative data analysis are vital to inform policy decisions.
Affiliation(s)
- Christel Daniel
  - Information Technology Department, AP-HP, F-75012 Paris, France
  - Sorbonne University, University Paris 13, Sorbonne Paris Cité, INSERM UMR_S 1142, LIMICS, F-75006 Paris, France
- Ali Bellamine
  - Information Technology Department, AP-HP, F-75012 Paris, France
|
6
|
Oei RW, Fang HSA, Tan WY, Hsu W, Lee ML, Tan NC. Using Domain Knowledge and Data-Driven Insights for Patient Similarity Analytics. J Pers Med 2021; 11:jpm11080699. [PMID: 34442343 PMCID: PMC8398126 DOI: 10.3390/jpm11080699] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/15/2021] [Revised: 07/15/2021] [Accepted: 07/21/2021] [Indexed: 12/23/2022]
Abstract
Patient similarity analytics has emerged as an essential tool to identify cohorts of patients who have similar clinical characteristics to a specific patient of interest. In this study, we propose a patient similarity measure called D3K that incorporates domain knowledge and data-driven insights. Using the electronic health records (EHRs) of 169,434 patients with diabetes, hypertension, or dyslipidaemia (DHL), we construct patient feature vectors containing demographics, vital signs, laboratory test results, and prescribed medications. We discretize the variables of interest into bins based on domain knowledge and align the patient similarity computation with clinical guidelines. Key findings from this study are: (1) D3K outperforms baseline approaches in all seven sub-cohorts; (2) our domain knowledge-based binning strategy outperforms traditional percentile-based binning in all seven sub-cohorts; (3) there is substantial agreement between D3K and physicians (κ = 0.746), indicating that D3K can be applied to facilitate shared decision making. This is the first study to use patient similarity analytics on a cardiometabolic syndrome-related dataset sourced from medical institutions in Singapore. We consider patient similarity among patient cohorts with the same medical conditions to develop localized models for personalized decision support to improve the outcomes of a target patient.
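The agreement figure quoted in this abstract is a kappa statistic. The sketch below shows how Cohen's kappa is computed for two raters, on the assumption that this is the variant reported; the abstract does not specify it, and the binary "similar / not similar" labels in the example are invented for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels: 1 = "similar", 0 = "not similar".
measure_labels = [1, 1, 0, 0]
physician_labels = [1, 1, 0, 1]
kappa = cohens_kappa(measure_labels, physician_labels)
```

In this toy example the raters agree on 3 of 4 items (0.75) while chance agreement is 0.5, so kappa is 0.5. Values between 0.61 and 0.80 are conventionally described as substantial agreement, which is consistent with how the abstract characterizes 0.746.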
Affiliation(s)
- Ronald Wihal Oei
  - Institute of Data Science, National University of Singapore, Singapore 117602, Singapore
  - Correspondence:
- Hao Sen Andrew Fang
  - SingHealth Polyclinics, SingHealth, Singapore 150167, Singapore
- Wei-Ying Tan
  - Institute of Data Science, National University of Singapore, Singapore 117602, Singapore
- Wynne Hsu
  - Institute of Data Science, National University of Singapore, Singapore 117602, Singapore
  - School of Computing, National University of Singapore, Singapore 117417, Singapore
- Mong-Li Lee
  - Institute of Data Science, National University of Singapore, Singapore 117602, Singapore
  - School of Computing, National University of Singapore, Singapore 117417, Singapore
- Ngiap-Chuan Tan
  - SingHealth Polyclinics, SingHealth, Singapore 150167, Singapore
|
7
|
Huynh PK, Setty A, Phan H, Le TQ. Probabilistic domain-knowledge modeling of disorder pathogenesis for dynamics forecasting of acute onset. Artif Intell Med 2021; 115:102056. [PMID: 34001316 PMCID: PMC8493977 DOI: 10.1016/j.artmed.2021.102056] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2020] [Revised: 03/01/2021] [Accepted: 03/22/2021] [Indexed: 11/18/2022]
Abstract
Disease pathogenesis, a type of domain knowledge about the biological mechanisms leading to diseases, has not been adequately encoded in machine-learning-based medical diagnostic models because of inter-patient variability and the complex dependencies of the underlying pathogenetic mechanisms. We propose 1) a novel pathogenesis probabilistic graphical model (PPGM) to quantify the dynamics underpinning patient-specific data and pathogenetic domain knowledge, and 2) a Bayesian-based inference paradigm to answer medical queries and forecast acute onsets. The PPGM consists of two components: a Bayesian network of patient attributes and a temporal model of pathogenetic mechanisms. The model structure was reconstructed from expert knowledge elicitation, and its parameters were estimated using Variational Expectation-Maximization algorithms. We benchmarked our model against two well-established hidden Markov models (HMMs), Input-output HMM (IO-HMM) and Switching Auto-Regressive HMM (SAR-HMM), to evaluate computational cost, forecasting performance, and execution time. Two case studies on Obstructive Sleep Apnea (OSA) and Paroxysmal Atrial Fibrillation (PAF) were used to validate the model. While the performance of the parameter learning step was equivalent to that of the IO-HMM and SAR-HMM models, our model's forecasting ability outperformed both. The merits of the PPGM are its capability to represent the dynamics of pathogenesis and perform medical inferences, and its interpretability for physicians. The model has been used to perform medical queries and forecast the acute onset of OSA and PAF. Additional applications of the model include prognostic healthcare and preventive personalized treatments.
Affiliation(s)
- Phat K Huynh
  - Department of Industrial and Manufacturing Engineering, North Dakota State University at Fargo, ND, USA
- Hao Phan
  - Pham Ngoc Thach University of Medicine at Ho Chi Minh City, Viet Nam
- Trung Q Le
  - Department of Industrial and Manufacturing Engineering, North Dakota State University at Fargo, ND, USA
  - Department of Biomedical Engineering, North Dakota State University at Fargo, ND, USA
|
8
|
Chu J, Chen J, Chen X, Dong W, Shi J, Huang Z. Knowledge-aware multi-center clinical dataset adaptation: Problem, method, and application. J Biomed Inform 2021; 115:103710. [PMID: 33581323 DOI: 10.1016/j.jbi.2021.103710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2020] [Revised: 02/05/2021] [Accepted: 02/06/2021] [Indexed: 11/30/2022]
Abstract
Adaptable utilization of clinical data collected from multiple centers has received significant attention in recent years, prompted by the need to overcome shifts between dataset distributions and to exploit these different datasets for potential clinical applications. In this study, we propose a novel approach to this task by infusing an external knowledge graph (KG) into multi-center clinical data mining. Specifically, we propose an adversarial learning model to capture shared patient feature representations from multi-center heterogeneous clinical datasets, and employ an external KG to enrich the semantics of the patient sample by providing both clinical center-specific and center-general knowledge features, which are trained with a graph convolutional autoencoder. We evaluate the proposed model on a real clinical dataset extracted from the general cardiology wards of a Chinese hospital and a well-known public clinical dataset (MIMIC III, pertaining to ICU clinical settings) for the task of predicting acute kidney injury in patients with heart failure. The experimental results demonstrate the efficacy of our proposed model.
Affiliation(s)
- Jiebin Chu
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, China
- Jinbiao Chen
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, China
- Xiaofang Chen
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, China
- Wei Dong
  - Department of Cardiology, Chinese PLA General Hospital, China
- Jinlong Shi
  - Department of Medical Innovation Research, Medical Big Data Center, Chinese PLA General Hospital, China
- Zhengxing Huang
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, China
|
9
|
Chen T, Keravnou-Papailiou E, Antoniou G. Medical analytics for healthcare intelligence - Recent advances and future directions. Artif Intell Med 2021; 112:102009. [PMID: 33581829 DOI: 10.1016/j.artmed.2021.102009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/24/2020] [Accepted: 12/30/2020] [Indexed: 12/16/2022]
Affiliation(s)
- Tianhua Chen
  - Department of Computer Science, University of Huddersfield, Huddersfield, UK
- Grigoris Antoniou
  - Department of Computer Science, University of Huddersfield, Huddersfield, UK
|