1
Liu M, Li S, Yuan H, Ong MEH, Ning Y, Xie F, Saffari SE, Shang Y, Volovici V, Chakraborty B, Liu N. Handling missing values in healthcare data: A systematic review of deep learning-based imputation techniques. Artif Intell Med 2023; 142:102587. [PMID: 37316097 DOI: 10.1016/j.artmed.2023.102587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Revised: 04/08/2023] [Accepted: 05/16/2023] [Indexed: 06/16/2023]
Abstract
OBJECTIVE: The proper handling of missing values is critical to delivering reliable estimates and decisions, especially in high-stakes fields such as clinical research. In response to the increasing diversity and complexity of data, many researchers have developed deep learning (DL)-based imputation techniques. We conducted a systematic review to evaluate the use of these techniques, with a particular focus on the types of data, intending to assist healthcare researchers from various disciplines in dealing with missing data.
MATERIALS AND METHODS: We searched five databases (MEDLINE, Web of Science, Embase, CINAHL, and Scopus) for articles published prior to February 8, 2023 that described the use of DL-based models for imputation. We examined selected articles from four perspectives: data types, model backbones (i.e., main architectures), imputation strategies, and comparisons with non-DL-based methods. Based on data types, we created an evidence map to illustrate the adoption of DL models.
RESULTS: Out of 1822 articles, a total of 111 were included, of which tabular static data (29%, 32/111) and temporal data (40%, 44/111) were the most frequently investigated. Our findings revealed a discernible pattern in the choice of model backbones and data types, for example, the dominance of autoencoder and recurrent neural networks for tabular temporal data. The discrepancy in imputation strategy usage among data types was also observed. The "integrated" imputation strategy, which solves the imputation task simultaneously with downstream tasks, was most popular for tabular temporal data (52%, 23/44) and multi-modal data (56%, 5/9). Moreover, DL-based imputation methods yielded a higher level of imputation accuracy than non-DL methods in most studies.
CONCLUSION: The DL-based imputation models are a family of techniques, with diverse network structures. Their designation in healthcare is usually tailored to data types with different characteristics. Although DL-based imputation models may not be superior to conventional approaches across all datasets, it is highly possible for them to achieve satisfactory results for a particular data type or dataset. There are, however, still issues with regard to portability, interpretability, and fairness associated with current DL-based imputation models.
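The abstract compares the imputation accuracy of DL-based and non-DL methods but does not say how accuracy was measured. A minimal sketch of one common evaluation scheme follows, under stated assumptions: known cells are hidden on purpose, a non-DL baseline (column-mean imputation) fills them in, and the error is computed only over the hidden cells. The function names, the toy table, and the choice of RMSE are illustrative assumptions, not taken from the reviewed studies.

```python
import math

def column_mean_impute(rows):
    """Fill None entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 if a column has no observed value at all (assumption).
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def masked_rmse(truth, imputed, masked_cells):
    """Root-mean-square error over only the cells that were hidden before imputation."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in masked_cells]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical complete table; two cells are hidden to simulate missingness.
truth = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
masked_cells = [(0, 1), (3, 0)]
observed = [row[:] for row in truth]
for i, j in masked_cells:
    observed[i][j] = None

imputed = column_mean_impute(observed)
score = masked_rmse(truth, imputed, masked_cells)
```

Any other imputer, DL-based or not, could be scored on the same hidden cells, which is what makes a like-for-like comparison possible.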
Affiliation(s)
- Mingxuan Liu
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Siqi Li
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Han Yuan
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Marcus Eng Hock Ong
  - Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Department of Emergency Medicine, Singapore General Hospital, Singapore
- Yilin Ning
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Feng Xie
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore
- Seyed Ehsan Saffari
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore
- Yuqing Shang
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Victor Volovici
  - Department of Neurosurgery, Erasmus MC University Medical Center, Rotterdam, the Netherlands
- Bibhas Chakraborty
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Department of Statistics and Data Science, National University of Singapore, Singapore; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Nan Liu
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; SingHealth AI Office, Singapore Health Services, Singapore; Institute of Data Science, National University of Singapore, Singapore
2
Accelerating UN Sustainable Development Goals with AI-Driven Technologies: A Systematic Literature Review of Women's Healthcare. Healthcare (Basel) 2023; 11:401. [PMID: 36766976 PMCID: PMC9914215 DOI: 10.3390/healthcare11030401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 01/24/2023] [Accepted: 01/30/2023] [Indexed: 02/04/2023]
Abstract
In this paper, we critically examine whether the contributions of artificial intelligence (AI) in healthcare adequately represent the realm of women's healthcare. This is relevant for achieving and accelerating the Sustainable Development Goals (SDGs) on gender equality and health defined by the United Nations. Following a systematic literature review (SLR), we examine whether AI applications in health and biomedicine adequately represent women's health in the larger scheme of healthcare provision. Our findings are divided into clusters based on thematic markers for women's health. They are consistent with two hypotheses: that AI-driven technologies in women's health remain underrepresented, and that emphasis on their future deployment can increase efficiency in informed health choices and be particularly accessible to women in small or underrepresented communities. At the same time, these findings can assist and influence the shape of governmental policies, accessibility, and the regulatory environment in achieving the SDGs. On a larger scale, in the near future, we will extend the extant literature on applications of AI-driven technologies in health SDGs and set the agenda for future research.
3
Kline A, Wang H, Li Y, Dennis S, Hutch M, Xu Z, Wang F, Cheng F, Luo Y. Multimodal machine learning in precision health: A scoping review. NPJ Digit Med 2022; 5:171. [PMID: 36344814 PMCID: PMC9640667 DOI: 10.1038/s41746-022-00712-8] [Citation(s) in RCA: 83] [Impact Index Per Article: 41.5] [Received: 06/20/2022] [Accepted: 10/14/2022] [Indexed: 11/09/2022]
Abstract
Machine learning is frequently leveraged to tackle problems in the health sector, including clinical decision support. Its use has historically been focused on single-modal data. Attempts to improve prediction and mimic the multimodal nature of clinical expert decision-making have been made in biomedical machine learning by fusing disparate data. This review was conducted to summarize the current studies in this field and identify topics ripe for future research. We conducted this review in accordance with the PRISMA extension for Scoping Reviews to characterize multi-modal data fusion in health. Search strings were established and used in the databases PubMed, Google Scholar, and IEEE Xplore from 2011 to 2021. A final set of 128 articles was included in the analysis. The most common health areas utilizing multi-modal methods were neurology and oncology. Early fusion was the most common data merging strategy. Notably, there was an improvement in predictive performance when using data fusion. Lacking from the papers were clear clinical deployment strategies, FDA approval, and analysis of how using multimodal approaches from diverse sub-populations may reduce biases and healthcare disparities. These findings provide a summary of multimodal data fusion as applied to health diagnosis/prognosis problems. Few papers compared the outputs of a multimodal approach with a unimodal prediction. However, those that did achieved an average increase of 6.4% in predictive accuracy. Multi-modal machine learning, while more robust in its estimations than unimodal methods, has drawbacks in its scalability and the time-consuming nature of information concatenation.
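The abstract names early fusion as the most common merging strategy without describing it. As a rough illustration, early fusion is usually understood as joining the feature vectors of all modalities into one input vector before a single model sees them. The sketch below assumes that reading; the function name, the modality names, and the values are hypothetical and do not come from the reviewed papers.

```python
def early_fusion(modalities):
    """Concatenate per-modality feature vectors into one model input.

    `modalities` maps a modality name to its feature list. Names are
    visited in sorted order so the fused layout is stable across patients.
    """
    fused = []
    for name in sorted(modalities):
        fused.extend(modalities[name])
    return fused

# Hypothetical patient with two modalities (illustrative values only).
patient = {
    "labs": [1.5],
    "imaging": [0.2, 0.7],
}
fused_vector = early_fusion(patient)  # imaging features first, then labs
```

The concatenation step is simple, but the fused vector grows with every added modality, which is one way to read the abstract's remark about scalability and the cost of information concatenation.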
Affiliation(s)
- Adrienne Kline
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Hanyin Wang
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Yikuan Li
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Saya Dennis
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Meghan Hutch
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Zhenxing Xu
  - Department of Population Health Sciences, Cornell University, New York, 10065, NY, USA
- Fei Wang
  - Department of Population Health Sciences, Cornell University, New York, 10065, NY, USA
- Feixiong Cheng
  - Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, 44195, OH, USA
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
4
Huang Y, Zheng Z, Ma M, Xin X, Liu H, Fei X, Wei L, Chen H. Improving Performance of Outcome Prediction for In-patients with Acute Myocardial Infarction Based on Embedding Representation Learned from Electronic Medical Records: Development and Validation Study (Preprint). J Med Internet Res 2022; 24:e37486. [PMID: 35921141 PMCID: PMC9386580 DOI: 10.2196/37486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Revised: 06/02/2022] [Accepted: 07/18/2022] [Indexed: 11/18/2022]
Abstract
Background: The widespread secondary use of electronic medical records (EMRs) promotes health care quality improvement. Representation learning that can automatically extract hidden information from EMR data has gained increasing attention.
Objective: We aimed to propose a patient representation with more feature associations and task-specific feature importance to improve the outcome prediction performance for inpatients with acute myocardial infarction (AMI).
Methods: Medical concepts, including patients’ age, gender, disease diagnoses, laboratory tests, structured radiological features, procedures, and medications, were first embedded into real-valued vectors using the improved skip-gram algorithm, where concepts in the context windows were selected by feature association strengths measured by association rule confidence. Then, each patient was represented as the sum of the feature embeddings weighted by the task-specific feature importance, which was used to support model prediction from global and local perspectives. We finally applied the proposed patient representation to mortality risk prediction for 3010 and 1671 AMI inpatients from a public data set and a private data set, respectively, and compared it with several reference representation methods in terms of the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and F1-score.
Results: Compared with the reference methods, the proposed embedding-based representation showed consistently superior predictive performance on the 2 data sets, achieving mean AUROCs of 0.878 and 0.973, AUPRCs of 0.220 and 0.505, and F1-scores of 0.376 and 0.674 for the public and private data sets, respectively, while the greatest AUROCs, AUPRCs, and F1-scores among the reference methods were 0.847 and 0.939, 0.196 and 0.283, and 0.344 and 0.361 for the public and private data sets, respectively. Feature importance integrated in patient representation reflected features that were also critical in prediction tasks and clinical practice.
Conclusions: The introduction of feature associations and feature importance facilitated an effective patient representation and contributed to prediction performance improvement and model interpretation.
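The Methods describe each patient as the sum of feature embeddings weighted by task-specific feature importance. A minimal sketch of that aggregation step alone is given below. It assumes the embeddings and importance weights are already available as plain dictionaries; the concept names, vectors, and weights are hypothetical, and the skip-gram training and the importance estimation from the paper are not reproduced here.

```python
def patient_representation(concepts, embeddings, importance):
    """Importance-weighted sum of the embeddings of a patient's concepts.

    concepts   : list of concept names recorded for the patient
    embeddings : dict mapping concept name -> embedding vector (list of floats)
    importance : dict mapping concept name -> scalar weight
    Concepts without a weight contribute nothing (an assumption of this sketch).
    """
    dim = len(next(iter(embeddings.values())))
    rep = [0.0] * dim
    for concept in concepts:
        weight = importance.get(concept, 0.0)
        for k, value in enumerate(embeddings[concept]):
            rep[k] += weight * value
    return rep

# Hypothetical two-dimensional embeddings and weights, for illustration only.
embeddings = {"age_60_70": [1.0, 0.0], "troponin_high": [0.0, 2.0]}
importance = {"age_60_70": 0.5, "troponin_high": 2.0}
rep = patient_representation(["age_60_70", "troponin_high"], embeddings, importance)
```

Because every patient maps to a vector of the same length regardless of how many concepts are recorded, the result can be passed directly to a standard classifier.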
Affiliation(s)
- Yanqun Huang
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
- Zhimin Zheng
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
- Moxuan Ma
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
- Xin Xin
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
- Honglei Liu
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
- Xiaolu Fei
  - Information Center, Xuanwu Hospital, Capital Medical University, Beijing, China
- Lan Wei
  - Information Center, Xuanwu Hospital, Capital Medical University, Beijing, China
- Hui Chen
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
5
Razzaq M, Clément F, Yvinec R. An overview of deep learning applications in precocious puberty and thyroid dysfunction. Front Endocrinol (Lausanne) 2022; 13:959546. [PMID: 36339395 PMCID: PMC9632447 DOI: 10.3389/fendo.2022.959546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 09/16/2022] [Indexed: 11/24/2022]
Abstract
In the last decade, deep learning methods have garnered a great deal of attention in endocrinology research. In this article, we provide a summary of current deep learning applications in endocrine disorders caused by either precocious onset of adult hormones or abnormal amounts of hormone production. To make the article accessible to a broader audience, we start with a gentle introduction to deep learning and its most commonly used architectures, and then we focus on the research trends of deep learning applications in thyroid dysfunction classification and precocious puberty diagnosis. We highlight the strengths and weaknesses of various approaches and discuss potential solutions to different challenges. We also go through the practical considerations useful for choosing (and building) a deep learning model, as well as for understanding the reasoning behind the decisions made by these models. Finally, we give concluding remarks and future directions.
Affiliation(s)
- Misbah Razzaq
  - PRC, INRAE, CNRS, Université de Tours, Nouzilly, France
  - *Correspondence: Misbah Razzaq,
- Frédérique Clément
  - Université Paris-Saclay, Inria, Centre Inria de Saclay, Palaiseau, France
- Romain Yvinec
  - PRC, INRAE, CNRS, Université de Tours, Nouzilly, France
  - Université Paris-Saclay, Inria, Centre Inria de Saclay, Palaiseau, France