1
Zhou PY, Wong AKC. Explanation and prediction of clinical data with imbalanced class distribution based on pattern discovery and disentanglement. BMC Med Inform Decis Mak 2021; 21:16. [PMID: 33422088 PMCID: PMC7796578 DOI: 10.1186/s12911-020-01356-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/08/2020] [Accepted: 11/30/2020] [Indexed: 11/10/2022]
Abstract
Background: Statistical data analysis, especially advanced machine learning (ML) methods, has attracted considerable interest in clinical practice. We seek interpretability of diagnostic/prognostic results, which will bring confidence to doctors, patients and their relatives in therapeutics and clinical practice. When datasets are imbalanced across diagnostic categories, ordinary ML methods may produce results that are overwhelmed by the majority classes, diminishing prediction accuracy. Hence, methods are needed that produce explicit, transparent and interpretable results for decision-making without sacrificing accuracy, even for data with imbalanced groups.

Methods: To interpret clinical patterns and predict patient diagnoses with high accuracy, we develop a novel method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which is able to discover patterns (correlated traits/indicants) and use them to classify clinical data even if the class distribution is imbalanced. In the most general setting, a relational dataset is a large table in which each column represents an attribute (trait/indicant) and each row contains a set of attribute values (AVs) of an entity (patient). Compared to existing pattern discovery approaches, cPDD can discover a small, succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of patients, even when groups are small and rare.

Results: Experiments on synthetic and thoracic clinical datasets showed that cPDD can 1) discover a smaller set of succinct significant patterns than other existing pattern discovery methods; 2) allow users to interpret succinct sets of patterns coming from uncorrelated sources, even when the groups are rare/small; and 3) obtain better prediction performance than other interpretable classification approaches.

Conclusions: cPDD discovers fewer patterns with more comprehensive coverage, which improves the interpretability of the patterns discovered. Experimental results on synthetic data validated that cPDD discovers all patterns implanted in the data and displays them precisely and succinctly with statistical support for interpretation and prediction, a capability that traditional ML methods lack. The success of cPDD as a novel interpretable method for the imbalanced class problem shows its great potential for clinical data analysis in years to come.
Affiliation(s)
- Pei-Yuan Zhou
  - Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada
- Andrew K C Wong
  - Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada
2
Zhu X, Song B, Shi F, Chen Y, Hu R, Gan J, Zhang W, Li M, Wang L, Gao Y, Shan F, Shen D. Joint prediction and time estimation of COVID-19 developing severe symptoms using chest CT scan. Med Image Anal 2021; 67:101824. [PMID: 33091741 PMCID: PMC7547024 DOI: 10.1016/j.media.2020.101824] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 05/02/2020] [Revised: 08/23/2020] [Accepted: 09/25/2020] [Indexed: 02/08/2023]
Abstract
With the rapid worldwide spread of coronavirus disease (COVID-19), it is of great importance to diagnose COVID-19 early and to predict the time at which patients may convert to the severe stage, in order to design effective treatment plans and reduce clinicians' workloads. In this study, we propose a joint classification and regression method that determines whether a patient will later develop severe symptoms, formulated as a classification task, and, if so, predicts the conversion time, formulated as a regression task. To do this, the proposed method takes into account 1) a weight for each sample, to reduce the influence of outliers and explore the problem of imbalanced classification, and 2) a weight for each feature via a sparsity regularization term, to remove redundant features from the high-dimensional data and learn the information shared across the two tasks, i.e., the classification and the regression. To our knowledge, this study is the first work to jointly predict disease progression and conversion time, which could help clinicians deal with potentially severe cases in time or even save patients' lives. Experimental analysis was conducted on a real dataset from two hospitals with 408 chest computed tomography (CT) scans. Results show that our method achieves the best classification (e.g., an accuracy of 85.91%) and regression (e.g., a correlation coefficient of 0.462) performance among all comparison methods. Moreover, our proposed method yields an accuracy of 76.97% for predicting the severe cases, a correlation coefficient of 0.524, and a difference of 0.55 days for the conversion time.
Affiliation(s)
- Xiaofeng Zhu
  - Center for Future Media and School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Bin Song
  - Department of Radiology, Sichuan University West China Hospital, Chengdu 610041, China
- Feng Shi
  - Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai 200232, China
- Yanbo Chen
  - Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai 200232, China
- Rongyao Hu
  - Center for Future Media and School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China; School of Natural and Computational Sciences, Massey University Auckland, Auckland 0745, New Zealand
- Jiangzhang Gan
  - Center for Future Media and School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China; School of Natural and Computational Sciences, Massey University Auckland, Auckland 0745, New Zealand
- Wenhai Zhang
  - Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai 200232, China
- Man Li
  - Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai 200232, China
- Liye Wang
  - Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai 200232, China
- Yaozong Gao
  - Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai 200232, China
- Fei Shan
  - Department of Radiology, Shanghai Public Health Clinical Center, Fudan University, Shanghai 201508, China
- Dinggang Shen
  - Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai 200232, China; School of Biomedical Engineering, ShanghaiTech University, Shanghai 201210, China; Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea