1
Kim Y, Choi W, Choi W, Ko G, Han S, Kim HC, Kim D, Lee DG, Shin DW, Lee Y. A machine learning approach using conditional normalizing flow to address extreme class imbalance problems in personal health records. BioData Min 2024; 17:14. [PMID: 38796471 PMCID: PMC11127363 DOI: 10.1186/s13040-024-00366-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 05/21/2024] [Indexed: 05/28/2024] Open
Abstract
BACKGROUND Supervised machine learning models have been widely used to predict diseases and gain insight into them by classifying patients based on personal health records. However, class imbalance is an obstacle that disrupts model training. In this study, we aimed to address class imbalance with a conditional normalizing flow model, a deep-learning-based semi-supervised model for anomaly detection. This is the first introduction of the normalizing flow algorithm for tabular biomedical data.
METHODS We collected personal health records from South Korean citizens (n = 706), comprising genetic data obtained from a direct-to-consumer service (microarray chip), medical health check-ups, and lifestyle log data. Based on the health check-up data, six chronic diseases were labeled (obesity, diabetes, hypertriglyceridemia, dyslipidemia, liver dysfunction, and hypertension). After preprocessing, supervised classification models and semi-supervised anomaly detection models, including conditional normalizing flow, were evaluated on the classification of diabetes, which had extreme target imbalance (about 2%), using AUROC and AUPRC. In addition, we evaluated their performance on the other chronic diseases under the assumption that too few patients had been collected, by undersampling disease-affected samples.
RESULTS Over fifty evaluations of diabetes classification, where the base rate was very low (0.02), LightGBM (the best-performing supervised classification model) achieved an AUPRC of 0.16 and an AUROC of 0.82, whereas conditional normalizing flow achieved an AUPRC of 0.34 and an AUROC of 0.83. Moreover, conditional normalizing flow outperformed the supervised model when few disease-affected samples were available for the other five chronic diseases (obesity, hypertriglyceridemia, dyslipidemia, liver dysfunction, and hypertension). For example, when predicting obesity with disease-affected samples undersampled (positive undersampling) to lower the base rate to 0.02, LightGBM achieved an AUPRC of 0.20 and an AUROC of 0.75, whereas conditional normalizing flow achieved an AUPRC of 0.30 and an AUROC of 0.74.
CONCLUSIONS Our research suggests the utility of conditional normalizing flow, particularly when the available cases are limited, for predicting chronic diseases using personal health records. This approach offers an effective way to deal with the sparse data and extreme class imbalance commonly encountered in the biomedical context.
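The evaluation setup described in this abstract (undersampling positives to a target base rate, then scoring with AUROC and AUPRC) can be sketched as follows. This is a minimal illustration on synthetic scores, not the authors' pipeline: the helper names `undersample_positives`, `auroc`, and `average_precision` are hypothetical, the data are random, and average precision is used as one common estimator of AUPRC.

```python
import random

def undersample_positives(labels, target_rate, seed=0):
    """Keep all negatives and just enough positives so that
    positives / total is approximately target_rate. Returns sorted indices."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Solve k / (k + n_neg) = target_rate for k, the number of positives kept.
    k = round(target_rate * len(neg) / (1.0 - target_rate))
    k = max(1, min(k, len(pos)))
    return sorted(neg + rng.sample(pos, k))

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of precision@k taken at the rank of each positive (ties keep input order)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Synthetic cohort (assumption): 700 samples, 30% positives, toy Gaussian scores.
rng = random.Random(42)
labels = [1] * 210 + [0] * 490
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

# Positive undersampling down to a base rate of 0.02, as in the abstract.
keep = undersample_positives(labels, target_rate=0.02)
y_sub = [labels[i] for i in keep]
s_sub = [scores[i] for i in keep]
base_rate = sum(y_sub) / len(y_sub)
print(f"base rate={base_rate:.3f}  AUROC={auroc(y_sub, s_sub):.2f}  "
      f"AUPRC~{average_precision(y_sub, s_sub):.2f}")
```

A random ranking has an expected AUPRC close to the base rate (0.02 here) while its AUROC stays near 0.5, which is one reason to report both metrics under extreme imbalance.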
Affiliation(s)
- Yeongmin Kim
  - School of Computing, KAIST, Daejeon, Republic of Korea
- Wongyung Choi
  - College of Veterinary Medicine and Research Institute for Veterinary Science, Seoul National University, Seoul, Republic of Korea
- Woojeong Choi
  - Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Grace Ko
  - Department of Computer Science, Georgetown University, Washington, D.C., USA
- Seonggyun Han
  - Department of Psychiatry & Huntsman Mental Health Institute, University of Utah School of Medicine, Salt Lake City, UT, USA
- Hwan-Cheol Kim
  - Department of Occupational and Environmental Medicine, College of Medicine, Inha University, Incheon, Republic of Korea
- Dokyoon Kim
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Dong-Gi Lee
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Dong Wook Shin
  - Department of Clinical Research Design and Evaluation & Department of Digital Health, Samsung Advanced Institute for Health Science and Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
  - Department of Family Medicine & Supportive Care Center, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Younghee Lee
  - College of Veterinary Medicine and Research Institute for Veterinary Science, Seoul National University, Seoul, Republic of Korea
2
Sohn B, Won SY. Quality assessment of stroke radiomics studies: Promoting clinical application. Eur J Radiol 2023; 161:110752. [PMID: 36878154 DOI: 10.1016/j.ejrad.2023.110752] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/07/2023] [Revised: 02/13/2023] [Accepted: 02/20/2023] [Indexed: 03/06/2023]
Abstract
PURPOSE To evaluate the quality of radiomics studies on stroke using the radiomics quality score (RQS), Minimum Information for Medical AI Reporting (MINIMAR), and Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD), in order to promote clinical application.
METHODS PubMed MEDLINE and Embase were searched to identify radiomics studies on stroke. Of 464 articles, 52 relevant original research articles were included. Neuroradiologists scored the RQS, MINIMAR, and TRIPOD to evaluate the quality of the studies.
RESULTS Only four studies (7.7%) performed external validation. The mean RQS was 3.2 of 36 (8.9%), and the basic adherence rate was 24.9%. The adherence rate was low for conducting a phantom study (1.9%), stating a comparison to a 'gold standard' (1.9%), offering potential clinical utility (13.5%), and performing cost-effectiveness analysis (1.9%). None of the studies performed a test-retest, stated a biologic correlation, was conducted prospectively, or made code and data publicly available, resulting in a low RQS. The total MINIMAR adherence rate was 47.4%. The overall adherence rate for TRIPOD was 54.6%, with low scores for reporting the title (2.0%), reporting key elements of the study setting (6.1%), and explaining the sample size (2.0%).
CONCLUSIONS The overall quality and reporting of published radiomics studies on stroke were suboptimal. More thorough validation and open data are needed to increase the clinical applicability of radiomics studies.
Affiliation(s)
- Beomseok Sohn
  - Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- So Yeon Won
  - Department of Radiology, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Korea
3
Zhang S, Gao L, Kang B, Yu X, Zhang R, Wang X. Radiomics assessment of carotid intraplaque hemorrhage: detecting the vulnerable patients. Insights Imaging 2022; 13:200. [PMID: 36538100 PMCID: PMC9768061 DOI: 10.1186/s13244-022-01324-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2022] [Accepted: 10/31/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Intraplaque hemorrhage (IPH), one of the key features of vulnerable plaques, has been shown to be associated with an increased risk of stroke. We aimed to develop and validate a CT-based radiomics nomogram incorporating clinical factors and a radiomics signature for the detection of IPH in carotid arteries.
METHODS This retrospective study analyzed patients with carotid plaques on CTA from January 2013 to January 2021 at two different institutions. Radiomics features were extracted from CTA images. Demographics and CT characteristics were evaluated to build a clinical factor model. A radiomics signature was constructed by the least absolute shrinkage and selection operator method. A radiomics nomogram combining the radiomics signature and independent clinical factors was constructed. The areas under the curve of the three models were calculated by receiver operating characteristic analysis.
RESULTS A total of 46 patients (mean age, 60.7 years ± 10.4 [standard deviation]; 36 men) with 106 carotid plaques were in the training set, and 18 patients (mean age, 61.4 years ± 10.1; 13 men) with 38 carotid plaques were in the external test set. Stenosis was the independent clinical factor. Eight features were used to build the radiomics signature. The area under the curve (AUC) of the radiomics nomogram was significantly higher than that of the clinical factor model in both the training (p = 0.032) and external test (p = 0.039) sets.
CONCLUSIONS A CT-based radiomics nomogram showed satisfactory performance in distinguishing carotid plaques with and without intraplaque hemorrhage.
Affiliation(s)
- Shuai Zhang
  - The School of Medicine, Shandong First Medical University, No. 6699, Qingdao Road, Huaiyin District, Jinan, China (GRID grid.410638.8; ISNI 0000 0000 8910 6733)
- Lin Gao
  - The School of Medicine, Shandong First Medical University, No. 6699, Qingdao Road, Huaiyin District, Jinan, China (GRID grid.410638.8; ISNI 0000 0000 8910 6733)
- Bing Kang
  - Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, No. 324 Jingwu Road, Jinan, 250021, China (GRID grid.460018.b; ISNI 0000 0004 1769 9639)
- Xinxin Yu
  - Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, No. 324 Jingwu Road, Jinan, 250021, China (GRID grid.460018.b; ISNI 0000 0004 1769 9639)
- Ran Zhang
  - Huiying Medical Technology Co. Ltd., 66 Xixiaokou Road, Haidian District, Beijing, China
- Ximing Wang
  - Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, No. 324 Jingwu Road, Jinan, 250021, China (GRID grid.460018.b; ISNI 0000 0004 1769 9639)
4
Economics of Artificial Intelligence in Healthcare: Diagnosis vs. Treatment. Healthcare (Basel) 2022; 10:healthcare10122493. [PMID: 36554017 PMCID: PMC9777836 DOI: 10.3390/healthcare10122493] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 10/25/2022] [Revised: 12/03/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022] Open
Abstract
Motivation: The price of medical treatment continues to rise due to (i) an increasing population; (ii) an aging population; (iii) disease prevalence; (iv) a rise in the frequency of patients who use health care services; and (v) rising prices.
Objective: Artificial Intelligence (AI) is already well known for its superiority in various healthcare applications, including the segmentation of lesions in images, speech recognition, smartphone personal assistants, navigation, ride-sharing apps, and many more. Our study is based on two hypotheses: (i) AI offers more economical solutions than conventional methods; (ii) AI treatment offers stronger economics than AI diagnosis. This study aims to evaluate AI technology in the context of healthcare costs, namely in the areas of diagnosis and treatment, and then compare it to traditional, non-AI-based approaches.
Methodology: PRISMA was used to select the best 200 studies on AI in healthcare with a primary focus on cost reduction, especially in diagnosis and treatment. We defined the diagnosis and treatment architectures, investigated their characteristics, and categorized the roles that AI plays in the diagnostic and therapeutic paradigms. We experimented with various combinations of assumptions by integrating AI and then comparing the result against conventional costs. Lastly, we discuss four forward-looking topics for AI systems: pruning, bias, explainability, and regulatory approval.
Conclusions: The model shows substantial cost savings from using AI tools in diagnosis and treatment. The economics of AI can be improved by incorporating pruning, reduction in AI bias, explainability, and regulatory approvals.
5
Ultrasonic Imaging of Cardiovascular Disease Based on Image Processor Analysis of Hard Plaque Characteristics. BIOMED RESEARCH INTERNATIONAL 2022; 2022:4304524. [PMID: 36277887 PMCID: PMC9584660 DOI: 10.1155/2022/4304524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Revised: 09/07/2022] [Accepted: 09/22/2022] [Indexed: 11/17/2022]
Abstract
Detecting and analyzing cardiovascular disease with ultrasonic imaging avoids the errors of manual clinical assessment and yields precise outcomes. It requires a combination of smart computing systems and intelligent image processors. The disease characteristics are analyzed based on the configuration and precise tuning of the processing device. In this article, a characteristic extraction technique (CET) using knowledge learning (KL) is introduced to improve analysis precision. The proposed method requires optimal selection of disease features and training on similar datasets to improve characteristic extraction. The disease attributes and accuracy are identified using the standard knowledge update. The image and data features are segmented using the variable processor configuration to limit false rates. False rates due to unidentifiable plaque characteristics result in weak knowledge updates. Therefore, segmentation and data extraction are performed together to prevent misleading features. The knowledge base is updated with the extracted and identified plaque characteristics for consecutive image analysis. The processor configurations are managed using the updated knowledge and characteristics to improve precision. The proposed method is evaluated in terms of precision, characteristic update, training rate, extraction ratio, and time factor.
6
Chen Z, Yang M, Wen Y, Jiang S, Liu W, Huang H. Prediction of atherosclerosis using machine learning based on operations research. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:4892-4910. [PMID: 35430846 DOI: 10.3934/mbe.2022229] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
BACKGROUND Atherosclerosis is one of the major causes of cardiovascular disease, including coronary heart disease, cerebral infarction, and peripheral vascular disease. Atherosclerosis has no obvious symptoms in its early stages, so the key to its treatment is early intervention on risk factors. Machine learning methods have been used to predict atherosclerosis, but strong causal relationships between features can lead to extremely high levels of information redundancy, which can reduce the effectiveness of prediction systems.
OBJECTIVE We aim to combine statistical analysis and machine learning methods to reduce information redundancy and further improve the accuracy of disease diagnosis.
METHODS We cleaned and collated data obtained from a retrospective study at the Affiliated Hospital of Nanjing University of Chinese Medicine. First, features with too many missing values were filtered out of the 34 features, leaving 25. Under the guidance of relevant experts, 49% of the samples were categorized as the atherosclerosis risk group and the remaining 51% as the control group without atherosclerosis risk. To demonstrate the effect of feature information redundancy on prediction, we compared the prediction results of a single indicator that has been medically proven to be highly correlated with atherosclerosis with the prediction results of multiple features. Next, statistical tests were used to retain the features that could distinguish samples with and without atherosclerosis risk, leaving 20 features. To reduce the information redundancy between features, and drawing inspiration from graph theory, machine learning combined with optimal correlation distances was then used to screen out 15 significant features, and the prediction models were evaluated on those 15 features. Finally, the information in the 5 screened-out non-significant features was exploited through ensemble learning to improve prediction performance for atherosclerosis.
RESULTS The area under the receiver operating characteristic (ROC) curve (AUC), which measures the predictive performance of the model, was 0.84035, and the Kolmogorov-Smirnov (KS) value was 0.646. After applying the feature selection model based on optimal correlation distance, the AUC was 0.88268 and the KS value was 0.688, both improved by about 0.04. Finally, after ensemble learning, the AUC of the model was further improved by 0.01369 to 0.89637.
CONCLUSIONS The optimal distance feature screening model proposed in this paper improves the performance of atherosclerosis prediction models in terms of both prediction accuracy and AUC. Code and models are available at https://github.com/Cesartwothousands/Prediction-of-Atherosclerosis.
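The two metrics reported in this abstract, AUC and the KS value, can both be computed from a list of labels and model scores. The sketch below is a minimal pure-Python illustration on made-up data, not the code from the authors' repository; the function names `auc` and `ks_statistic` are hypothetical, and KS is taken here in its usual classifier-evaluation sense, the maximum gap between the true-positive and false-positive rates over all thresholds.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(labels, scores):
    """Maximum |TPR - FPR| over all score thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every sample sharing this score before measuring the gap.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
        i = j
    return best

# Toy data (assumption): 4 at-risk samples (1) and 4 controls (0).
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
print(f"AUC={auc(labels, scores):.4f} KS={ks_statistic(labels, scores):.3f}")
# prints AUC=0.9375 KS=0.750
```

In this toy case 15 of the 16 positive-negative pairs are ranked correctly, giving an AUC of 0.9375, and the largest TPR-FPR gap (0.75) occurs at a threshold of 0.7 and again at 0.55.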
Affiliation(s)
- Zihan Chen
  - Changwang School of Honors, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Minhui Yang
  - School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Yuhang Wen
  - School of Teacher Education, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Songyan Jiang
  - School of Teacher Education, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Wenjun Liu
  - School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Hui Huang
  - Department of Ultrasound, Affiliated Hospital of Nanjing University of CM, Nanjing 210029, China
7
Wu X, Chen H, Li T, Wan J. Semi-supervised feature selection with minimal redundancy based on local adaptive. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02288-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022]
8
Zhao Y, Spence JD, Chiu B. Three-dimensional ultrasound assessment of effects of therapies on carotid atherosclerosis using vessel wall thickness maps. ULTRASOUND IN MEDICINE & BIOLOGY 2021; 47:2502-2513. [PMID: 34148714 DOI: 10.1016/j.ultrasmedbio.2021.04.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2020] [Revised: 03/13/2021] [Accepted: 04/14/2021] [Indexed: 06/12/2023]
Abstract
We present a new method for assessing the effects of therapies on atherosclerosis, by measuring the weighted average of carotid vessel-wall-plus-plaque thickness change ($\Delta\overline{\mathrm{VWT}}_{\mathrm{Weighted}}$) in 120 patients randomized to pomegranate juice/extract versus placebo. Three-dimensional ultrasound images were acquired at baseline and one year after. Three-dimensional VWT maps were reconstructed and then projected onto a carotid template to obtain two-dimensional VWT maps. Anatomic correspondence on the two-dimensional VWT maps was optimized to reduce misalignment for the same subject and across subjects. A weight was computed at each point on the two-dimensional VWT map to highlight anatomic locations likely to exhibit plaque progression/regression, resulting in $\Delta\overline{\mathrm{VWT}}_{\mathrm{Weighted}}$ for each subject. The weighted average of VWT-Change measured from the two-dimensional VWT maps with correspondence alignment ($\Delta\overline{\mathrm{VWT}}_{\mathrm{Weighted,MDL}}$) detected a significant difference between the pomegranate and placebo groups (P = 0.008). This method improves the cost-effectiveness of proof-of-concept studies involving new therapies for atherosclerosis.
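The abstract does not state the formula for the weighted average. As a sketch only, assuming a generic pointwise weighted mean over the $N$ points of a subject's two-dimensional VWT map, with $w_i$ the weight at point $i$ (the symbols $N$, $w_i$, and the baseline/follow-up superscripts are assumptions introduced here, not the authors' notation), the quantity would take the form:

```latex
\Delta\overline{\mathrm{VWT}}_{\mathrm{Weighted}}
  = \frac{\sum_{i=1}^{N} w_i \, \Delta\mathrm{VWT}_i}{\sum_{i=1}^{N} w_i},
\qquad
\Delta\mathrm{VWT}_i
  = \mathrm{VWT}_i^{\text{follow-up}} - \mathrm{VWT}_i^{\text{baseline}}
```

Under this form, points with larger weights (locations more likely to show progression or regression) contribute more to the per-subject summary, and uniform weights reduce it to the plain mean thickness change.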
Affiliation(s)
- Yuan Zhao
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong
- J David Spence
  - Stroke Prevention & Atherosclerosis Research Centre, Robarts Research Institute, London, Ontario, Canada
- Bernard Chiu
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong
9
Su L, Liu Y, Wang M, Li A. Semi-HIC: A novel semi-supervised deep learning method for histopathological image classification. Comput Biol Med 2021; 137:104788. [PMID: 34461503 DOI: 10.1016/j.compbiomed.2021.104788] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/29/2021] [Revised: 08/17/2021] [Accepted: 08/18/2021] [Indexed: 11/30/2022]
Abstract
Histopathological images provide a gold standard for cancer recognition and diagnosis. Existing approaches for histopathological image classification are supervised learning methods that demand a large amount of labeled data to perform well, and they face the challenge of limited data annotation because of its prohibitive time cost. To circumvent this shortage, a promising strategy is to design semi-supervised learning methods. Recently, a novel semi-supervised approach called Learning by Association (LA) was proposed, which achieves promising performance in natural image classification. However, applying it to histopathological image classification remains challenging because of the wide inter-class similarity and intra-class heterogeneity in histopathological images. To address these issues, we propose a novel semi-supervised deep learning method called Semi-HIC for histopathological image classification. In particular, we introduce a new semi-supervised loss function combining an association cycle consistency (ACC) loss and a maximal conditional association (MCA) loss, which can take advantage of a large number of unlabeled patches and address the problems of inter-class similarity and intra-class variation in histopathological images, thereby markedly improving classification performance. In addition, we employ an efficient network architecture with cascaded Inception blocks (CIBs) to learn rich and discriminative embeddings from patches. Experimental results on both the Bioimaging 2015 challenge dataset and the BACH dataset demonstrate that Semi-HIC compares favorably with existing deep learning methods for histopathological image classification and consistently outperforms the semi-supervised LA method.
Affiliation(s)
- Lei Su
  - School of Information Science and Technology, University of Science and Technology of China, 443 Huangshan Road, Hefei, 230027, China.
- Yu Liu
  - School of Information Science and Technology, University of Science and Technology of China, 443 Huangshan Road, Hefei, 230027, China.
- Minghui Wang
  - School of Information Science and Technology, University of Science and Technology of China, 443 Huangshan Road, Hefei, 230027, China; Research Centers for Biomedical Engineering, University of Science and Technology of China, 443 Huangshan Road, Hefei, 230027, China.
- Ao Li
  - School of Information Science and Technology, University of Science and Technology of China, 443 Huangshan Road, Hefei, 230027, China; Research Centers for Biomedical Engineering, University of Science and Technology of China, 443 Huangshan Road, Hefei, 230027, China.
10
Lin M, Wynne JF, Zhou B, Wang T, Lei Y, Curran WJ, Liu T, Yang X. Artificial intelligence in tumor subregion analysis based on medical imaging: A review. J Appl Clin Med Phys 2021; 22:10-26. [PMID: 34164913 PMCID: PMC8292694 DOI: 10.1002/acm2.13321] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/29/2020] [Revised: 04/17/2021] [Accepted: 05/22/2021] [Indexed: 12/20/2022] Open
Abstract
Medical imaging is widely used in the diagnosis and treatment of cancer, and artificial intelligence (AI) has achieved tremendous success in medical image analysis. This paper reviews AI-based tumor subregion analysis in medical imaging. We summarize the latest AI-based methods for tumor subregion analysis and their applications. Specifically, we categorize the AI-based methods by training strategy: supervised and unsupervised. A detailed review of each category is presented, highlighting important contributions and achievements. Specific challenges and potential applications of AI in tumor subregion analysis are discussed.
Affiliation(s)
- Mingquan Lin
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Jacob F. Wynne
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Boran Zhou
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Tonghe Wang
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Yang Lei
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Walter J. Curran
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Tian Liu
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Xiaofeng Yang
  - Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA