1
Hosseini Chagahi M, Mohammadi Dashtaki S, Moshiri B, Jalil Piran MD. Cardiovascular disease detection using a novel stack-based ensemble classifier with aggregation layer, DOWA operator, and feature transformation. Comput Biol Med 2024; 173:108345. [PMID: 38564852 DOI: 10.1016/j.compbiomed.2024.108345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 03/14/2024] [Accepted: 03/17/2024] [Indexed: 04/04/2024]
Abstract
Due to their widespread prevalence and impact on quality of life, cardiovascular diseases (CVD) pose a considerable global health burden. Early detection and intervention can reduce the incidence, severity, and progression of CVD and prevent premature death. The application of machine learning (ML) techniques to early CVD detection is therefore a valuable approach. In this paper, a stack-based ensemble classifier with an aggregation layer and the dependent ordered weighted averaging (DOWA) operator is proposed for detecting cardiovascular diseases. We propose transforming features using the Johnson transformation technique and normalizing feature distributions. Three diverse first-level classifiers are selected based on their accuracy, and their predictions are combined using the aggregation layer and DOWA. A linear support vector machine (SVM) meta-classifier makes the final classification. Adding the aggregation layer to the stacking classifier improves classification accuracy significantly: accuracy increases by 5%, giving an overall accuracy of 94.05%. Moreover, the proposed system significantly increases the area under the receiver operating characteristic (ROC) curve compared to recent studies, reaching 97.14%. This further reinforces the classifier's reliability and effectiveness in distinguishing between positive and negative instances of cardiovascular disease. With improved accuracy and a high area under the curve (AUC), the proposed classifier exhibits robustness and superior performance in the detection of cardiovascular diseases.
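The aggregation step named in this abstract can be illustrated with a small sketch of a dependent OWA operator. It assumes the commonly cited similarity-based formulation, in which each prediction is weighted by how close it lies to the mean of all predictions; the abstract does not state which variant the authors used, so the function name `dowa` and the sample probabilities are hypothetical.

```python
from typing import Sequence


def dowa(values: Sequence[float]) -> float:
    """Aggregate values with a dependent OWA operator (illustrative sketch).

    Assumed formulation: weight w_j is proportional to
    s_j = 1 - |a_j - mean| / sum_i |a_i - mean|,
    so values far from the mean contribute less.
    """
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    deviations = [abs(v - mean) for v in values]
    total_dev = sum(deviations)
    if total_dev == 0:  # all values identical: any weighting returns the mean
        return mean
    similarities = [1 - d / total_dev for d in deviations]
    total_sim = sum(similarities)  # equals len(values) - 1
    weights = [s / total_sim for s in similarities]
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical class-1 probabilities from three first-level classifiers.
base_predictions = [0.9, 0.8, 0.1]
aggregated = dowa(base_predictions)
```

In a stacking pipeline, a value such as `aggregated` could be passed to the meta-classifier alongside or instead of the raw first-level outputs; how the paper wires this in is not described in the abstract.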
Affiliation(s)
- Mehdi Hosseini Chagahi
  - School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran.
- Saeed Mohammadi Dashtaki
  - School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran.
- Behzad Moshiri
  - School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran; Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada.
- M D Jalil Piran
  - Department of Computer Science and Engineering, Sejong University, Seoul 05006, South Korea.
2
Shoaib M, Junaid A, Husnain G, Qadir M, Ghadi YY, Askar SS, Abouhawwash M. Advanced detection of coronary artery disease via deep learning analysis of plasma cytokine data. Front Cardiovasc Med 2024; 11:1365481. [PMID: 38525188 PMCID: PMC10957635 DOI: 10.3389/fcvm.2024.1365481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 02/19/2024] [Indexed: 03/26/2024] Open
Abstract
The 2017 World Health Organization Fact Sheet highlights that coronary artery disease (CAD) is the leading cause of death globally, responsible for approximately 30% of all deaths. In this context, machine learning (ML) technology is crucial in identifying coronary artery disease, thereby saving lives. ML algorithms can potentially analyze complex patterns and correlations within medical data, enabling early detection and accurate diagnosis of CAD. By leveraging ML technology, healthcare professionals can make informed decisions and implement timely interventions, ultimately leading to improved outcomes and potentially reducing the mortality rate associated with coronary artery disease. Machine learning algorithms enable non-invasive, quick, accurate, and economical diagnoses. As a result, they can be employed to supplement existing approaches or as a forerunner to them. This study shows how to use a CNN classifier and an LSTM-based RNN classifier in deep learning to attain targeted "risk" CAD categorization utilizing an evolving set of 450 cytokine biomarkers that could be used as suggestive solid predictive variables for treatment. The two classifiers used are based on these "45" different cytokine prediction characteristics. The best Area Under the Receiver Operating Characteristic curve (AUROC) score achieved is 0.98 at a 95% confidence interval (CI); the RNN-LSTM classifier using the "450" cytokine biomarkers had an AUROC score of 0.99 at a 95% confidence interval, and the CNN model containing cytokines received the second-best AUROC score (0.92). The RNN-LSTM classifier considerably beats the CNN classifier regarding AUROC scores, as evidenced by a p-value smaller than 7.48 obtained via an independent t-test.
As large-scale initiatives to achieve early, rapid, reliable, inexpensive, and accessible individual identification of CAD risk gain traction, robust machine learning algorithms can now augment older methods such as angiography. Incorporating 65 new sensitive cytokine biomarkers can increase early detection even more. Investigating the novel involvement of cytokines in CAD could lead to better risk detection, disease mechanism discovery, and new therapy options.
Affiliation(s)
- Muhammad Shoaib
  - Department of Computer Science, CECOS University of IT and Emerging Sciences, Peshawar, Pakistan
- Ahmad Junaid
  - Department of Computer Science, CECOS University of IT and Emerging Sciences, Peshawar, Pakistan
- Ghassan Husnain
  - Department of Computer Science, CECOS University of IT and Emerging Sciences, Peshawar, Pakistan
- Mansoor Qadir
  - Department of Computer Science, CECOS University of IT and Emerging Sciences, Peshawar, Pakistan
- S. S. Askar
  - Department of Statistics and Operations Research, College of Science, King Saud University, Riyadh, Saudi Arabia
- Mohamed Abouhawwash
  - Department of Computational Mathematics, Science and Engineering (CMSE), College of Engineering, Michigan State University, East Lansing, MI, United States
  - Department of Mathematics, Faculty of Science, Mansoura University, Mansoura, Egypt
3
V JP, S AAV, P GK, N K K. A novel attention-based cross-modal transfer learning framework for predicting cardiovascular disease. Comput Biol Med 2024; 170:107977. [PMID: 38217974 DOI: 10.1016/j.compbiomed.2024.107977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/19/2023] [Accepted: 01/08/2024] [Indexed: 01/15/2024]
Abstract
Cardiovascular disease (CVD) remains a leading cause of death globally, presenting significant challenges in early detection and treatment. The complexity of CVD arises from its multifaceted nature, influenced by a combination of genetic, environmental, and lifestyle factors. Traditional diagnostic approaches often struggle to effectively integrate and interpret the heterogeneous data associated with CVD. Addressing this challenge, we introduce a novel Attention-Based Cross-Modal (ABCM) transfer learning framework. This framework innovatively merges diverse data types, including clinical records, medical imagery, and genetic information, through an attention-driven mechanism. This mechanism adeptly identifies and focuses on the most pertinent attributes from each data source, thereby enhancing the model's ability to discern intricate interrelationships among various data types. Our extensive testing and validation demonstrate that the ABCM framework significantly surpasses traditional single-source models and other advanced multi-source methods in predicting CVD. Specifically, our approach achieves an accuracy of 93.5%, precision of 92.0%, recall of 94.5%, and an impressive area under the curve (AUC) of 97.2%. These results not only underscore the superior predictive capability of our model but also highlight its potential in offering more accurate and early detection of CVD. The integration of cross-modal data through attention-based mechanisms provides a deeper understanding of the disease, paving the way for more informed clinical decision-making and personalized patient care.
Affiliation(s)
- Jothi Prakash V
  - Karpagam College of Engineering, Myleripalayam Village, Coimbatore, 641032, Tamil Nadu, India.
- Arul Antran Vijay S
  - Karpagam College of Engineering, Myleripalayam Village, Coimbatore, 641032, Tamil Nadu, India.
- Ganesh Kumar P
  - College of Engineering, Guindy, Anna University, Chennai, 600025, Tamil Nadu, India.
- Karthikeyan N K
  - Coimbatore Institute of Technology, Peelamedu, Coimbatore, 641014, Tamil Nadu, India.
4
Li J, Guo S, Ma R, He J, Zhang X, Rui D, Ding Y, Li Y, Jian L, Cheng J, Guo H. Comparison of the effects of imputation methods for missing data in predictive modelling of cohort study datasets. BMC Med Res Methodol 2024; 24:41. [PMID: 38365610 PMCID: PMC10870437 DOI: 10.1186/s12874-024-02173-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 02/05/2024] [Indexed: 02/18/2024] Open
Abstract
BACKGROUND Missing data is frequently an inevitable issue in cohort studies and it can adversely affect the study's findings. We assess the effectiveness of eight frequently utilized statistical and machine learning (ML) imputation methods for dealing with missing data in predictive modelling of cohort study datasets. This evaluation is based on real data and predictive models for cardiovascular disease (CVD) risk. METHODS The data is from a real-world cohort study in Xinjiang, China. It includes personal information, physical examination data, questionnaires, and laboratory biochemical results from 10,164 subjects with a total of 37 variables. Simple imputation (Simple), regression imputation (Regression), expectation-maximization (EM), multiple imputation (MICE), K nearest neighbor classification (KNN), clustering imputation (Cluster), random forest (RF), and decision tree (Cart) were the chosen imputation methods. Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) are utilised to assess the performance of different methods for missing data imputation at a missing rate of 20%. The datasets processed with different missing data imputation methods were employed to construct a CVD risk prediction model utilizing the support vector machine (SVM). The predictive performance was then compared using the area under the curve (AUC). RESULTS The most effective imputation results were attained by KNN (MAE: 0.2032, RMSE: 0.7438, AUC: 0.730, CI: 0.719-0.741) and RF (MAE: 0.3944, RMSE: 1.4866, AUC: 0.777, CI: 0.769-0.785). The subsequent best performances were achieved by EM, Cart, and MICE, while Simple, Regression, and Cluster attained the worst performances. The CVD risk prediction model constructed using the complete data (AUC: 0.804, CI: 0.796-0.812) was compared with all other models, with p<0.05. CONCLUSION KNN and RF exhibit superior performance and are more adept at imputing missing data in predictive modelling of cohort study datasets.
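The evaluation idea in this abstract (hide a known fraction of values, impute them, then score the imputed cells with MAE and RMSE) can be sketched as follows. This is a minimal illustration that assumes simple mean imputation on a single numeric column; the function names and the 20% default are illustrative choices, and the study's own pipeline is not described in enough detail here to reproduce.

```python
import math
import random
from typing import List, Optional, Tuple


def mean_impute(column: List[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the observed entries."""
    observed = [x for x in column if x is not None]
    fill = sum(observed) / len(observed)
    return [fill if x is None else x for x in column]


def mae(truth: List[float], imputed: List[float]) -> float:
    """Mean absolute error between true and imputed values."""
    return sum(abs(t - p) for t, p in zip(truth, imputed)) / len(truth)


def rmse(truth: List[float], imputed: List[float]) -> float:
    """Root mean square error between true and imputed values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, imputed)) / len(truth))


def evaluate_imputation(
    column: List[float], missing_rate: float = 0.2, seed: int = 0
) -> Tuple[float, float]:
    """Hide a fraction of known values, impute, and score the hidden cells only."""
    rng = random.Random(seed)
    n_hidden = max(1, round(missing_rate * len(column)))
    hidden = sorted(rng.sample(range(len(column)), n_hidden))
    masked = [None if i in hidden else x for i, x in enumerate(column)]
    filled = mean_impute(masked)
    truth = [column[i] for i in hidden]
    guess = [filled[i] for i in hidden]
    return mae(truth, guess), rmse(truth, guess)
```

Swapping `mean_impute` for another imputer while keeping the same masked cells would allow a like-for-like comparison of methods on the same hidden values.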
Affiliation(s)
- JiaHang Li
  - Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
  - Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
- ShuXia Guo
  - Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
  - Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
- RuLin Ma
  - Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
  - Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
- Jia He
  - Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
  - Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
- XiangHui Zhang
  - Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
  - Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
- DongSheng Rui
  - Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
  - Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
- YuSong Ding
  - Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
  - Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
- Yu Li
  - Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
  - Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
- LeYao Jian
  - Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
- Jing Cheng
  - Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
- Heng Guo
  - Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China.
  - Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China.
5
Cai Y, Cai YQ, Tang LY, Wang YH, Gong M, Jing TC, Li HJ, Li-Ling J, Hu W, Yin Z, Gong DX, Zhang GW. Artificial intelligence in the risk prediction models of cardiovascular disease and development of an independent validation screening tool: a systematic review. BMC Med 2024; 22:56. [PMID: 38317226 PMCID: PMC10845808 DOI: 10.1186/s12916-024-03273-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Accepted: 01/23/2024] [Indexed: 02/07/2024] Open
Abstract
BACKGROUND A comprehensive overview of artificial intelligence (AI) for cardiovascular disease (CVD) prediction and a screening tool of AI models (AI-Ms) for independent external validation are lacking. This systematic review aims to identify, describe, and appraise AI-Ms of CVD prediction in the general and special populations and develop a new independent validation score (IVS) for AI-Ms replicability evaluation. METHODS PubMed, Web of Science, Embase, and IEEE library were searched up to July 2021. Data extraction and analysis were performed for the populations, distribution, predictors, algorithms, etc. The risk of bias was evaluated with the prediction risk of bias assessment tool (PROBAST). Subsequently, we designed IVS for model replicability evaluation with five steps in five items, including transparency of algorithms, performance of models, feasibility of reproduction, risk of reproduction, and clinical implication, respectively. The review is registered in PROSPERO (No. CRD42021271789). RESULTS In 20,887 screened references, 79 articles (82.5% in 2017-2021) were included, which contained 114 datasets (67 in Europe and North America, but 0 in Africa). We identified 486 AI-Ms, of which the majority were in development (n = 380), but none of them had undergone independent external validation. A total of 66 idiographic algorithms were found; however, 36.4% were used only once and only 39.4% over three times. A large number of different predictors (range 5-52,000, median 21) and large-span sample size (range 80-3,660,000, median 4466) were observed. All models were at high risk of bias according to PROBAST, primarily due to the incorrect use of statistical methods. IVS analysis confirmed only 10 models as "recommended"; however, 281 and 187 were "not recommended" and "warning," respectively. 
CONCLUSION AI has led the digital revolution in the field of CVD prediction, but is still in the early stage of development owing to defects in research design, reporting, and evaluation systems. The IVS we developed may contribute to independent external validation and the development of this field.
Affiliation(s)
- Yue Cai
  - China Medical University, Shenyang, 110122, China
- Yu-Qing Cai
  - China Medical University, Shenyang, 110122, China
- Li-Ying Tang
  - China Medical University, Shenyang, 110122, China
- Yi-Han Wang
  - China Medical University, Shenyang, 110122, China
- Mengchun Gong
  - Digital Health China Co. Ltd, Beijing, 100089, China
- Tian-Ci Jing
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
- Hui-Jun Li
  - Shenyang Medical & Film Science and Technology Co. Ltd., Shenyang, 110001, China
  - Enduring Medicine Smart Innovation Research Institute, Shenyang, 110001, China
- Jesse Li-Ling
  - Institute of Genetic Medicine, School of Life Science, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, 610065, China
- Wei Hu
  - Bayi Orthopedic Hospital, Chengdu, 610017, China
- Zhihua Yin
  - Department of Epidemiology, School of Public Health, China Medical University, Shenyang, 110122, China.
- Da-Xin Gong
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China.
  - The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China.
- Guang-Wei Zhang
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China.
  - The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China.
6
Akan G, Nyawawa E, Nyangasa B, Turkcan MK, Mbugi E, Janabi M, Atalar F. Severity of coronary artery disease is associated with diminished circANRIL expression: A possible blood based transcriptional biomarker in East Africa. J Cell Mol Med 2024; 28:e18093. [PMID: 38149798 PMCID: PMC10844708 DOI: 10.1111/jcmm.18093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 12/09/2023] [Accepted: 12/15/2023] [Indexed: 12/28/2023] Open
Abstract
Antisense Noncoding RNA in the INK4 Locus (ANRIL) is the prime candidate gene at Chr9p21, the well-defined genetic risk locus associated with coronary artery disease (CAD). ANRIL and its transcript variants were investigated for the susceptibility to CAD in adipose tissues (AT) and peripheral blood mononuclear cells (PBMCs) of the study group and the impact of 9p21.3 locus mutations was further analysed. Expressions of ANRIL, circANRIL (hsa_circ_0008574), NR003529, EU741058 and DQ485454 were detected in epicardial AT (EAT), mediastinal AT (MAT), subcutaneous AT (SAT) and PBMCs of CAD patients undergoing coronary artery bypass grafting and non-CAD patients undergoing heart valve surgery. ANRIL expression was significantly upregulated, while the expression of circANRIL was significantly downregulated in CAD patients. Decreased circANRIL levels were significantly associated with the severity of CAD and correlated with aggressive clinical characteristics. rs10757278 and rs10811656 were significantly associated with ANRIL and circANRIL expressions in AT and PBMCs. The ROC-curve analysis suggested that circANRIL has high diagnostic accuracy (AUC: 0.9808, cut-off: 0.33, sensitivity: 1.0, specificity: 0.88). We report the first data demonstrating the expression of ANRIL and its transcript variants in the AT and PBMCs of CAD patients. circANRIL, having a synergetic effect with ANRIL, plays a protective role in CAD pathogenesis. Therefore, altered circANRIL expression may become a potential diagnostic transcriptional biomarker for early CAD diagnosis.
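How a single cut-off on a biomarker yields a sensitivity and specificity pair can be shown with a short sketch. The cut-off of 0.33 is taken from the abstract; the expression values and labels below are invented toy data, and treating values at or below the cut-off as positive is an assumption based on circANRIL being downregulated in CAD.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float],
    labels: Sequence[int],
    cutoff: float,
    low_is_positive: bool = True,
) -> Tuple[float, float]:
    """Classify by a single cut-off and return (sensitivity, specificity).

    labels: 1 = disease, 0 = no disease. With low_is_positive=True,
    a score at or below the cut-off counts as a positive call.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        positive_call = (score <= cutoff) if low_is_positive else (score >= cutoff)
        if label == 1:
            tp += positive_call
            fn += not positive_call
        else:
            fp += positive_call
            tn += not positive_call
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Toy relative-expression values (hypothetical, not study data).
toy_scores = [0.10, 0.25, 0.30, 0.20, 0.50, 0.90, 0.31, 1.20]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff=0.33)
```

Sweeping `cutoff` over all observed scores and plotting sensitivity against 1 minus specificity would trace out the ROC curve from which an AUC is computed.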
Affiliation(s)
- Gokce Akan
  - Biochemistry Department, MUHAS Genetics Laboratory, School of Medicine, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania
  - Near East University, DESAM Research Institute, Mersin, North Cyprus, Turkey
- Erasto Mbugi
  - Biochemistry Department, MUHAS Genetics Laboratory, School of Medicine, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania
- Fatmahan Atalar
  - Biochemistry Department, MUHAS Genetics Laboratory, School of Medicine, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania
  - Department of Rare Diseases, Istanbul University, Child Health Institute, Istanbul, Turkey
7
Hsiao YC, Kuo CY, Lin FJ, Wu YW, Lin TH, Yeh HI, Chen JW, Wu CC. Machine Learning Models for ASCVD Risk Prediction in an Asian Population - How to Validate the Model is Important. Acta Cardiologica Sinica 2023; 39:901-912. [PMID: 38022427 PMCID: PMC10646597 DOI: 10.6515/acs.202311_39(6).20230528a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 05/28/2023] [Indexed: 12/01/2023]
Abstract
Introduction Atherosclerotic cardiovascular disease (ASCVD) is prevalent worldwide, including in Taiwan; however, widely accepted tools to assess the risk of ASCVD are lacking in Taiwan. Machine learning models are potentially useful for risk evaluation. In this study we used two cohorts to test the feasibility of machine learning with transfer learning for developing an ASCVD risk prediction model in Taiwan. Methods Two multi-center observational registry cohorts, T-SPARCLE and T-PPARCLE, were used in this study. The variables selected were based on European, U.S. and Asian guidelines. Both registries recorded the ASCVD outcomes of the patients. Ten-fold validation and temporal validation methods were used to evaluate the performance of the binary classification analysis [prediction of major adverse cardiovascular (CV) events in one year]. Time-to-event analyses were also performed. Results In the binary classification analysis, eXtreme Gradient Boosting (XGBoost) and random forest had the best performance, with areas under the receiver operating characteristic curve (AUC-ROC) of 0.72 (0.68-0.76) and 0.73 (0.69-0.77), respectively, although they were not significantly better than other models. Temporal validation was also performed, and the data showed significant differences in the distribution of various features and event rate. The AUC-ROC of XGBoost dropped to 0.66 (0.59-0.73), while that of random forest dropped to 0.69 (0.62-0.76) in the temporal validation method, and the performance also became numerically worse than that of the logistic regression model. In the time-to-event analysis, most models had a concordance index of around 0.70. Conclusions Machine learning models with appropriate transfer learning may be a useful tool for the development of CV risk prediction models and may help improve patient care in the future.
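The contrast between the two validation schemes in this abstract can be sketched with plain index splitting. This is an illustration only: the function names, the use of an enrolment year as the time variable, and the cut-off are hypothetical, and real k-fold splits would normally shuffle (and often stratify) the records first.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; every sample is tested exactly once."""
    splits = []
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        splits.append((train, test))
    return splits


def temporal_split(
    enrolment_years: Sequence[int], cutoff_year: int
) -> Tuple[List[int], List[int]]:
    """Train on records before the cut-off year, test on records from it onward."""
    train = [i for i, year in enumerate(enrolment_years) if year < cutoff_year]
    test = [i for i, year in enumerate(enrolment_years) if year >= cutoff_year]
    return train, test
```

Because a temporal split tests on patients enrolled later than any training patient, it exposes shifts in feature distributions and event rates that random folds tend to average away, which is consistent with the drop in AUC-ROC reported above.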
Affiliation(s)
- Yu-Chung Hsiao
  - Department of Internal Medicine, National Taiwan University Hospital
- Chen-Yuan Kuo
  - Center for Healthy Longevity and Aging Sciences, National Yang Ming Chiao Tung University
- Fang-Ju Lin
  - Graduate Institute of Clinical Pharmacy & School of Pharmacy, College of Medicine, National Taiwan University
  - Department of Pharmacy, National Taiwan University Hospital, Taipei
- Yen-Wen Wu
  - Division of Cardiology, Cardiovascular Medical Center, Far Eastern Memorial Hospital, New Taipei City
  - School of Medicine, National Yang Ming Chiao Tung University, Taipei
  - Graduate Institute of Medicine, Yuan Ze University, Taoyuan
- Tsung-Hsien Lin
  - Division of Cardiology, Department of Internal Medicine, Kaohsiung Medical University Hospital
  - Faculty of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung
- Hung-I Yeh
  - MacKay Memorial Hospital, MacKay Medical College
- Jaw-Wen Chen
  - Department of Medical Research and Education, Taipei Veterans General Hospital
- Chau-Chung Wu
  - Department of Internal Medicine, National Taiwan University Hospital
  - Graduate Institute of Medical Education & Bioethics, College of Medicine, National Taiwan University, Taipei, Taiwan
8
Sun TH, Wang CC, Wu YL, Hsu KC, Lee TH. Machine learning approaches for biomarker discovery to predict large-artery atherosclerosis. Sci Rep 2023; 13:15139. [PMID: 37704672 PMCID: PMC10499778 DOI: 10.1038/s41598-023-42338-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/16/2023] [Accepted: 09/08/2023] [Indexed: 09/15/2023] Open
Abstract
Large-artery atherosclerosis (LAA) is a leading cause of cerebrovascular disease. However, LAA diagnosis is costly and needs professional identification. Many metabolites have been identified as biomarkers of specific traits. However, there are inconsistent findings regarding suitable biomarkers for the prediction of LAA. In this study, we propose a new method that integrates multiple machine learning algorithms and a feature selection method to handle multidimensional data. Among the six machine learning models, the logistic regression (LR) model exhibited the best prediction performance. The value of area under the receiver operating characteristic curve (AUC) was 0.92 when 62 features were incorporated in the external validation set for the LR model. In this model, LAA could be well predicted by clinical risk factors including body mass index, smoking, and medications for controlling diabetes, hypertension, and hyperlipidemia as well as metabolites involved in aminoacyl-tRNA biosynthesis and lipid metabolism. In addition, we found that 27 features were present among the five adopted models that could provide good results. If these 27 features were used in the LR model, an AUC value of 0.93 could be achieved. Our study has demonstrated the effectiveness of combining machine learning algorithms with recursive feature elimination and cross-validation methods for biomarker identification. Moreover, we have shown that using shared features can yield more reliable correlations than either model, which can be valuable for future identification of LAA.
Affiliation(s)
- Ting-Hsuan Sun
  - Artificial Intelligence Center, China Medical University Hospital, Taichung, Taiwan
- Chia-Chun Wang
  - Artificial Intelligence Center, China Medical University Hospital, Taichung, Taiwan
- Ya-Lun Wu
  - Artificial Intelligence Center, China Medical University Hospital, Taichung, Taiwan
- Kai-Cheng Hsu
  - Artificial Intelligence Center, China Medical University Hospital, Taichung, Taiwan.
  - Department of Neurology, China Medical University Hospital, Taichung, Taiwan.
  - Department of Medicine, China Medical University, Taichung, Taiwan.
- Tsong-Hai Lee
  - Stroke Center and Department of Neurology, Linkou Chang Gung Memorial Hospital, and College of Medicine, Chang Gung University, Taoyuan, Taiwan.
9
Wang DC, Xu WD, Qin Z, Fu L, Lan YY, Liu XY, Huang AF. Systemic lupus erythematosus with high disease activity identification based on machine learning. Inflamm Res 2023; 72:1909-1918. [PMID: 37725103 DOI: 10.1007/s00011-023-01793-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 08/22/2023] [Accepted: 08/28/2023] [Indexed: 09/21/2023] Open
Abstract
OBJECTIVE Clinical evaluation of systemic lupus erythematosus (SLE) disease activity is limited and inconsistent, and high disease activity seriously affects SLE patients. This study aims to generate a machine learning model to identify SLE patients with high disease activity. METHOD A total of 1014 SLE patients with low disease activity and 453 SLE patients with high disease activity were included. A total of 94 clinical and laboratory variables and 17 meteorological indicators were collected. After data preprocessing, we used mutual information and MultiSURF to evaluate feature importance and select features. The selected features were used for machine learning modeling. Performance of the model was evaluated and verified by a series of binary classification indicators. RESULTS We screened out hematuria, proteinuria, pyuria, low complement, precipitation, sunlight and other features for model construction by integrated feature selection. After hyperparameter optimization, the LGB has the best performance (ROC: AUC = 0.930; PRC: AUC = 0.911, APS = 0.913; balanced accuracy: 0.856), and the worst is the naive Bayes (ROC: AUC = 0.849; PRC: AUC = 0.719, APS = 0.714; balanced accuracy: 0.705). Finally, the selection of features has good consistency in the composite feature importance bar plot. CONCLUSION We identify SLE patients with high disease activity by a simple machine learning pipeline, especially the LGB model based on the characteristics of proteinuria, hematuria, pyuria and other features screened out by collective feature selection.
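One of the two feature-scoring criteria named in this abstract, mutual information, can be sketched for discrete variables as follows. The function name and the toy binary vectors are hypothetical; the study's preprocessing and its MultiSURF step are not reproduced here.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (in nats) between two discrete variables."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (xi, yi), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[xi] / n) * (py[yi] / n)))
    return mi


# Toy example: one feature that mirrors the label, one that is unrelated to it.
label = [0, 0, 1, 1]
informative_feature = [0, 0, 1, 1]
uninformative_feature = [0, 1, 0, 1]
scores = {
    "informative": mutual_information(informative_feature, label),
    "uninformative": mutual_information(uninformative_feature, label),
}
```

Ranking features by such a score and keeping the top-ranked ones is one simple filter-style selection; continuous variables would first need binning or a different estimator.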
Affiliation(s)
- Da-Cheng Wang
  - Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China
- Wang-Dong Xu
  - Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China.
- Zhen Qin
  - Department of Rheumatology and Immunology, Affiliated Hospital of Southwest Medical University, 25 Taiping Road, Luzhou, 646000, Sichuan, China
- Lu Fu
  - Laboratory Animal Center, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China
- You-Yu Lan
  - Department of Rheumatology and Immunology, Affiliated Hospital of Southwest Medical University, 25 Taiping Road, Luzhou, 646000, Sichuan, China
- Xiao-Yan Liu
  - Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China
- An-Fang Huang
  - Department of Rheumatology and Immunology, Affiliated Hospital of Southwest Medical University, 25 Taiping Road, Luzhou, 646000, Sichuan, China.
10
Qian X, Keerman M, Zhang X, Guo H, He J, Maimaitijiang R, Wang X, Ma J, Li Y, Ma R, Guo S. Study on the prediction model of atherosclerotic cardiovascular disease in the rural Xinjiang population based on survival analysis. BMC Public Health 2023; 23:1041. [PMID: 37264356 PMCID: PMC10234013 DOI: 10.1186/s12889-023-15630-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/02/2022] [Accepted: 04/07/2023] [Indexed: 06/03/2023] Open
Abstract
PURPOSE With population aging and the increase in cardiovascular risk factors, the morbidity and mortality of atherosclerotic cardiovascular disease (ASCVD), represented by ischemic heart disease and stroke, continue to rise in China. For better prevention and intervention, relevant guidelines recommend using predictive models for early detection of ASCVD high-risk groups. Therefore, this study aims to establish a population ASCVD prediction model in rural areas of Xinjiang using survival analysis. METHODS Baseline cohort data were collected from September to December 2016, with follow-up until June 2022. A total of 7975 residents (4054 males and 3920 females) aged 30-74 years were included in the analysis. The data set was stratified by sex, and within each sex it was split into training and test sets at a 7:3 ratio. Cox regression, Lasso-Cox regression, and random survival forest (RSF) models were established in the training set. The model parameters were determined by cross-validation and parameter tuning and then verified in the training set. Traditional ASCVD prediction models (Framingham and China-PAR models) were constructed in the test set. The discrimination and calibration of the models were compared to find the optimal prediction model for each sex in this population and to further analyze the risk factors of ASCVD. RESULTS After 5.79 years of follow-up, 873 ASCVD events were found, a cumulative incidence of 10.19% (7.57% in men and 14.44% in women). In terms of discrimination and calibration, the RSF showed the best prediction performance in both males and females (male: area under the curve (AUC) 0.791 (95% CI 0.767, 0.813), C statistic 0.780 (95% CI 0.730, 0.829), Brier score (BS) 0.060; female: AUC 0.759 (95% CI 0.734, 0.783), C statistic 0.737 (95% CI 0.702, 0.771), BS 0.110). Age, systolic blood pressure (SBP), apolipoprotein B (APOB), visceral adiposity index (VAI), hip circumference (HC), and atherogenic index of plasma (AIP) are important predictors of ASCVD in the rural population of Xinjiang. CONCLUSION In the rural population of Xinjiang, the ASCVD prediction model based on the RSF algorithm performs better than those based on Cox regression, Lasso-Cox, and the traditional ASCVD prediction models.
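The Brier score (BS) quoted in the results measures calibration as the mean squared difference between predicted risk and observed outcome, so lower is better. The sketch below shows the plain binary form in Python under stated assumptions: the function name and the toy data are hypothetical, and a survival-analysis implementation would normally also handle censoring, which this sketch does not attempt.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy example (hypothetical data): four people, one event.
outcomes = [1, 0, 0, 0]
informed = [0.7, 0.1, 0.2, 0.1]          # risks that track the outcomes
uninformed = [0.25, 0.25, 0.25, 0.25]    # everyone gets the event rate

print(brier_score(outcomes, informed))
print(brier_score(outcomes, uninformed))
```

Predicting the overall event rate for everyone gives a score of p(1 - p), which is a useful reference point when reading a reported Brier score against the cohort's incidence.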
Affiliation(s)
- Xin Qian
- Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Mulatibieke Keerman
- Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Xianghui Zhang
- Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Heng Guo
- Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Jia He
- Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Remina Maimaitijiang
- Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Xinping Wang
- Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Jiaolong Ma
- Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Yu Li
- Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Rulin Ma
- Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Department of Public Health, The Key Laboratory of Preventive Medicine, Shihezi University School of Medicine, Suite 816, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Shuxia Guo
- Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Department of NHC Key Laboratory of Prevention and Treatment of Central Asia High Incidence Diseases, The First Affiliated Hospital of Shihezi University Medical College, Shihezi, Xinjiang, China
|
11
|
Li JX, Li L, Zhong X, Fan SJ, Cen T, Wang J, He C, Zhang Z, Luo YN, Liu XX, Hu LX, Zhang YD, Qiu HL, Dong GH, Zou XG, Yang BY. Machine learning identifies prominent factors associated with cardiovascular disease: findings from two million adults in the Kashgar Prospective Cohort Study (KPCS). Glob Health Res Policy 2022; 7:48. [PMID: 36474302 PMCID: PMC9724436 DOI: 10.1186/s41256-022-00282-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 11/18/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND Identifying factors associated with cardiovascular disease (CVD) is critical for its prevention, but this topic has scarcely been investigated in Kashgar prefecture, Xinjiang, northwestern China. We thus explored the CVD epidemiology and identified prominent factors associated with CVD in this region. METHODS A total of 1,887,710 adults at baseline (in 2017) of the Kashgar Prospective Cohort Study were included in the analysis. Sixteen candidate factors, including seven demographic factors, four lifestyle factors, and five clinical factors, were collected from a questionnaire and health examination records. CVD was defined according to International Classification of Diseases, 10th Revision (ICD-10) codes. We first used logistic regression models to investigate the association between each of the candidate factors and CVD. Then, we employed three machine learning methods (Random Forest, Random Ferns, and Extreme Gradient Boosting) to rank and identify prominent factors associated with CVD. Stratification analyses by sex, ethnicity, education level, economic status, and residential setting were also performed to test the consistency of the ranking. RESULTS The prevalence of CVD in Kashgar prefecture was 8.1%. All 16 candidate factors were significantly associated with CVD (odds ratios ranged from 1.03 to 2.99, all p values < 0.05) in logistic regression models. Further machine learning-based analysis suggested that age, occupation, hypertension, exercise frequency, and dietary pattern were the five most prominent factors associated with CVD. In stratification analyses, the ranking of relative importance generally followed the same pattern as in the overall sample. CONCLUSIONS CVD is a major public health concern in Kashgar prefecture. Age, occupation, hypertension, exercise frequency, and dietary pattern might be the prominent factors associated with CVD in this region. In the future, these factors should be given priority in preventing CVD.
Affiliation(s)
- Jia-Xin Li
- Guangdong Provincial Engineering Technology Research Center of Environmental Pollution and Health Risk Assessment, Department of Occupational and Environmental Health, School of Public Health, Sun Yat-Sen University, 74 Zhongshan 2nd Road, Yuexiu District, Guangzhou, 510080 China
- Li Li
- Department of Respiratory and Critical Care Medicine, The First People’s Hospital of Kashi (The Affiliated Kashi Hospital of Sun Yat-Sen University), No.66, Yingbin Avenue, Kashgar City, 844000 China
- Xuemei Zhong
- Department of Respiratory and Critical Care Medicine, The First People’s Hospital of Kashi (The Affiliated Kashi Hospital of Sun Yat-Sen University), No.66, Yingbin Avenue, Kashgar City, 844000 China
- Shu-Jun Fan
- Guangzhou Center for Disease Control and Prevention, Guangzhou, 510440 China
- Tao Cen
- Department of Research and Development, Nanfang Hospital, Southern Medical University, Guangzhou, 510515 China
- Jianquan Wang
- Department of Respiratory and Critical Care Medicine, The First People’s Hospital of Kashi (The Affiliated Kashi Hospital of Sun Yat-Sen University), No.66, Yingbin Avenue, Kashgar City, 844000 China
- Chuanjiang He
- Department of Respiratory and Critical Care Medicine, The First People’s Hospital of Kashi (The Affiliated Kashi Hospital of Sun Yat-Sen University), No.66, Yingbin Avenue, Kashgar City, 844000 China
- Zhoubin Zhang
- Guangzhou Center for Disease Control and Prevention, Guangzhou, 510440 China
- Ya-Na Luo
- Guangdong Provincial Engineering Technology Research Center of Environmental Pollution and Health Risk Assessment, Department of Occupational and Environmental Health, School of Public Health, Sun Yat-Sen University, 74 Zhongshan 2nd Road, Yuexiu District, Guangzhou, 510080 China
- Xiao-Xuan Liu
- Guangdong Provincial Engineering Technology Research Center of Environmental Pollution and Health Risk Assessment, Department of Occupational and Environmental Health, School of Public Health, Sun Yat-Sen University, 74 Zhongshan 2nd Road, Yuexiu District, Guangzhou, 510080 China
- Li-Xin Hu
- Guangdong Provincial Engineering Technology Research Center of Environmental Pollution and Health Risk Assessment, Department of Occupational and Environmental Health, School of Public Health, Sun Yat-Sen University, 74 Zhongshan 2nd Road, Yuexiu District, Guangzhou, 510080 China
- Yi-Dan Zhang
- Guangdong Provincial Engineering Technology Research Center of Environmental Pollution and Health Risk Assessment, Department of Occupational and Environmental Health, School of Public Health, Sun Yat-Sen University, 74 Zhongshan 2nd Road, Yuexiu District, Guangzhou, 510080 China
- Hui-Ling Qiu
- Guangdong Provincial Engineering Technology Research Center of Environmental Pollution and Health Risk Assessment, Department of Occupational and Environmental Health, School of Public Health, Sun Yat-Sen University, 74 Zhongshan 2nd Road, Yuexiu District, Guangzhou, 510080 China
- Guang-Hui Dong
- Guangdong Provincial Engineering Technology Research Center of Environmental Pollution and Health Risk Assessment, Department of Occupational and Environmental Health, School of Public Health, Sun Yat-Sen University, 74 Zhongshan 2nd Road, Yuexiu District, Guangzhou, 510080 China
- Xiao-Guang Zou
- Department of Respiratory and Critical Care Medicine, The First People’s Hospital of Kashi (The Affiliated Kashi Hospital of Sun Yat-Sen University), No.66, Yingbin Avenue, Kashgar City, 844000 China
- Bo-Yi Yang
- Guangdong Provincial Engineering Technology Research Center of Environmental Pollution and Health Risk Assessment, Department of Occupational and Environmental Health, School of Public Health, Sun Yat-Sen University, 74 Zhongshan 2nd Road, Yuexiu District, Guangzhou, 510080 China
|
12
|
Predicting the Physician’s Specialty Using a Medical Prescription Database. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:5871408. [PMID: 36158134 PMCID: PMC9507660 DOI: 10.1155/2022/5871408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 08/19/2022] [Accepted: 09/01/2022] [Indexed: 11/18/2022]
Abstract
Purpose The present study aims to predict the physician's specialty from the two medications most frequently prescribed together. The results of this study could be used to impute missing data in similar databases. Patients and Methods The research uses KAy-means for MIxed LArge datasets (KAMILA) clustering and a random forest (RF) model. The data were retrieved from outpatients' prescriptions in the second most populous province of Iran (Khorasan Razavi) from April 2015 to March 2017. Results The main findings show the importance of each medication combination in predicting the specialty. The final results showed that the amoxicillin-metronidazole combination has the highest importance in making an accurate prediction. The findings are provided in a user-friendly R-shiny web application, which can be applied to any medical prescription database. Conclusion A huge amount of data is now produced in the field of medical prescriptions, and the physician's specialty is missing from a significant portion of it. Imputing the missing variables can therefore yield valuable results for planning higher-quality medication, improving healthcare quality, and decreasing expenses.
|
13
|
Tran V, Saad T, Tesfaye M, Walelign S, Wordofa M, Abera D, Desta K, Tsegaye A, Ay A, Taye B. Helicobacter pylori (H. pylori) risk factor analysis and prevalence prediction: a machine learning-based approach. BMC Infect Dis 2022; 22:655. [PMID: 35902812 PMCID: PMC9330977 DOI: 10.1186/s12879-022-07625-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2021] [Accepted: 07/18/2022] [Indexed: 12/03/2022]
Abstract
Background Although previous epidemiological studies have examined the potential risk factors that increase the likelihood of acquiring Helicobacter pylori infections, most of these analyses have utilized conventional statistical models, including logistic regression, and have not benefited from advanced machine learning techniques. Objective We examined H. pylori infection risk factors among school children using machine learning algorithms to identify important risk factors as well as to determine whether machine learning can be used to predict H. pylori infection status. Methods We applied feature selection and classification algorithms to data from a school-based cross-sectional survey in Ethiopia. The data set included 954 school children with 27 sociodemographic and lifestyle variables. We conducted five runs of tenfold cross-validation on the data. We combined the results of these runs for each combination of feature selection (e.g., Information Gain) and classification (e.g., Support Vector Machines) algorithms. Results The XGBoost classifier had the highest accuracy in predicting H. pylori infection status, at 77%, a 13-percentage-point improvement over the baseline accuracy of guessing the most frequent class (64% of the samples were H. pylori negative). K-Nearest Neighbors showed the worst performance across all classifiers. A similar pattern was observed with the F1-score and area under the receiver operating characteristic curve (AUROC) evaluation metrics. Among all features, place of residence (with urban residence increasing risk) was the most common risk factor for H. pylori infection, regardless of the feature selection method. Additionally, our machine learning algorithms identified other important risk factors for H. pylori infection, such as electricity usage in the home, toilet type, and waste disposal location. Using a 75% cutoff for robustness, machine learning identified five of the eight significant features found by traditional multivariate logistic regression. However, when a lower robustness threshold was used, machine learning approaches identified more H. pylori risk factors than multivariate logistic regression and suggested risk factors not detected by logistic regression. Conclusion This study provides evidence that machine learning approaches are positioned to uncover H. pylori infection risk factors and predict H. pylori infection status. These approaches identify similar risk factors and predict infection with accuracy comparable to logistic regression; thus, they could be used as an alternative method. Supplementary Information The online version contains supplementary material available at 10.1186/s12879-022-07625-7.
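The evaluation protocol described in the Methods, five runs of tenfold cross-validation on 954 children, can be sketched as an index generator. This is an illustrative Python sketch using only the standard library, not the authors' pipeline; the function name, the seed, and the unstratified shuffling are assumptions.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every run
        for fold in range(n_splits):
            test_idx = indices[fold::n_splits]  # every n_splits-th shuffled index
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, test_idx


splits = list(repeated_kfold(954))
print(len(splits))                                     # 50 train/test splits
print(sorted({len(test) for _, _, _, test in splits}))  # fold sizes: [95, 96]
```

Each child appears in exactly one test fold per run, so averaging a metric over all 50 splits uses every sample five times as held-out data.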
Affiliation(s)
- Van Tran
- Department of Mathematics, Colgate University, 13 Oak Dr., Hamilton, NY, USA
- Tazmilur Saad
- Department of Mathematics, Colgate University, 13 Oak Dr., Hamilton, NY, USA
- Mehret Tesfaye
- College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Sosina Walelign
- College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Moges Wordofa
- College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Dessie Abera
- College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Kassu Desta
- College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Aster Tsegaye
- College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia
- Ahmet Ay
- Department of Mathematics, Colgate University, 13 Oak Dr., Hamilton, NY, USA
- Department of Biology, Colgate University, 13 Oak Dr., Hamilton, NY, USA
- Bineyam Taye
- Department of Biology, Colgate University, 13 Oak Dr., Hamilton, NY, USA
|
14
|
Qian X, Li Y, Zhang X, Guo H, He J, Wang X, Yan Y, Ma J, Ma R, Guo S. A Cardiovascular Disease Prediction Model Based on Routine Physical Examination Indicators Using Machine Learning Methods: A Cohort Study. Front Cardiovasc Med 2022; 9:854287. [PMID: 35783868 PMCID: PMC9247206 DOI: 10.3389/fcvm.2022.854287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2022] [Accepted: 05/23/2022] [Indexed: 11/24/2022]
Abstract
Background Cardiovascular diseases (CVD) are currently the leading cause of premature death worldwide. Model-based early detection of populations at high risk of CVD is the key to CVD prevention. Thus, this research aimed to use machine learning (ML) algorithms to establish a CVD prediction model based on routine physical examination indicators suitable for the Xinjiang rural population. Method The research cohort data were collected in two stages. The first stage involved a baseline survey from 2010 to 2012, with follow-up ending in December 2017. The second-stage baseline survey was conducted from September to December 2016, and follow-up ended in August 2021. A total of 12,692 participants (10,407 Uyghur and 2,285 Kazak) were included in the study. Predictor screening and the construction of variable subsets were based on least absolute shrinkage and selection operator (Lasso) regression, logistic regression forward partial likelihood estimation (FLR), random forest (RF) feature importance, and RF variable importance. Each selected variable subset was then used with L1 regularized logistic regression (L1-LR), RF, support vector machine (SVM), and AdaBoost algorithms to establish a CVD prediction model suitable for this population. The incidence of CVD in this population was then analyzed. Result After 4.94 years of follow-up, a total of 1,176 people were diagnosed with CVD (cumulative incidence: 9.27%). In the comparison of discrimination and calibration, the prediction performance of the variable subset selected by FLR was better than that of the other subsets. Combining the results of discrimination, calibration, and clinical validity, the prediction model based on L1-LR had the best prediction performance. Age, systolic blood pressure, the low-density lipoprotein cholesterol/high-density lipoprotein cholesterol ratio, triglyceride-glucose index, body mass index, and body adiposity index were all important predictors of the onset of CVD in the Xinjiang rural population. 
Conclusion In the Xinjiang rural population, the prediction model based on L1-LR had the best prediction performance.
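The cohort figures in this abstract are internally consistent, which can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated in the abstract; the variable names are illustrative.

```python
# Counts as stated in the abstract.
participants = {"Uyghur": 10_407, "Kazak": 2_285}
cvd_cases = 1_176

total = sum(participants.values())
cumulative_incidence = cvd_cases / total  # new cases / people at risk at baseline

print(f"{total} participants, cumulative incidence {cumulative_incidence:.2%}")
# prints "12692 participants, cumulative incidence 9.27%"
```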
Affiliation(s)
- Xin Qian
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Yu Li
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Xianghui Zhang
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Heng Guo
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Jia He
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Xinping Wang
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Yizhong Yan
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Jiaolong Ma
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Rulin Ma
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Shuxia Guo
- Department of Public Health, Shihezi University School of Medicine, Shihezi, China
- Department of NHC Key Laboratory of Prevention and Treatment of Central Asia High Incidence Diseases, The First Affiliated Hospital of Shihezi University Medical College, Shihezi, China
|
15
|
Zafar A, Attia Z, Tesfaye M, Walelign S, Wordofa M, Abera D, Desta K, Tsegaye A, Ay A, Taye B. Machine learning-based risk factor analysis and prevalence prediction of intestinal parasitic infections using epidemiological survey data. PLoS Negl Trop Dis 2022; 16:e0010517. [PMID: 35700192 PMCID: PMC9236253 DOI: 10.1371/journal.pntd.0010517] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/16/2021] [Revised: 06/27/2022] [Accepted: 05/18/2022] [Indexed: 11/21/2022]
Abstract
Background Previous epidemiological studies have examined the prevalence and risk factors for a variety of parasitic illnesses, including protozoan and soil-transmitted helminth (STH, e.g., hookworms and roundworms) infections. Despite advancements in machine learning for data analysis, the majority of these studies use traditional logistic regression to identify significant risk factors. Methods In this study, we used data from a survey of 54 risk factors for intestinal parasitosis in 954 Ethiopian school children. We investigated whether machine learning approaches can supplement traditional logistic regression in identifying intestinal parasite infection risk factors. We used feature selection methods such as InfoGain (IG), ReliefF (ReF), Joint Mutual Information (JMI), and Minimum Redundancy Maximum Relevance (MRMR). Additionally, we predicted children’s parasitic infection status using classifiers such as Logistic Regression (LR), Support Vector Machines (SVM), Random Forests (RF) and XGBoost (XGB), and compared their accuracy and area under the receiver operating characteristic curve (AUROC) scores. For optimal model training, we performed tenfold cross-validation and tuned the classifier hyperparameters. We balanced our dataset using the Synthetic Minority Oversampling Technique (SMOTE). Additionally, we used association rule learning to establish a link between risk factors and parasitic infections. Key findings Our study demonstrated that machine learning could be used in conjunction with logistic regression. Using machine learning, we developed models that accurately predicted four parasitic infections: any parasitic infection at 79.9% accuracy, helminth infection at 84.9%, any STH infection at 95.9%, and protozoan infection at 94.2%. The Random Forests (RF) and Support Vector Machines (SVM) classifiers achieved the highest accuracy when either the top 20 risk factors selected by Joint Mutual Information (JMI) or all features were used. 
The best predictors of infection were socioeconomic, demographic, and hematological characteristics. Conclusions We demonstrated that feature selection and association rule learning are useful strategies for detecting risk factors for parasite infection. Additionally, we showed that advanced classifiers might be utilized to predict children’s parasitic infection status. When combined with standard logistic regression models, machine learning techniques can identify novel risk factors and predict infection risk. In developing countries such as Ethiopia, intestinal parasites are a significant public health problem. These parasites are detrimental to the health of schoolchildren. Numerous risk factors for parasitic infections have been identified using uni- and multi-variate logistic regression. However, logistic regression has inherent limitations when applied to data sets with a large number of risk factors. We used machine learning techniques in conjunction with logistic regression models to identify relevant risk factors for parasitic infections in a dataset of 954 Ethiopian schoolchildren with 54 different risk factors for parasitic infections. Additionally, we developed predictive models of parasitic infection. Compared to logistic regression, we discovered that machine learning techniques identified novel risk factors and had higher predictive accuracy. Furthermore, we discovered that infection prediction could be aided by combining socioeconomic, health, and hematological characteristics. As a result, we concluded that advanced machine learning methods should be used in conjunction with logistic regression to study parasitic infections.
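The SMOTE balancing step named in the Methods creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a minimal standard-library Python illustration of that idea, not the authors' implementation and not a substitute for a maintained library; the function name, parameters, and toy points are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating toward nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest other minority samples by Euclidean distance
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority_points, n_new=6, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling of this kind should be applied to training folds only, so that held-out data stay untouched.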
Affiliation(s)
- Aziz Zafar
- Colgate University, Department of Mathematics, Hamilton, New York, United States of America
- Colgate University, Department of Biology, Hamilton, New York, United States of America
- Ziad Attia
- Colgate University, Department of Mathematics, Hamilton, New York, United States of America
- Colgate University, Department of Computer Science, Hamilton, New York, United States of America
- Mehret Tesfaye
- Addis Ababa University, College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa, Ethiopia
- Sosina Walelign
- Addis Ababa University, College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa, Ethiopia
- Moges Wordofa
- Addis Ababa University, College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa, Ethiopia
- Dessie Abera
- Addis Ababa University, College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa, Ethiopia
- Kassu Desta
- Addis Ababa University, College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa, Ethiopia
- Aster Tsegaye
- Addis Ababa University, College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa, Ethiopia
- Ahmet Ay
- Colgate University, Department of Mathematics, Hamilton, New York, United States of America
- Colgate University, Department of Biology, Hamilton, New York, United States of America
- * E-mail: (AA); (BT)
- Bineyam Taye
- Colgate University, Department of Biology, Hamilton, New York, United States of America
- * E-mail: (AA); (BT)
|
16
|
Smart Home Technology Solutions for Cardiovascular Diseases: A Systematic Review. APPLIED SYSTEM INNOVATION 2022. [DOI: 10.3390/asi5030051] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/01/2023]
Abstract
Cardiovascular diseases (CVD) are the leading cause of mortality globally. Despite improvement in therapies, people with CVD lack support for monitoring and managing their condition at home and out of hospital settings. Smart Home Technologies have potential to monitor health status and support people with CVD in their homes. We explored the Smart Home Technologies available for CVD monitoring and management in people with CVD and acceptance of the available technologies to end-users. We systematically searched four databases, namely Medline, Web of Science, Embase, and IEEE, from 1990 to 2020 (search date 18 March 2020). “Smart-Home” was defined as a system using integrated sensor technologies. We included studies using sensors, such as wearable and non-wearable devices, to capture vital signs relevant to CVD at home settings and to transfer the data using communication systems, including the gateway. We categorised the articles for parameters monitored, communication systems and data sharing, end-user applications, regulations, and user acceptance. The initial search yielded 2462 articles, and the elimination of duplicates resulted in 1760 articles. Of the 36 articles eligible for full-text screening, we selected five Smart Home Technology studies for CVD management with sensor devices connected to a gateway and having a web-based user interface. We observed that the participants of all the studies were people with heart failure. A total of three main categories—Smart Home Technology for CVD management, user acceptance, and the role of regulatory agencies—were developed and discussed. There is an imperative need to monitor CVD patients’ vital parameters regularly. However, limited Smart Home Technology is available to address CVD patients’ needs and monitor health risks. Our review suggests the need to develop and test Smart Home Technology for people with CVD. 
Our findings provide insights and guidance on critical issues, including Smart Home Technology for CVD management, user acceptance, and the role of regulatory agencies, to be considered when designing, developing, and deploying Smart Home Technology for CVD.
|
17
|
Exploration of Black Boxes of Supervised Machine Learning Models: A Demonstration on Development of Predictive Heart Risk Score. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:5475313. [PMID: 35602638 PMCID: PMC9119773 DOI: 10.1155/2022/5475313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 04/26/2022] [Indexed: 11/18/2022]
Abstract
Machine learning (ML) often provides high-performance models that support decision-makers in various fields. However, this high performance is achieved at the expense of the interpretability of these models, which has been criticized by practitioners and has become a significant hindrance to their application. Therefore, black-box ML models are not recommended for highly sensitive decisions. We proposed a novel methodology that takes complex supervised ML models and transforms them into simple, interpretable, transparent statistical models. This methodology resembles stacking ensemble ML, in which the best ML models are used as base learners to compute relative feature weights. An index built from these weights is then used as a single covariate in a simple logistic regression model to estimate the likelihood of an event. We tested this methodology on a primary dataset related to cardiovascular diseases (CVDs), the leading cause of mortality in recent times. Early risk assessment is therefore an important dimension that can potentially reduce the burden of CVDs and their related mortality through accurate but interpretable risk prediction models. We developed ML models based on an artificial neural network and support vector machines and transformed them into a simple statistical model and heart risk scores. These simplified models were found to be transparent, reliable, valid, interpretable, and approximate in their predictions. The findings of this study suggest that complex supervised ML models can be efficiently transformed into simple statistical models that can also be validated.
|
18
|
Li JJ, Wang CM, Wang YJ, Yang Q, Cai WY, Li YJ, Song M, Zang YL, Cui XH, Li Q, Chen Y, Weng XG, Zhu XX. Network pharmacology analysis and experimental validation to explore the mechanism of Shenlian extract on myocardial ischemia. JOURNAL OF ETHNOPHARMACOLOGY 2022; 288:114973. [PMID: 34990768 DOI: 10.1016/j.jep.2022.114973] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/13/2021] [Revised: 09/30/2021] [Accepted: 01/02/2022] [Indexed: 06/14/2023]
Abstract
ETHNOPHARMACOLOGICAL RELEVANCE: Shenlian extract (SL), extracted from Salvia miltiorrhiza Bunge and Andrographis paniculata (Burm. f.) Nees, has been proven effective in the prevention and treatment of atherosclerosis. Recently, we have partially elucidated the mechanisms involved in the therapeutic effects of SL on myocardial ischemia (MI). However, the underlying mechanisms remain largely unclear.
AIM OF THE STUDY: This study aims to explore the potential molecular mechanism of SL on MI on the basis of network pharmacology.
MATERIALS AND METHODS: First, the main active ingredients of SL were screened in the Traditional Chinese Medicine Integrated Database, and the MI-associated targets were collected from the DisGeNET database. Then, we used compound-target and target-pathway networks to uncover the therapeutic mechanisms of SL. On the basis of the network pharmacology results, we assessed the effects of SL in an MI rat model and an oxygen-glucose deprivation model of H9c2 cells and validated the possible molecular mechanisms of SL on myocardial injury in vivo and in vitro.
RESULTS: The network pharmacology analysis identified 37 potential targets, including TNF-α, Bcl-2, STAT3, PI3K and MMP2. These results revealed that the possible targets of SL were involved in the regulation of inflammation and apoptosis signaling pathways. In vivo experiments then indicated that SL significantly reduced the myocardial infarction size of MI rats. Serum CK-MB, cTnT, CK, LDH, and AST levels were significantly decreased by SL (P < 0.05 or P < 0.01). In vitro, SL significantly increased H9c2 cell viability. The levels of inflammation factors including TNF-α and MMP2 were significantly decreased by SL (P < 0.05 or P < 0.01). TUNEL and Annexin V/propidium iodide assays indicated that SL could significantly decrease the cell apoptotic rate in vivo and in vitro (P < 0.05 or P < 0.01). The remarkable upregulation of anti-apoptotic Bcl-2 and downregulation of pro-apoptotic Bax protein levels further confirmed this result. Kyoto Encyclopedia of Genes and Genomes pathway analysis showed that the PI3K-AKT and JAK2-STAT3 pathways were significantly enriched in SL. Compared with the model group, SL treatment significantly activated the PI3K-AKT and JAK2-STAT3 pathways in vivo and in vitro according to Western blot analyses.
CONCLUSION: SL could protect the myocardium from MI injury. The underlying mechanism may be related to the reduction of inflammation and apoptosis by activating the PI3K/AKT and JAK2/STAT3 pathways.
Affiliation(s)
- Jing-Jing Li: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Chun-Miao Wang: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Ya-Jie Wang: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Qing Yang: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Wei-Yan Cai: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Yu-Jie Li: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Min Song: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Yuan-Long Zang: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Xi-He Cui: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Qi Li: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Ying Chen: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Xiao-Gang Weng: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
- Xiao-Xin Zhu: Institute of Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, 100700, China
|
19
|
Classification Comparison of Machine Learning Algorithms Using Two Independent CAD Datasets. MATHEMATICS 2022. [DOI: 10.3390/math10030311] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022]
Abstract
In the last few decades, statistical methods and machine learning (ML) algorithms have become efficient in medical decision-making. Coronary artery disease (CAD) is a common type of cardiovascular disease that causes many deaths each year. In this study, two CAD datasets from different countries (TRNC and Iran) are tested to understand the classification efficiency of different supervised machine learning algorithms. The Z-Alizadeh Sani dataset contained 303 individuals (216 patients, 87 controls), while the Near East University (NEU) Hospital dataset contained 475 individuals (305 patients, 170 controls). This study was conducted in three stages: (1) Each dataset, as well as their merged version, was split separately into train-test subsets by random sampling. (2) The NEU Hospital dataset was assigned as the training data, while the Z-Alizadeh Sani dataset was the test data. (3) The Z-Alizadeh Sani dataset was assigned as the training data, while the NEU Hospital dataset was the test data. Among all ML algorithms, Random Forest showed successful classification performance at each stage. The least successful ML method was kNN, which underperformed at all stages. Other methods, including logistic regression, had varying classification performance at each stage.
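The three-stage protocol in this abstract (random train-test splits within each dataset and the merged set, then training on one dataset and testing on the other in both directions) can be sketched as below. This is an illustrative harness only: the two datasets are synthetic stand-ins generated by a hypothetical `make_toy` helper, and a nearest-centroid classifier is swapped in as a placeholder for the algorithms the study compared.

```python
import random

def fit_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict_centroids(model, X):
    """Assign each row to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(model, key=lambda c: dist2(model[c], row)) for row in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def random_split(X, y, test_frac=0.3, seed=0):
    """Shuffle indices and return train_X, train_y, test_X, test_y."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])

def evaluate(train_X, train_y, test_X, test_y):
    model = fit_centroids(train_X, train_y)
    return accuracy(test_y, predict_centroids(model, test_X))

def make_toy(n, shift, seed):
    """Synthetic two-feature dataset; `shift` mimics a site difference."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label + shift, 1.0),
                  rng.gauss(2.0 * label, 1.0)])
        y.append(label)
    return X, y

# Synthetic stand-ins for the two hospital datasets (not the real data).
neu_X, neu_y = make_toy(200, 0.0, seed=1)
zas_X, zas_y = make_toy(200, 0.5, seed=2)

results = {
    # Stage 1: random split within each dataset and within the merged set.
    "stage1_neu": evaluate(*random_split(neu_X, neu_y)),
    "stage1_zas": evaluate(*random_split(zas_X, zas_y)),
    "stage1_merged": evaluate(*random_split(neu_X + zas_X, neu_y + zas_y)),
    # Stage 2: train on the NEU stand-in, test on the Z-Alizadeh Sani stand-in.
    "stage2_train_neu_test_zas": evaluate(neu_X, neu_y, zas_X, zas_y),
    # Stage 3: the reverse direction.
    "stage3_train_zas_test_neu": evaluate(zas_X, zas_y, neu_X, neu_y),
}
```

Comparing the stage 1 scores with the stage 2 and 3 scores gives a rough sense of how much performance depends on the training and test data coming from the same site.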
|