1
Shimizu GY, Schrempf M, Romão EA, Jauk S, Kramer D, Rainer PP, Cardeal da Costa JA, de Azevedo-Marques JM, Scarpelini S, Suzuki KMF, César HV, de Azevedo-Marques PM. Machine learning-based risk prediction for major adverse cardiovascular events in a Brazilian hospital: Development, external validation, and interpretability. PLoS One 2024; 19:e0311719. [PMID: 39392843 PMCID: PMC11469522 DOI: 10.1371/journal.pone.0311719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Accepted: 09/23/2024] [Indexed: 10/13/2024] [Open Access]
Abstract
BACKGROUND Studies of cardiovascular disease risk prediction by machine learning algorithms often do not assess their ability to generalize to other populations, and few of them include an analysis of the interpretability of individual predictions. This manuscript addresses the development and validation, both internal and external, of predictive models for the assessment of risks of major adverse cardiovascular events (MACE). Global and local interpretability analyses of predictions were conducted to improve the reliability of the MACE models and to tailor preventive interventions.
METHODS The models were trained and validated on a retrospective cohort using data from Ribeirão Preto Medical School (RPMS), University of São Paulo, Brazil. Data from Beth Israel Deaconess Medical Center (BIDMC), USA, were used for external validation. A balanced sample of 6,000 MACE cases and 6,000 non-MACE cases from RPMS was created for training and internal validation, and an additional balanced sample of 8,000 MACE cases and 8,000 non-MACE cases from BIDMC was employed for external validation. Eight machine learning algorithms, namely Penalized Logistic Regression, Random Forest, XGBoost, Decision Tree, Support Vector Machine, k-Nearest Neighbors, Naive Bayes, and Multi-Layer Perceptron, were trained to predict the 5-year risk of major adverse cardiovascular events, and their predictive performance was evaluated in terms of accuracy, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC). LIME and Shapley values were applied to gain insight into model interpretability.
FINDINGS Random Forest showed the best predictive performance in both internal validation (AUC = 0.871 (0.859-0.882); Accuracy = 0.794 (0.782-0.808)) and external validation (AUC = 0.786 (0.778-0.792); Accuracy = 0.710 (0.704-0.717)). Compared with LIME, Shapley values yielded explanations that were more consistent with the exploratory analysis and with feature importance.
CONCLUSIONS Among the machine learning algorithms evaluated, Random Forest showed the best generalization ability, both internally and externally. Shapley values for local interpretability were more informative than LIME ones, which is in line with our exploratory analysis and global interpretation of the final model. Machine learning algorithms with good generalization and accompanied by interpretability analyses are recommended for assessments of individual risks of cardiovascular diseases and development of personalized preventive actions.
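The abstract above reports each AUC with an interval in parentheses. As a rough illustration of how such a figure can be produced, the following sketch computes AUC as a pairwise ranking probability and attaches a percentile bootstrap interval. This is not the authors' code: the abstract does not say how the intervals were obtained, so the bootstrap is an assumption, and the function names and toy data are invented for illustration only.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class: AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical toy data, not taken from the study.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.65, 0.70, 0.90]

point_estimate = auc(toy_labels, toy_scores)
interval = bootstrap_auc_ci(toy_labels, toy_scores)
print(point_estimate, interval)
```

The pairwise loop is quadratic in the sample size, which is fine for a sketch but would be replaced by a rank-based computation for cohorts of the size described in the abstract.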
Affiliation(s)
- Gilson Yuuji Shimizu
  - Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
- Michael Schrempf
  - Predicting Health GmbH, Graz, Austria
  - Division of Cardiology, Medical University of Graz, Graz, Austria
- Elen Almeida Romão
  - Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
- Stefanie Jauk
  - Steiermärkische Krankenanstaltengesellschaft m. b. H., Graz, Austria
  - Predicting Health GmbH, Graz, Austria
- Diether Kramer
  - Steiermärkische Krankenanstaltengesellschaft m. b. H., Graz, Austria
  - Predicting Health GmbH, Graz, Austria
- Sandro Scarpelini
  - Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
- Hilton Vicente César
  - Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
2
Zinzuwadia AN, Mineeva O, Li C, Farukhi Z, Giulianini F, Cade B, Chen L, Karlson E, Paynter N, Mora S, Demler O. Tailoring Risk Prediction Models to Local Populations. JAMA Cardiol 2024:2823894. [PMID: 39292486 PMCID: PMC11411452 DOI: 10.1001/jamacardio.2024.2912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/19/2024]
Abstract
Importance Risk estimation is an integral part of cardiovascular care. Local recalibration of guideline-recommended models could address the limitations of existing tools.
Objective To provide a machine learning (ML) approach to augment the performance of the American Heart Association's Predicting Risk of Cardiovascular Disease Events (AHA-PREVENT) equations when applied to a local population while preserving clinical interpretability.
Design, Setting, and Participants This cohort study used a New England-based electronic health record cohort of patients without prior atherosclerotic cardiovascular disease (ASCVD) who had the data necessary to calculate the AHA-PREVENT 10-year risk of developing ASCVD in the event period (2007-2016). Patients with prior ASCVD events, death prior to 2007, or age 79 years or older in 2007 were subsequently excluded. The final study population of 95 326 patients was split into 3 nonoverlapping subsets for training, testing, and validation. The AHA-PREVENT model was adapted to this local population using the open-source ML model (MLM) Extreme Gradient Boosting model (XGBoost) with minimal predictor variables, including age, sex, and AHA-PREVENT. The MLM was monotonically constrained to preserve known associations between risk factors and ASCVD risk. Along with sex, race and ethnicity data from the electronic health record were collected to validate the performance of ASCVD risk prediction in subgroups. Data were analyzed from August 2021 to February 2024.
Main Outcomes and Measures Consistent with the AHA-PREVENT model, ASCVD events were defined as the first occurrence of either nonfatal myocardial infarction, coronary artery disease, ischemic stroke, or cardiovascular death. Cardiovascular death was coded via government registries. Discrimination, calibration, and risk reclassification were assessed using the Harrell C index, a modified Hosmer-Lemeshow goodness-of-fit test and calibration curves, and reclassification tables, respectively.
Results In the test set of 38 137 patients (mean [SD] age, 64.8 [6.9] years, 22 708 [59.5%] women and 15 429 [40.5%] men; 935 [2.5%] Asian, 2153 [5.6%] Black, 1414 [3.7%] Hispanic, 31 400 [82.3%] White, and 2235 [5.9%] other, including American Indian, multiple races, unspecified, and unrecorded, consolidated owing to small numbers), MLM-PREVENT had improved calibration (modified Hosmer-Lemeshow P > .05) compared to the AHA-PREVENT model across risk categories in the overall cohort (χ²₃ = 2.2; P = .53 vs χ²₃ > 16.3; P < .001) and sex subgroups (men: χ²₃ = 2.1; P = .55 vs χ²₃ > 16.3; P < .001; women: χ²₃ = 6.5; P = .09 vs χ²₃ > 16.3; P < .001), while also surpassing a traditional recalibration approach. MLM-PREVENT maintained or improved AHA-PREVENT's calibration in Asian, Black, and White individuals. Both MLM-PREVENT and AHA-PREVENT performed equally well in discriminating risk (approximate ΔC index, ±0.01). Using a clinically significant 7.5% risk threshold, MLM-PREVENT reclassified a total of 11.5% of patients. We visualize the recalibration through MLM-PREVENT ASCVD risk charts that highlight preserved risk associations of the original AHA-PREVENT model.
Conclusions and Relevance The interpretable ML approach presented in this article enhanced the accuracy of the AHA-PREVENT model when applied to a local population while still preserving the risk associations found by the original model. This method has the potential to recalibrate other established risk tools and is implementable in electronic health record systems for improved cardiovascular risk assessment.
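The Results above describe reclassification of patients at a 7.5% risk threshold. A minimal sketch of how a two-category reclassification table can be tallied is given below. It is not the study's code: the function names, the choice to treat a risk exactly equal to the threshold as high, and the toy risk values are all assumptions made for illustration.

```python
THRESHOLD = 0.075  # the 7.5% 10-year risk threshold named in the abstract


def risk_category(risk, threshold=THRESHOLD):
    """Label a predicted risk as 'high' or 'low' (>= threshold is an assumption)."""
    return "high" if risk >= threshold else "low"


def reclassification_table(old_risks, new_risks, threshold=THRESHOLD):
    """Count patients by (category under old model, category under new model)."""
    if len(old_risks) != len(new_risks):
        raise ValueError("both models must score the same patients")
    table = {
        ("low", "low"): 0,
        ("low", "high"): 0,
        ("high", "low"): 0,
        ("high", "high"): 0,
    }
    for old, new in zip(old_risks, new_risks):
        table[(risk_category(old, threshold), risk_category(new, threshold))] += 1
    return table


def fraction_reclassified(table):
    """Share of patients whose category differs between the two models."""
    moved = table[("low", "high")] + table[("high", "low")]
    return moved / sum(table.values())


# Hypothetical predicted risks for eight patients, not taken from the study.
toy_old = [0.02, 0.06, 0.08, 0.10, 0.20, 0.07, 0.05, 0.09]
toy_new = [0.03, 0.08, 0.07, 0.12, 0.18, 0.06, 0.04, 0.10]

toy_table = reclassification_table(toy_old, toy_new)
print(toy_table, fraction_reclassified(toy_table))
```

The off-diagonal cells of the table are the reclassified patients; a full analysis would also cross-tabulate these cells against observed events, which this sketch omits.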
Affiliation(s)
- Chunying Li
  - Brigham & Women's Hospital, Boston, Massachusetts
- Zareen Farukhi
  - Brigham & Women's Hospital, Boston, Massachusetts
  - Massachusetts General Hospital, Boston
- Brian Cade
  - Brigham & Women's Hospital, Boston, Massachusetts
- Lin Chen
  - Brigham & Women's Hospital, Boston, Massachusetts
- Nina Paynter
  - Brigham & Women's Hospital, Boston, Massachusetts
- Samia Mora
  - Brigham & Women's Hospital, Boston, Massachusetts
- Olga Demler
  - Brigham & Women's Hospital, Boston, Massachusetts
  - ETH Zurich, Zurich, Switzerland
3
Wang L, Liu D, Sun Y, Zhang Y, Chen W, Yuan Y, Hu S, Li S. Machine learning-based analysis of heavy metal contamination in Chinese lake basin sediments: Assessing influencing factors and policy implications. Ecotoxicology and Environmental Safety 2024; 283:116815. [PMID: 39094459 DOI: 10.1016/j.ecoenv.2024.116815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 07/17/2024] [Accepted: 07/26/2024] [Indexed: 08/04/2024]
Abstract
Sediments are important heavy metal sinks in lakes, crucial for ensuring water environment safety. Existing studies mainly focused on well-studied lakes, leaving gaps in understanding pollution patterns in specific basins and influencing factors. We compiled comprehensive sediment contamination data from literature and public datasets, including hydro-geomorphological, climatic, soil, landscape, and anthropogenic factors. Using advanced machine learning, we analyzed typical pollution factors to infer potential sources and migration pathways of pollutants and predicted pollution levels in basins with limited data availability. Our analysis of pollutant distribution data revealed that Cd had the most extensive pollution range, with the most severe pollution occurring in the Huaihe and Yangtze River basins. Furthermore, we identified distinct groups of driving factors influencing various heavy metals. Cd, Cr, and Pb were primarily influenced by human activities, while Cu and Ni were affected by both anthropogenic and natural factors, and Zn tended more towards natural sources. Our predictions indicated that, in addition to the typical highly polluted areas, the potential risk of Cd, Cu, and Ni is higher in Xinjiang, and the potential risk of Cd, Cr, Cu, and Ni is higher in Tibet and Qinghai. Pb and Zn presented lower risks, except in the Huaihe and Yangtze River basins. Temperature, wind, precipitation, precipitation rate, and the cation exchange capacity of soil significantly impacted the predictions of heavy metal pollution in sediments, suggesting that particulate migration, rainfall runoff, and soil erosion are likely the main pathways for pollutant migration into sediments. Considering the migration pathways and sources of pollutants, we propose strategies such as low-impact development and promoting sustainable transportation to mitigate pollution. This study provides the latest insights into heavy metal pollution in Chinese lake sediments, offering references for policy-making and water resource management.
Affiliation(s)
- Luqi Wang
  - Hubei Key Laboratory of Multi-media Pollution Cooperative Control in Yangtze Basin, School of Environmental Science and Engineering, Huazhong University of Science and Technology, 1037 Luoyu Road, Wuhan, Hubei 430074, China
- Dongsheng Liu
  - Hubei Key Laboratory of Multi-media Pollution Cooperative Control in Yangtze Basin, School of Environmental Science and Engineering, Huazhong University of Science and Technology, 1037 Luoyu Road, Wuhan, Hubei 430074, China
- Yifan Sun
  - Hubei Key Laboratory of Multi-media Pollution Cooperative Control in Yangtze Basin, School of Environmental Science and Engineering, Huazhong University of Science and Technology, 1037 Luoyu Road, Wuhan, Hubei 430074, China
- Yinsheng Zhang
  - Hubei Key Laboratory of Multi-media Pollution Cooperative Control in Yangtze Basin, School of Environmental Science and Engineering, Huazhong University of Science and Technology, 1037 Luoyu Road, Wuhan, Hubei 430074, China
  - School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
- Wei Chen
  - Yangtze Clean Energy Conservation and Environmental Protection Co., Ltd, Shanghai 201718, PR China
- Yi Yuan
  - Yangtze Clean Energy Conservation and Environmental Protection Co., Ltd, Shanghai 201718, PR China
- Shengchao Hu
  - Research Center for Environmental Ecology and Engineering, School of Environmental Ecology and Biological Engineering, Wuhan Institute of Technology, Wuhan 430205, PR China
- Sen Li
  - Hubei Key Laboratory of Multi-media Pollution Cooperative Control in Yangtze Basin, School of Environmental Science and Engineering, Huazhong University of Science and Technology, 1037 Luoyu Road, Wuhan, Hubei 430074, China
4
Islam MM, Rahman MJ, Rabby MS, Alam MJ, Pollob SMAI, Ahmed NAMF, Tawabunnahar M, Roy DC, Shin J, Maniruzzaman M. Predicting the risk of diabetic retinopathy using explainable machine learning algorithms. Diabetes Metab Syndr 2023; 17:102919. [PMID: 38091881 DOI: 10.1016/j.dsx.2023.102919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 11/24/2023] [Accepted: 11/26/2023] [Indexed: 12/31/2023]
Abstract
BACKGROUND AND OBJECTIVE Diabetic retinopathy (DR) is a global health concern among diabetic patients. The objective of this study was to propose an explainable machine learning (ML)-based system for predicting the risk of DR.
MATERIALS AND METHODS This study utilized publicly available cross-sectional data from a Chinese cohort of 6374 respondents. We employed Boruta and least absolute shrinkage and selection operator (LASSO) based feature selection methods to identify the common predictors of DR. Using the identified predictors, we trained and optimized four widely applicable models (artificial neural network, support vector machine, random forest, and extreme gradient boosting (XGBoost)) to predict patients with DR. Moreover, Shapley additive explanations (SHAP) were adopted to show the contribution of each predictor of DR to the prediction.
RESULTS Combining the Boruta and LASSO methods revealed that community, TCTG, HDLC, BUN, FPG, HbA1c, weight, and duration were the most important predictors of DR. The XGBoost-based model outperformed the other models, with an accuracy of 90.01%, precision of 91.80%, recall of 97.91%, F1 score of 94.86%, and AUC of 0.850. Moreover, the SHAP method showed that HbA1c, community, FPG, TCTG, duration, and UA1b were the influencing predictors of DR.
CONCLUSION The proposed integrated system will be helpful as a tool for selecting significant predictors, which can predict patients who are at high risk of DR at an early stage in China.
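The accuracy, precision, recall, and F1 score reported above all derive from the counts in a binary confusion matrix. The sketch below shows the standard definitions; the function names and the confusion-matrix counts are hypothetical and are not taken from the study, whose underlying counts are not given in the abstract.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Hypothetical counts chosen only to exercise the formulas.
toy_metrics = classification_metrics(tp=90, fp=10, fn=5, tn=20)
print(toy_metrics)

# Harmonic mean of the precision and recall quoted in the abstract.
print(f1_score(0.9180, 0.9791))
```

Applying the harmonic mean to the quoted precision (91.80%) and recall (97.91%) gives roughly 0.948, close to the reported F1 score of 94.86%.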
Affiliation(s)
- Md Merajul Islam
  - Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
  - Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh-2224, Bangladesh
- Md Jahanur Rahman
  - Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
- Md Symun Rabby
  - Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh-2224, Bangladesh
- Md Jahangir Alam
  - Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
- N A M Faisal Ahmed
  - Institute of Education and Research, University of Rajshahi, Rajshahi-6205, Bangladesh
- Most Tawabunnahar
  - Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh-2224, Bangladesh
- Dulal Chandra Roy
  - Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
- Junpil Shin
  - School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, 965-8580, Fukushima, Japan
- Md Maniruzzaman
  - Statistics Discipline, Khulna University, Khulna-9208, Bangladesh
5
Affiliation(s)
- Eugene Braunwald
  - TIMI Study Group, Division of Cardiovascular Medicine, and Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Hale Building for Transformative Medicine, Suite 7022, 60 Fenwood Road, Boston, MA 02115, USA
6
Vo HK, Nguyen DV, Vu TT, Tran HB, Nguyen HTT. Prevalence and risk factors of prehypertension/hypertension among freshman students from the Vietnam National University: a cross-sectional study. BMC Public Health 2023; 23:1166. [PMID: 37328903 PMCID: PMC10276403 DOI: 10.1186/s12889-023-16118-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/29/2023] [Accepted: 06/13/2023] [Indexed: 06/18/2023] [Open Access]
Abstract
BACKGROUND Prehypertension (PHT) and hypertension (HTN) in young adults are essential risk factors for other cardiovascular diseases (CVD) in later years of life. However, there is a lack of knowledge about the burden and risk factors of PHT/HTN for Vietnamese youth. The aim of this study was to investigate the prevalence of PHT/HTN and risk factors among university students in Hanoi, Vietnam.
METHODS This study was designed as a cross-sectional investigation with 840 students (394 males and 446 females) randomly sampled from freshmen of Vietnam National University, Hanoi (VNU). Socio-demographic, anthropometric, and lifestyle data were collected using questionnaire forms and physical measurements. HTN was defined as blood pressure (BP) ≥ 140/90 mmHg and/or current treatment with antihypertensive medications. PHT was defined as a systolic BP from 120 to 139 mmHg and/or a diastolic BP from 80 to 89 mmHg. Body mass index (BMI) was classified according to the WHO diagnostic criteria for Asian adults: normal weight (BMI 18.5-22.9 kg/m²), underweight (BMI < 18.5 kg/m²), overweight (BMI 23-24.9 kg/m²), and obese (BMI ≥ 25 kg/m²). Bivariable and multivariable log-binomial regression analyses were conducted to explore the association of PHT/HTN with different risk factors.
RESULTS The overall prevalence of prehypertension and hypertension was 33.5% [95% CI: 30.3-36.8%] (54.1% in men and 15.3% in women) and 1.4% [95% CI: 0.7-2.5%] (2.5% in men and 0.5% in women), respectively. Regarding major CVD risk factors, 119 (14.2%) students were identified as overweight/obese, 461 (54.9%) were physically inactive, and 29.4% of men and 8.1% of women reported consuming alcohol. The multivariable analysis identified male sex (adjusted prevalence ratio [aPR] = 3.07; 95% CI: 2.32-4.06), alcohol consumption (aPR = 1.28; 95% CI: 1.03-1.59), and obesity (aPR = 1.35; 95% CI: 1.08-1.68) as independent risk factors for PHT/HTN.
CONCLUSIONS The results revealed the high burden of prehypertension and hypertension among university freshmen in VNU. Male sex, alcohol consumption, and obesity were identified as important risk factors for PHT/HTN. Our study suggests the need for an early screening program for PHT/HTN and campaigns to promote a healthy lifestyle for young adults in Vietnam.
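The blood pressure and BMI definitions in the Methods above translate directly into classification rules. The sketch below encodes them under two stated assumptions: the BMI bands are treated as half-open intervals so that values such as 22.95 kg/m² fall into a category (the abstract's bands are given to one decimal place), and the function and parameter names are invented for illustration rather than taken from the study.

```python
def classify_bp(systolic, diastolic, on_treatment=False):
    """Classify blood pressure (mmHg) using the definitions in the abstract.

    Hypertension: BP >= 140/90 and/or current antihypertensive treatment.
    Prehypertension: systolic 120-139 and/or diastolic 80-89.
    """
    if on_treatment or systolic >= 140 or diastolic >= 90:
        return "hypertension"
    # At this point systolic < 140 and diastolic < 90.
    if systolic >= 120 or diastolic >= 80:
        return "prehypertension"
    return "normal"


def classify_bmi(weight_kg, height_m):
    """Classify BMI with the WHO cut-offs for Asian adults quoted in the abstract.

    Half-open intervals are an assumption: [18.5, 23) normal, [23, 25) overweight.
    """
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 23:
        return "normal weight"
    if bmi < 25:
        return "overweight"
    return "obese"


# Hypothetical measurements, not taken from the study.
print(classify_bp(125, 70))
print(classify_bmi(68, 1.70))
```

Checking hypertension first matters because the "and/or" definitions overlap: a reading of 142/80 mmHg meets the diastolic criterion for prehypertension but is classified as hypertension on the systolic value.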
Affiliation(s)
- Hong-Khoi Vo
  - Neurology Center, Bach Mai Hospital, Hanoi, Vietnam
  - Department of Neurology, VNU-University of Medicine and Pharmacy, Hanoi, Vietnam
- Dung Viet Nguyen
  - Department of Internal Medicine, VNU-University of Medicine and Pharmacy, Hanoi, Vietnam
  - Vietnam National Heart Institute, Bach Mai Hospital, Hanoi, Vietnam
- Thom Thi Vu
  - Department of Basic Medical Sciences, VNU-University of Medicine and Pharmacy, Hanoi, Vietnam
- Hieu Ba Tran
  - Department of Internal Medicine, VNU-University of Medicine and Pharmacy, Hanoi, Vietnam
  - Vietnam National Heart Institute, Bach Mai Hospital, Hanoi, Vietnam
- Hoai Thi Thu Nguyen
  - Department of Internal Medicine, VNU-University of Medicine and Pharmacy, Hanoi, Vietnam
  - Vietnam National Heart Institute, Bach Mai Hospital, Hanoi, Vietnam