1
MacCarthy G, Pazoki R. Using Machine Learning to Evaluate the Value of Genetic Liabilities in the Classification of Hypertension within the UK Biobank. J Clin Med 2024; 13:2955. [PMID: 38792496 PMCID: PMC11122671 DOI: 10.3390/jcm13102955] [Received: 03/18/2024] [Revised: 05/01/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
Background and Objective: Hypertension increases the risk of cardiovascular diseases (CVD) such as stroke, heart attack, heart failure, and kidney disease, contributing to global disease burden and premature mortality. Previous studies have utilized statistical and machine learning techniques to develop hypertension prediction models. Only a few have included genetic liabilities and evaluated their predictive values. This study aimed to develop an effective hypertension classification model and investigate the potential influence of genetic liability for multiple risk factors linked to CVD on hypertension risk using the random forest and the neural network. Materials and Methods: The study involved 244,718 European participants, who were divided into training and testing sets. Genetic liabilities were constructed using genetic variants associated with CVD risk factors obtained from genome-wide association studies (GWAS). Various combinations of machine learning models before and after feature selection were tested to develop the best classification model. The models were evaluated using area under the curve (AUC), calibration, and net reclassification improvement in the testing set. Results: The models without genetic liabilities achieved AUCs of 0.70 and 0.72 using the random forest and the neural network methods, respectively. Adding genetic liabilities improved the AUC for the random forest but not for the neural network. The best classification model was achieved when feature selection and classification were performed using random forest (AUC = 0.71, Spiegelhalter z score = 0.10, p-value = 0.92, calibration slope = 0.99). This model included genetic liabilities for total cholesterol and low-density lipoprotein (LDL). Conclusions: The study highlighted that incorporating genetic liabilities for lipids in a machine learning model may provide incremental value for hypertension classification beyond baseline characteristics.
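The abstract evaluates the models by discrimination (AUC) and calibration (Spiegelhalter z score). The sketch below illustrates those two standard metrics and is not the authors' pipeline. It assumes binary outcomes `y` and predicted probabilities `p` for a test set, computes AUC from its pairwise (Mann-Whitney) definition, and computes Spiegelhalter's z = sum((y - p)(1 - 2p)) / sqrt(sum((1 - 2p)^2 p(1 - p))) with a two-sided normal p-value. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def auc_pairwise(y: Sequence[int], p: Sequence[float]) -> float:
    """AUC = P(score of a random case > score of a random non-case); ties count 1/2.

    O(n_pos * n_neg), so only suitable for small illustrative samples.
    """
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def spiegelhalter_z(y: Sequence[int], p: Sequence[float]) -> Tuple[float, float]:
    """Spiegelhalter's z test of calibration; returns (z, two-sided p-value).

    Under good calibration z is approximately standard normal, so a large p-value
    gives no evidence of miscalibration.
    """
    num = sum((yi - pi) * (1.0 - 2.0 * pi) for yi, pi in zip(y, p))
    var = sum((1.0 - 2.0 * pi) ** 2 * pi * (1.0 - pi) for pi in p)
    if var == 0.0:
        raise ValueError("statistic undefined when every p is 0, 0.5 or 1")
    z = num / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(auc_pairwise(y, p))  # 0.75
    print(spiegelhalter_z(y, p))
```

The calibration slope also reported in the abstract requires refitting a logistic model of the outcome on the logit of `p` and is not shown here.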
Affiliation(s)
- Gideon MacCarthy
- Cardiovascular and Metabolic Research Group, Division of Biomedical Sciences, Department of Life Sciences, College of Health, Medicine and Life Sciences, Brunel University London, London UB8 3PH, UK
- Raha Pazoki
- Cardiovascular and Metabolic Research Group, Division of Biomedical Sciences, Department of Life Sciences, College of Health, Medicine and Life Sciences, Brunel University London, London UB8 3PH, UK
- MRC Centre for Environment and Health, Department of Epidemiology and Biostatistics, School of Public Health, St Mary’s Campus, Norfolk Place, Imperial College London, London W2 1PG, UK
2
Schjerven FE, Lindseth F, Steinsland I. Prognostic risk models for incident hypertension: A PRISMA systematic review and meta-analysis. PLoS One 2024; 19:e0294148. [PMID: 38466745 PMCID: PMC10927109 DOI: 10.1371/journal.pone.0294148] [Received: 07/14/2023] [Accepted: 10/26/2023] [Indexed: 03/13/2024]
Abstract
Objective: Our goal was to review the available literature on prognostic risk prediction for incident hypertension, synthesize model performance, and provide suggestions for future work on the topic. Methods: A systematic search of the PubMed and Web of Science databases was conducted for studies on prognostic risk prediction models for incident hypertension in generally healthy individuals. Study quality was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST) checklist. Three-level meta-analyses were used to obtain pooled AUC/C-statistic estimates. Heterogeneity was explored using study and cohort characteristics in meta-regressions. Results: From 5090 hits, we found 53 eligible studies and included 47 in the meta-analyses. Only four studies were assessed to have results with a low risk of bias. Few models had been externally validated; only the Framingham risk model had been validated more than three times. The pooled AUC/C-statistics were 0.82 (0.77-0.86) for machine learning (ML) models and 0.78 (0.76-0.80) for traditional models, with high heterogeneity in both groups (I² > 99%). Intra-class correlations within studies were 60% and 90%, respectively. In explaining heterogeneity, follow-up time (P = 0.0405) was significant for ML models and age (P = 0.0271) for traditional models. Validations of the Framingham risk model had high heterogeneity (I² > 99%). Conclusion: Overall, the quality of the included studies was assessed as poor. AUC/C-statistics were mostly acceptable or good, and higher for ML models than for traditional models. High heterogeneity implies large variability in the performance of new risk models. Further, the large heterogeneity in validations of the Framingham risk model indicates variability in model performance in new populations. To enable researchers to assess hypertension risk models, we encourage adherence to existing guidelines for developing and reporting risk models, specifically the reporting of appropriate performance measures.
Further, we recommend a stronger focus on model validation, by considering reasonable baseline models and performing external validations of existing models. To this end, developed risk models must be made available to external researchers.
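The review summarizes heterogeneity as I² > 99%. The authors pooled estimates with three-level meta-analyses; the simpler sketch below instead shows the conventional two-level quantities, assuming one estimate and one sampling variance per study: an inverse-variance pooled estimate, Cochran's Q, and I² = max(0, (Q - df) / Q). It illustrates what I² measures and is not a reproduction of the review's model; the function name and example values are hypothetical.

```python
from typing import Sequence, Tuple


def pooled_q_i2(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance pooled estimate, Cochran's Q and I^2 (as a fraction)."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Q: weighted squared deviations of the study estimates from the pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


if __name__ == "__main__":
    # Hypothetical AUCs from three precise studies that disagree with each other
    print(pooled_q_i2([0.70, 0.80, 0.90], [0.0001, 0.0001, 0.0001]))
```

When studies are individually precise but disagree, Q is large relative to its degrees of freedom and I² approaches 1, which is one way values above 99% can arise.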
Affiliation(s)
- Filip Emil Schjerven
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Frank Lindseth
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Ingelin Steinsland
- Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
3
Schjerven FE, Ingeström EML, Steinsland I, Lindseth F. Development of risk models of incident hypertension using machine learning on the HUNT study data. Sci Rep 2024; 14:5609. [PMID: 38454041 PMCID: PMC10920790 DOI: 10.1038/s41598-024-56170-7] [Received: 11/09/2023] [Accepted: 03/03/2024] [Indexed: 03/09/2024]
Abstract
In this study, we aimed to create an 11-year hypertension risk prediction model using data from the Trøndelag Health (HUNT) Study in Norway, involving 17 852 individuals (20-85 years; 38% male; 24% incidence rate) with blood pressure (BP) below the hypertension threshold at baseline (1995-1997). We assessed 18 clinical, behavioral, and socioeconomic features, employing machine learning models such as eXtreme Gradient Boosting (XGBoost), Elastic regression, K-Nearest Neighbor, Support Vector Machines (SVM) and Random Forest. For comparison, we used logistic regression and a decision rule as reference models and validated six external models, with focus on the Framingham risk model. The top-performing models consistently included XGBoost, Elastic regression and SVM. These models efficiently identified hypertension risk, even among individuals with optimal baseline BP (< 120/80 mmHg), although improvement over reference models was modest. The recalibrated Framingham risk model outperformed the reference models, approaching the best-performing ML models. Important features included age, systolic and diastolic BP, body mass index, height, and family history of hypertension. In conclusion, our study demonstrated that linear effects sufficed for a well-performing model. The best models efficiently predicted hypertension risk, even among those with optimal or normal baseline BP, using few features. The recalibrated Framingham risk model proved effective in our cohort.
Affiliation(s)
- Filip Emil Schjerven
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Emma Maria Lovisa Ingeström
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway
- Ingelin Steinsland
- Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Frank Lindseth
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
4
Huang AA, Huang SY. Shapely additive values can effectively visualize pertinent covariates in machine learning when predicting hypertension. J Clin Hypertens (Greenwich) 2023; 25:1135-1144. [PMID: 37971610 PMCID: PMC10710553 DOI: 10.1111/jch.14745] [Received: 05/23/2023] [Revised: 10/16/2023] [Accepted: 10/17/2023] [Indexed: 11/19/2023]
Abstract
Machine learning methods are widely used within the medical field to enhance prediction. However, little is known about the reliability and efficacy of these models in predicting long-term medical outcomes such as blood pressure from lifestyle factors such as diet. The authors assessed whether machine learning techniques could accurately predict hypertension risk using nutritional information. This was a cross-sectional study using data from the National Health and Nutrition Examination Survey (NHANES) collected between January 2017 and March 2020. XGBoost was chosen as the machine learning model for this study because of its higher performance relative to other methods commonly used in medical studies. Model prediction metrics (e.g., AUROC, balanced accuracy) were used to measure overall model efficacy, while covariate gain statistics (the percentage each covariate contributes to the overall prediction) and SHapley Additive exPlanations (SHAP, a method to visualize the contribution of each covariate) were used to explain the machine learning output and increase the transparency of this otherwise opaque method. Of 9650 eligible patients, the mean age was 41.02 years (SD = 22.16); 4792 (50%) were male and 4858 (50%) female; 3407 (35%) were White, 2567 (27%) Black, 2108 (22%) Hispanic, and 981 (10%) Asian. Based on the model gain statistics, age was the single strongest predictor of hypertension, with a gain of 53.1%. Demographic factors such as poverty and Black race were also strong predictors of hypertension, with gains of 4.33% and 4.18%, respectively. Nutritional covariates contributed 37% to the overall prediction, with sodium, caffeine, potassium, and alcohol intake being significantly represented within the model. Machine learning can be used to predict hypertension.
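The abstract describes gain as the percentage each covariate contributes to the overall prediction. A minimal sketch of that normalization follows, assuming the raw per-feature total gains have already been extracted from a fitted XGBoost model (for example with the Python `Booster.get_score(importance_type="total_gain")` call); the feature names and numbers in the example are hypothetical toy values, not results from the study. A grouped share, such as the share attributed to nutritional covariates, is then the sum of its member features' percentages.

```python
from typing import Dict, Iterable


def gain_percentages(total_gain: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw per-feature total gains so that they sum to 100%."""
    total = sum(total_gain.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    return {feature: 100.0 * g / total for feature, g in total_gain.items()}


def group_share(percentages: Dict[str, float], group: Iterable[str]) -> float:
    """Combined percentage for a group of features (features not used by the model count as 0)."""
    return sum(percentages.get(feature, 0.0) for feature in group)


if __name__ == "__main__":
    # Hypothetical raw gains for four features (illustrative only)
    raw = {"age": 5.0, "sodium": 2.0, "caffeine": 1.0, "poverty": 2.0}
    pct = gain_percentages(raw)
    print(pct)  # {'age': 50.0, 'sodium': 20.0, 'caffeine': 10.0, 'poverty': 20.0}
    print(group_share(pct, ["sodium", "caffeine", "potassium"]))  # 30.0
```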
Affiliation(s)
- Alexander A. Huang
- Cornell University, New York, USA
- Northwestern University Feinberg School of Medicine, Chicago, USA
- Samuel Y. Huang
- Cornell University, New York, USA
- Virginia Commonwealth University School of Medicine, Richmond, USA
5
Limonova AS, Ershova AI, Kiseleva AV, Ramensky VE, Vyatkin YV, Kutsenko VA, Meshkov AN, Drapkina OM. Assessment of polygenic risk of hypertension. Cardiovascular Therapy and Prevention (Kardiovaskulyarnaya Terapiya i Profilaktika) 2023. [DOI: 10.15829/1728-8800-2022-3464] [Indexed: 01/20/2023]
Abstract
Hypertension (HTN) is a leading risk factor for the development of cardiovascular diseases. In recent decades, the rapid development of genetic testing, in particular genome-wide association studies (GWAS), has made it possible to identify hundreds of nucleotide sequence variants associated with the development of HTN. One approach to improving the predictive power of genetic testing is to combine information about many nucleotide sequence variants into a single risk assessment, often referred to as a genetic risk score. This review considers the most significant publications on genetic risk scores for HTN and discusses the features of their development and application.
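A weighted genetic risk score of the kind the review discusses is commonly built as a weighted sum of effect-allele counts, GRS = sum over variants j of beta_j * d_j, where d_j (0 to 2) is an individual's effect-allele dosage at variant j and beta_j is that variant's effect size from a GWAS. The sketch below assumes this construction; the variant IDs and weights are hypothetical placeholders, and real pipelines also handle allele alignment, missing genotypes, and variant selection, which are omitted here.

```python
from typing import Mapping


def genetic_risk_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted GRS: sum of effect-allele dosage times GWAS effect size over all scored variants.

    Raises if a scored variant has no genotype or a dosage lies outside the 0-2 range.
    """
    score = 0.0
    for variant, beta in weights.items():
        if variant not in dosages:
            raise KeyError(f"no genotype for {variant}")
        d = dosages[variant]
        if not 0.0 <= d <= 2.0:
            raise ValueError(f"dosage for {variant} must be between 0 and 2, got {d}")
        score += beta * d
    return score


if __name__ == "__main__":
    # Hypothetical variants and effect sizes (log odds ratios), for illustration only
    weights = {"variant_1": 0.10, "variant_2": 0.20, "variant_3": -0.05}
    person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
    print(genetic_risk_score(person, weights))
```

Setting every weight to 1 gives the unweighted (allele-count) variant of the score.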
Affiliation(s)
- A. S. Limonova
- National Medical Research Center for Therapy and Preventive Medicine
- A. I. Ershova
- National Medical Research Center for Therapy and Preventive Medicine
- A. V. Kiseleva
- National Medical Research Center for Therapy and Preventive Medicine
- V. E. Ramensky
- National Medical Research Center for Therapy and Preventive Medicine; Lomonosov Moscow State University
- Yu. V. Vyatkin
- National Medical Research Center for Therapy and Preventive Medicine; Novosibirsk National Research State University
- V. A. Kutsenko
- National Medical Research Center for Therapy and Preventive Medicine; Faculty of Mechanics and Mathematics, Lomonosov Moscow State University
- A. N. Meshkov
- National Medical Research Center for Therapy and Preventive Medicine; Pirogov Russian National Research Medical University
- O. M. Drapkina
- National Medical Research Center for Therapy and Preventive Medicine
6
Ji W, Zhang Y, Cheng Y, Wang Y, Zhou Y. Development and validation of prediction models for hypertension risks: A cross-sectional study based on 4,287,407 participants. Front Cardiovasc Med 2022; 9:928948. [PMID: 36225955 PMCID: PMC9548597 DOI: 10.3389/fcvm.2022.928948] [Received: 04/26/2022] [Accepted: 08/29/2022] [Indexed: 11/13/2022]
Abstract
Objective: To develop an optimal screening model to identify individuals at high risk of hypertension in China by comparing tree-based machine learning models (classification and regression tree, random forest, AdaBoost with a decision tree, and extreme gradient boosting decision tree) with other machine learning models (an artificial neural network and naive Bayes) and traditional logistic regression models. Methods: A total of 4,287,407 adults participating in the national physical examination were included in the study. Features were selected using least absolute shrinkage and selection operator (LASSO) regression. The borderline synthetic minority over-sampling technique (Borderline-SMOTE) was used to balance the data. Non-laboratory and semi-laboratory analyses were carried out with the selected features. Tree-based machine learning models, other machine learning models, and traditional logistic regression models were each constructed to identify individuals with hypertension. The top features selected by the best algorithm and their variable importance scores were visualized. Results: A total of 24 variables were retained for analysis after LASSO regression. The number of hypertensive patients in the training set was expanded from 689,025 to 2,312,160 using the Borderline-SMOTE algorithm. The extreme gradient boosting decision tree algorithm showed the best results (area under the receiver operating characteristic curve: 0.893 for the non-laboratory analysis and 0.894 for the semi-laboratory analysis). Age, systolic blood pressure, waist circumference, diastolic blood pressure, albumin, drinking frequency, electrocardiogram, ethnicity (Uyghur, Hui, and other), body mass index, sex (female), exercise frequency, diabetes mellitus, and total bilirubin were found to be important factors reflecting hypertension.
In addition, some algorithms showed only slight improvement in predictive performance in the semi-laboratory analyses compared with the non-laboratory analyses. Conclusion: By using multiple methods, a better-performing prediction model can be built that identifies risk factors and provides new insights into the prediction and prevention of hypertension.
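The study balanced its training data with the borderline synthetic minority over-sampling technique. The pure-Python sketch below follows the Borderline-SMOTE1 scheme as commonly described (Han et al., 2005) and is not the study's actual code: a minority sample is treated as "in danger" when at least half, but not all, of its m nearest neighbours in the full training set belong to the majority class, and synthetic points are generated only for those samples by interpolating toward one of their k nearest minority neighbours. The function name, parameter names, and defaults are assumptions; for data of this size, a library implementation such as imbalanced-learn's `BorderlineSMOTE` is the practical choice.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def borderline_smote(
    minority: Sequence[Sequence[float]],
    majority: Sequence[Sequence[float]],
    m: int = 5,
    k: int = 5,
    per_sample: int = 1,
    seed: int = 0,
) -> List[Point]:
    """Return synthetic minority points generated from 'danger' samples only (Borderline-SMOTE1)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    labelled = [(list(x), 1) for x in minority] + [(list(x), 0) for x in majority]
    synthetic: List[Point] = []
    for i, x in enumerate(minority):
        # m nearest neighbours in the whole training set, excluding x itself (index i)
        others = [item for j, item in enumerate(labelled) if j != i]
        others.sort(key=lambda item: math.dist(x, item[0]))
        n_majority = sum(1 for _, label in others[:m] if label == 0)
        # "safe" (fewer than m/2 majority) and "noise" (all majority) samples are skipped
        if not (m / 2 <= n_majority < m):
            continue
        # k nearest minority neighbours of the danger sample
        minority_others = [list(z) for j, z in enumerate(minority) if j != i]
        minority_others.sort(key=lambda z: math.dist(x, z))
        neighbours = minority_others[:k]
        for _ in range(per_sample):
            nn = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Two minority points, each with two nearby majority points: both are "in danger"
    minority = [[0.0, 0.0], [0.5, 0.0]]
    majority = [[0.1, 0.0], [0.0, 0.1], [10.0, 10.0]]
    print(borderline_smote(minority, majority, m=3, k=1, per_sample=2))
```

Oversampling of this kind is applied to the training set only, consistent with the abstract's description; the evaluation data keep their original class balance.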
Affiliation(s)
- Weidong Ji
- Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Yushan Zhang
- Department of Maternal and Child Health, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Yinlin Cheng
- Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Yushan Wang
- Center of Health Management, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Correspondence: Yushan Wang
- Yi Zhou
- Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China