1
Cruz EO, Sakowitz S, Mallick S, Le N, Chervu N, Bakhtiyar SS, Benharash P. Machine learning prediction of hospitalization costs for coronary artery bypass grafting operations. Surgery 2024:S0039-6060(24)00216-2. [PMID: 38760232 DOI: 10.1016/j.surg.2024.03.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/21/2024] [Accepted: 03/21/2024] [Indexed: 05/19/2024]
Abstract
BACKGROUND With the steady rise in health care expenditures, the examination of factors that may influence the costs of care has garnered much attention. Although machine learning models have previously been applied in health economics, their application within cardiac surgery remains limited. We evaluated several machine learning algorithms to model hospitalization costs for coronary artery bypass grafting. METHODS All adult hospitalizations for isolated coronary artery bypass grafting were identified in the 2016 to 2020 Nationwide Readmissions Database. Machine learning models were trained to predict expenditures and compared with traditional linear regression. Given the significance of postoperative length of stay, we additionally developed models excluding postoperative length of stay to uncover other drivers of costs. To facilitate comparison, machine learning classification models were also trained to identify patients in the highest cost decile. Significant factors associated with high cost were identified using SHapley Additive exPlanations beeswarm plots. RESULTS Among 444,740 hospitalizations included in the analysis, the median cost of hospitalization in coronary artery bypass grafting patients was $43,103. eXtreme Gradient Boosting most accurately predicted hospitalization costs, with R² = 0.519 on the validation set. The top predictive features in the eXtreme Gradient Boosting model included elective procedure status, prolonged mechanical ventilation, new-onset respiratory failure or myocardial infarction, and postoperative length of stay. After removing postoperative length of stay, eXtreme Gradient Boosting remained the most accurate model (R² = 0.38). Prolonged ventilation, respiratory failure, and elective status remained important predictive parameters. CONCLUSION Machine learning models appear to accurately model total hospitalization costs for coronary artery bypass grafting. Future work is warranted to uncover other drivers of costs and improve the value of care in cardiac surgery.
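The abstract reports regression accuracy as R² on a validation set and describes a companion classification task for the highest cost decile. A minimal standard-library sketch of both quantities follows, assuming observed and predicted costs arrive as plain Python lists; the function names and toy numbers are hypothetical illustrations, not the authors' code or data.

```python
from statistics import mean, quantiles


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)              # total sum of squares
    return 1.0 - ss_res / ss_tot


def top_decile_labels(costs):
    """Flag each stay as 1 if its cost is at or above the 90th percentile, else 0."""
    # quantiles(..., n=10) returns the 9 decile cut points; the last is the 90th percentile.
    p90 = quantiles(costs, n=10)[-1]
    return [1 if c >= p90 else 0 for c in costs]


# Toy example (hypothetical values).
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
print(round(r_squared(observed, predicted), 3))    # 0.948
print(sum(top_decile_labels(list(range(1, 21)))))  # 2
```

Read this way, R² = 0.519 means the model accounts for roughly half of the variance in validation-set costs around their mean; always predicting the mean scores 0, and a model worse than that scores below 0.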
Affiliation(s)
- Emma O Cruz
- Cardiovascular Outcomes Research Laboratory, University of California, Los Angeles, CA; Computer Science Department, Stanford University, Palo Alto, CA
- Sara Sakowitz
- Cardiovascular Outcomes Research Laboratory, University of California, Los Angeles, CA
- Saad Mallick
- Cardiovascular Outcomes Research Laboratory, University of California, Los Angeles, CA
- Nguyen Le
- Cardiovascular Outcomes Research Laboratory, University of California, Los Angeles, CA
- Nikhil Chervu
- Cardiovascular Outcomes Research Laboratory, University of California, Los Angeles, CA
- Syed Shahyan Bakhtiyar
- Cardiovascular Outcomes Research Laboratory, University of California, Los Angeles, CA; Department of Surgery, University of Colorado, Aurora, CO
- Peyman Benharash
- Cardiovascular Outcomes Research Laboratory, University of California, Los Angeles, CA; Division of Cardiac Surgery, Department of Surgery, University of California, Los Angeles, CA
2
Miles TJ, Ghanta RK. Machine learning in cardiac surgery: a narrative review. J Thorac Dis 2024; 16:2644-2653. [PMID: 38738250 PMCID: PMC11087616 DOI: 10.21037/jtd-23-1659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 03/15/2024] [Indexed: 05/14/2024]
Abstract
Background and Objective Machine learning (ML) is increasingly used to provide data-driven solutions to challenges in medicine. Within cardiac surgery, ML methods have been employed as risk stratification tools to predict a variety of operative outcomes. However, the clinical utility of ML in this domain is unclear. This review aims to provide an overview of ML in cardiac surgery, particularly with regard to its utility in predictive analytics and its implications for clinical decision support. Methods We performed a narrative review of relevant articles indexed in PubMed since 2000 using the MeSH terms "Machine Learning", "Supervised Machine Learning", "Deep Learning", or "Artificial Intelligence" and "Cardiovascular Surgery" or "Thoracic Surgery". Key Content and Findings ML methods have been widely used to generate preoperative risk profiles, consistently yielding accurate predictions of clinical outcomes in cardiac surgery. However, improvements in predictive performance over traditional risk metrics have proven modest, and current clinical applications remain limited. Conclusions Studies using high-volume, multidimensional data, such as electronic health record (EHR) data, appear to best demonstrate the advantages of ML methods. Models trained on post-cardiac surgery intensive care unit data demonstrate excellent predictive performance and may provide greater clinical utility if incorporated as clinical decision support tools. Further development of ML models and their integration into EHRs may result in dynamic clinical decision support strategies capable of informing clinical care and improving outcomes in cardiac surgery.
Affiliation(s)
- Travis J. Miles
- Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX, USA
- Applied Statistics and Machine Learning for the Advancement of Surgery, Department of Surgery, Baylor College of Medicine, Houston, TX, USA
- Ravi K. Ghanta
- Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX, USA
- Applied Statistics and Machine Learning for the Advancement of Surgery, Department of Surgery, Baylor College of Medicine, Houston, TX, USA
3
Langenberger B. Machine learning as a tool to identify inpatients who are not at risk of adverse drug events in a large dataset of a tertiary care hospital in the USA. Br J Clin Pharmacol 2023; 89:3523-3538. [PMID: 37430382 DOI: 10.1111/bcp.15846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 07/03/2023] [Accepted: 07/06/2023] [Indexed: 07/12/2023]
Abstract
AIMS Adverse drug events (ADEs) are a major threat to inpatients in the United States of America (USA). It is unknown how well machine learning (ML) can predict whether a patient will suffer an ADE during the hospital stay, based on data available at hospital admission, for emergency department patients of all ages (a binary classification task). It is further unknown whether ML can outperform logistic regression (LR) in doing so, and which variables are the most important predictors. METHODS In this study, five ML models, namely a random forest, a gradient boosting machine (GBM), ridge regression, least absolute shrinkage and selection operator (LASSO) regression, and elastic net regression, as well as an LR model, were trained and tested to predict inpatient ADEs, which were identified in a diverse population using ICD-10-CM codes based on comprehensive previous work. In total, 210,181 observations from patients admitted to a large tertiary care hospital after an emergency department stay between 2011 and 2019 were included. The area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUC-PR) were used as primary performance indicators. RESULTS Tree-based models performed best with respect to AUC and AUC-PR. The GBM reached an AUC of 0.747 (95% confidence interval (CI): 0.735 to 0.759) and an AUC-PR of 0.134 (95% CI: 0.131 to 0.137) on unseen test data, while the random forest reached an AUC of 0.743 (95% CI: 0.731 to 0.755) and an AUC-PR of 0.139 (95% CI: 0.135 to 0.142). ML statistically significantly outperformed LR on both AUC and AUC-PR. Overall, however, the models did not differ much in performance. The most important predictors for the best-performing model (GBM) were admission type, temperature, and chief complaint. CONCLUSIONS The study demonstrated a first application of ML to predict inpatient ADEs based on ICD-10-CM codes, together with a comparison with LR. Future research should address concerns arising from low precision and related problems.
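AUC-PR values of about 0.13 to 0.14 are easier to interpret once one recalls that a classifier ranking patients at random has an expected precision equal to the prevalence of the positive class, so AUC-PR should be judged against the ADE rate rather than against 0.5. The sketch below computes average precision, one common estimator of AUC-PR; the abstract does not state which estimator the study used, and the function names and toy data here are hypothetical.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k at which positives occur.

    Cases are ranked by descending score; ties keep their input order
    (a simplification; libraries may treat ties differently).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this cutoff
    return precision_sum / n_pos


def prevalence(labels):
    """Approximate AUC-PR of a random ranking: the share of positive cases."""
    return sum(labels) / len(labels)


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(labels, scores), 3))  # 0.833
print(prevalence(labels))                           # 0.5
```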
Affiliation(s)
- Benedikt Langenberger
- Department of Health Care Management, Technische Universität Berlin, Berlin, Germany
4
Behnoush AH, Khalaji A, Rezaee M, Momtahen S, Mansourian S, Bagheri J, Masoudkabir F, Hosseini K. Machine learning-based prediction of 1-year mortality in hypertensive patients undergoing coronary revascularization surgery. Clin Cardiol 2023; 46:269-278. [PMID: 36588391 PMCID: PMC10018097 DOI: 10.1002/clc.23963] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/17/2022] [Revised: 12/12/2022] [Accepted: 12/19/2022] [Indexed: 01/03/2023]
Abstract
BACKGROUND Machine learning (ML) has shown promising results in all fields of medicine, including preventive cardiology. Hypertensive patients are at higher risk of mortality after coronary artery bypass graft (CABG) surgery; thus, we aimed to design and evaluate five ML models to predict 1-year mortality among hypertensive patients who underwent CABG. HYPOTHESIS ML algorithms can significantly improve mortality prediction after CABG. METHODS Tehran Heart Center's CABG data registry was used to extract baseline and peri-procedural characteristics and mortality data. The best features were chosen using the random forest (RF) feature selection algorithm. Five ML models were developed to predict 1-year mortality: logistic regression (LR), RF, artificial neural network (ANN), extreme gradient boosting (XGB), and naïve Bayes (NB). The area under the curve (AUC), sensitivity, and specificity were used to evaluate the models. RESULTS Among the 8,493 hypertensive patients who underwent CABG (mean age 68.27 ± 9.27 years), 303 died in the first year. Eleven features were selected as the best predictors, of which total ventilation hours and ejection fraction were the leading ones. LR showed the best predictive ability, with an AUC of 0.82, while the NB model had the lowest AUC (0.79). Among the subgroups, the LR model achieved its highest AUC in two age groups (50-59 and 80-89 years) and in the overweight, diabetic, and smoker subgroups of hypertensive patients. CONCLUSIONS All ML models had excellent performance in predicting 1-year mortality among hypertensive CABG patients, with LR performing best in terms of AUC. These models can help clinicians assess the risk of mortality in specific higher-risk subgroups (such as hypertensive patients).
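Unlike AUC, sensitivity and specificity depend on a decision threshold, which the abstract does not report. A minimal sketch follows, assuming predicted risks in [0, 1] and a hypothetical `threshold` parameter. With a one-year mortality of about 3.6% (303 of 8,493), calibrated predicted risks may rarely exceed 0.5, so the default below is only a placeholder.

```python
def sensitivity_specificity(labels, risks, threshold=0.5):
    """Return (sensitivity, specificity) when risk >= threshold is called positive."""
    tp = fn = tn = fp = 0
    for y, r in zip(labels, risks):
        predicted_positive = r >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # share of deaths correctly flagged
    specificity = tn / (tn + fp)  # share of survivors correctly cleared
    return sensitivity, specificity


labels = [1, 1, 0, 0, 0]
risks = [0.9, 0.4, 0.6, 0.2, 0.1]
print(sensitivity_specificity(labels, risks))       # (0.5, 0.6666666666666666)
print(sensitivity_specificity(labels, risks, 0.3))  # (1.0, 0.6666666666666666)
```

Lowering the threshold trades specificity for sensitivity. AUC summarizes that trade-off over all thresholds, which is why a single AUC can coexist with very different sensitivity and specificity pairs.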
Affiliation(s)
- Amir Hossein Behnoush
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran; Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran; School of Medicine, Tehran University of Medical Sciences, Tehran, Iran; Non-Communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Amirmohammad Khalaji
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran; Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran; School of Medicine, Tehran University of Medical Sciences, Tehran, Iran; Non-Communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Malihe Rezaee
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran; Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran; Non-Communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran; School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Shahram Momtahen
- Department of Surgery, Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
- Soheil Mansourian
- Department of Surgery, Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
- Jamshid Bagheri
- Department of Surgery, Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
- Farzad Masoudkabir
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran; Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Kaveh Hosseini
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran; Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
5
Chang W, Wang X, Yang J, Qin T. An Improved CatBoost-Based Classification Model for Ecological Suitability of Blueberries. Sensors (Basel) 2023; 23:1811. [PMID: 36850409 PMCID: PMC9961688 DOI: 10.3390/s23041811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
Selecting the best planting area for blueberries is an essential issue in agriculture. To improve the effectiveness of blueberry cultivation, this paper proposes, for the first time, a machine learning-based classification model for blueberry ecological suitability and validates it using multi-source environmental feature data. The sparrow search algorithm (SSA) was adopted to optimize a CatBoost model that classifies the ecological suitability of blueberries based on the selected data features. First, the Borderline-SMOTE algorithm was used to balance the number of positive and negative samples. The Variance Inflation Factor and information gain methods were then applied to screen the factors affecting blueberry growth. Subsequently, the processed data were fed into CatBoost for training, and SSA was used to optimize the CatBoost parameters and obtain the optimal model. Finally, the SSA-CatBoost model was used to classify the ecological suitability of blueberries and output the suitability types. In a case study of a blueberry plantation in Majiang County, Guizhou Province, China, the SSA-CatBoost-based blueberry ecological suitability model achieved an AUC of 0.921, which is 2.68% higher than that of the unoptimized CatBoost (AUC = 0.897) and significantly higher than those of Logistic Regression (AUC = 0.855), Support Vector Machine (AUC = 0.864), and Random Forest (AUC = 0.875). Furthermore, the ecological suitability of blueberries in Majiang County was mapped according to the classification results of the different models. Compared with actual blueberry cultivation in Majiang County, the classification results of the proposed SSA-CatBoost model match the real cultivation situation best, giving them high reference value for selecting blueberry cultivation sites.
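The reported 2.68% improvement is a relative gain in AUC, not a difference in percentage points. The absolute difference between 0.921 and 0.897 is 0.024. A short check of that arithmetic, using a hypothetical helper and the AUC values quoted in the abstract:

```python
def relative_gain_pct(new, old):
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100


ssa_catboost = 0.921
baselines = {
    "CatBoost": 0.897,
    "Random Forest": 0.875,
    "Support Vector Machine": 0.864,
    "Logistic Regression": 0.855,
}
for name, auc in baselines.items():
    absolute = ssa_catboost - auc
    relative = relative_gain_pct(ssa_catboost, auc)
    print(f"{name}: +{absolute:.3f} AUC, +{relative:.2f}% relative")
# First line printed: CatBoost: +0.024 AUC, +2.68% relative
```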
Affiliation(s)
- Wenfeng Chang
- Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
- Xiao Wang
- Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
- Jing Yang
- Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
- Tao Qin
- Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
6
Khalaji A, Behnoush AH, Jameie M, Sharifi A, Sheikhy A, Fallahzadeh A, Sadeghian S, Pashang M, Bagheri J, Ahmadi Tafti SH, Hosseini K. Machine learning algorithms for predicting mortality after coronary artery bypass grafting. Front Cardiovasc Med 2022; 9:977747. [PMID: 36093147 PMCID: PMC9448905 DOI: 10.3389/fcvm.2022.977747] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/24/2022] [Accepted: 08/02/2022] [Indexed: 11/13/2022]
Abstract
Background As the era of big data analytics unfolds, machine learning (ML) might be a promising tool for predicting clinical outcomes. This study aimed to evaluate the predictive ability of ML models for estimating mortality after coronary artery bypass grafting (CABG). Materials and methods Various baseline and follow-up features were obtained from the CABG data registry established in 2005 at Tehran Heart Center. After key variables were selected using the random forest method, prediction models were developed using Logistic Regression (LR), Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbors (KNN), Extreme Gradient Boosting (XGBoost), and Random Forest (RF) algorithms. The Area Under the Curve (AUC) and other indices were used to assess performance. Results A total of 16,850 patients with isolated CABG (mean age: 67.34 ± 9.67 years) were included. Of these, 16,620 had one-year follow-up, of whom 468 died. Eleven features were chosen to train the models. Total ventilation hours and left ventricular ejection fraction were by far the most predictive factors of mortality. All the models had AUC > 0.7 (acceptable performance) for 1-year mortality. LR (AUC = 0.811) and XGBoost (AUC = 0.792) outperformed NB (AUC = 0.783), RF (AUC = 0.783), SVM (AUC = 0.738), and KNN (AUC = 0.715). The trend was similar for two-to-five-year mortality, with LR demonstrating the highest predictive ability. Conclusion Various ML models showed acceptable performance for estimating CABG mortality, with LR showing the highest predictive performance. These models can help clinicians make decisions according to the risk of mortality in patients undergoing CABG.
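The AUC values compared here (0.715 to 0.811) have a direct probabilistic reading: the chance that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. A minimal pairwise sketch of that definition follows. It is quadratic in the number of patients and so is for illustration only; the function name and data are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability P(score_pos > score_neg), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 0.75
print(roc_auc([1, 0], [0.5, 0.5]))                  # 0.5
```

An AUC of 0.5 corresponds to random ranking. The 0.7 cut-off the authors use for "acceptable" performance is a convention, not a property of the metric.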
Affiliation(s)
- Amirmohammad Khalaji
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Amir Hossein Behnoush
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Mana Jameie
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Non-communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Ali Sharifi
- Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran
- Ali Sheikhy
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Non-communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Aida Fallahzadeh
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Non-communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Saeed Sadeghian
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Mina Pashang
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Jamshid Bagheri
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Seyed Hossein Ahmadi Tafti
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Kaveh Hosseini
- Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- *Correspondence: Kaveh Hosseini,
7
Predicting Children with ADHD Using Behavioral Activity: A Machine Learning Analysis. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12052737] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Attention deficit hyperactivity disorder (ADHD) is one of the most frequent neurobehavioral disorders of childhood. The purpose of this study is to: (i) extract the most prominent risk factors for children with ADHD; and (ii) propose a machine learning (ML)-based approach to classify children as either having ADHD or healthy. We extracted the data of 45,779 children aged 3–17 years from the 2018–2019 National Survey of Children’s Health (NSCH). Of these, 5,218 (11.4%) had ADHD and the rest were healthy. Because the class labels were highly imbalanced, we combined oversampling and undersampling approaches to balance them. We used logistic regression (LR) to extract the significant factors for children with ADHD based on p-values (<0.05). Eight ML-based classifiers, namely random forest (RF), Naïve Bayes (NB), decision tree (DT), XGBoost, k-nearest neighbors (KNN), multilayer perceptron (MLP), support vector machine (SVM), and a one-dimensional convolutional neural network (1D CNN), were used to predict ADHD in children. The average age of the children with ADHD was 12.4 ± 3.4 years. The RF-based classifier provided the highest classification accuracy (85.5%), sensitivity (84.4%), specificity (86.4%), and AUC (0.94). This study illustrated that LR combined with an RF-based system could provide excellent accuracy for classifying and predicting children with ADHD. This system will be helpful for the early detection and diagnosis of ADHD.
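The abstract says class balance was restored by combining oversampling and undersampling, but it does not name the methods or the target class sizes. The sketch below uses plain random resampling as a stand-in. It is not SMOTE or any specific method the authors may have used, and `target` is a hypothetical per-class size. With 5,218 ADHD cases among 45,779 children (about 11.4%), the healthy class has 40,561 records, so any target between 5,218 and 40,561 would equalize the two classes under this scheme.

```python
import random


def rebalance(majority, minority, target, seed=0):
    """Undersample `majority` and oversample `minority` so each class has `target` records."""
    if not len(minority) <= target <= len(majority):
        raise ValueError("target must lie between the minority and majority class sizes")
    rng = random.Random(seed)  # fixed seed for reproducibility
    kept_majority = rng.sample(majority, target)  # undersample without replacement
    extra_minority = rng.choices(minority, k=target - len(minority))  # oversample with replacement
    return kept_majority, list(minority) + extra_minority


# Toy example (hypothetical records).
healthy = [("healthy", i) for i in range(100)]
adhd = [("adhd", i) for i in range(12)]
kept, boosted = rebalance(healthy, adhd, target=50)
print(len(kept), len(boosted))  # 50 50
```

Resampling is normally applied to the training split only, so that duplicated minority records do not leak into the evaluation data and inflate the reported accuracy.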