1
Pratiwi NKC, Tayara H, Chong KT. An Ensemble Classifiers for Improved Prediction of Native-Non-Native Protein-Protein Interaction. Int J Mol Sci 2024; 25:5957. [PMID: 38892144 PMCID: PMC11172808 DOI: 10.3390/ijms25115957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 05/27/2024] [Accepted: 05/27/2024] [Indexed: 06/21/2024]
Abstract
In this study, we present an innovative approach to improving the prediction of protein-protein interactions (PPIs) with an ensemble classifier, focusing specifically on distinguishing native from non-native interactions. Leveraging the strengths of several base models (random forest, gradient boosting, extreme gradient boosting, and light gradient boosting), our ensemble classifier integrates their predictions using a logistic regression meta-classifier. The model was evaluated on a comprehensive dataset generated from molecular dynamics simulations. Although the gains in AUC and other metrics might seem modest, they contribute to a model that is more robust, consistent, and adaptable. To assess the effectiveness of the different approaches, we compared the performance of logistic regression with that of four baseline models. Logistic regression consistently underperformed on all evaluated metrics, suggesting that it may not be well suited to capturing the complex relationships within this dataset. Tree-based models, on the other hand, appear to be more effective for problems involving molecular dynamics simulations. Extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) are optimized for performance and speed, handle datasets effectively, and incorporate regularization to avoid overfitting. Our findings indicate that the ensemble method enhances the predictive capability for PPIs, offering a promising tool for computational biology and drug discovery by accurately identifying potential interaction sites and facilitating the understanding of complex protein functions within biological systems.
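The second stage of the stacking scheme described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the four base models (random forest, gradient boosting, XGBoost, LightGBM) have already produced out-of-fold probabilities for each training sample, and the hypothetical `fit_meta_logreg` fits the logistic regression meta-classifier on those probabilities by batch gradient descent. The probability values are invented.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_logreg(base_probs, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights on stacked base-model probabilities.

    base_probs: one row per sample, one probability per base model. These should be
    out-of-fold predictions so the meta-classifier never sees a base model's
    predictions on its own training data (assumption of this sketch).
    labels: 0/1 targets (e.g. 1 = native, 0 = non-native interaction).
    """
    n_models = len(base_probs[0])
    weights = [0.0] * n_models
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(base_probs, labels):
            # Gradient of the mean log-loss is (prediction - label) times the input.
            error = sigmoid(bias + sum(w * p for w, p in zip(weights, row))) - y
            for j, p in enumerate(row):
                grad_w[j] += error * p
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_meta(weights, bias, row):
    """Combine one sample's base-model probabilities into a final probability."""
    return sigmoid(bias + sum(w * p for w, p in zip(weights, row)))


# Hypothetical out-of-fold probabilities from [RF, GB, XGBoost, LightGBM].
X_meta = [
    [0.90, 0.85, 0.88, 0.92],
    [0.80, 0.75, 0.83, 0.78],
    [0.30, 0.40, 0.25, 0.35],
    [0.10, 0.20, 0.15, 0.12],
]
y = [1, 1, 0, 0]
w, b = fit_meta_logreg(X_meta, y)
print(round(predict_meta(w, b, [0.85, 0.80, 0.86, 0.90]), 3))
```

Using a simple linear meta-learner keeps the combination step interpretable: each learned weight indicates how much the ensemble relies on one base model.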
Affiliation(s)
- Nor Kumalasari Caecar Pratiwi
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea;
- Department of Electrical Engineering, Telkom University, Bandung 40257, West Java, Indonesia
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea;
- Advances Electronics and Information Research Centre, Jeonbuk National University, Jeonju 54896, Republic of Korea
2
Perschinka F, Peer A, Joannidis M. [Artificial intelligence and acute kidney injury]. Med Klin Intensivmed Notfmed 2024; 119:199-207. [PMID: 38396124 PMCID: PMC10995052 DOI: 10.1007/s00063-024-01111-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 01/17/2024] [Indexed: 02/25/2024]
Abstract
Digitalization is increasingly finding its way into intensive care units, and with it artificial intelligence (AI) for critically ill patients. One promising area for the use of AI is acute kidney injury (AKI). AI is used primarily to predict AKI, but other approaches are also being used to classify existing AKI into different phenotypes. Various AI models are used for prediction. The areas under the receiver operating characteristic curve (AUROC) achieved with these models vary and are influenced by several factors, such as the prediction horizon and the definition of AKI. Most models have an AUROC between 0.650 and 0.900, with lower values for predictions further into the future and when the Acute Kidney Injury Network (AKIN) criteria are applied instead of the KDIGO criteria. Classification into phenotypes already makes it possible to group patients with different risks of mortality or of requiring renal replacement therapy (RRT), but the etiologies or therapeutic consequences that could be derived from this are still lacking. All of the models, however, suffer from AI-specific shortcomings. Because they rely on large databases, they cannot promptly incorporate recent changes in therapy, and new biomarkers have not yet been implemented in a relevant proportion of cases. For this reason, serum creatinine and urine output, with their known limitations, dominate current AI prediction models, which impairs their performance. On the other hand, increasingly complex models no longer allow physicians to understand the basis on which a warning of impending AKI is calculated and on which therapy should subsequently be initiated. The successful use of AI in routine clinical practice will depend largely on physicians' trust in these systems and on overcoming the weaknesses described above. However, the clinician will remain irreplaceable as the decisive authority for critically ill patients, combining measurable and nonmeasurable parameters.
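The AUROC values quoted above have a simple probabilistic reading: the chance that a randomly chosen patient who develops AKI receives a higher risk score than a randomly chosen patient who does not. A minimal sketch of that pairwise (Mann-Whitney) computation, with ties counted as one half; the function name, scores and labels are invented for illustration and do not come from any of the reviewed models.

```python
def auroc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 = AKI occurred, 0 = no AKI. Runs in O(n_pos * n_neg) time,
    which is fine for small illustrative examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ordered pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores from an AKI prediction model.
scores = [0.92, 0.75, 0.40, 0.65, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))  # 8 of 9 positive-negative pairs correctly ordered
```

An AUROC of 0.5 corresponds to random ordering, so the 0.650 to 0.900 range reported above means the models rank an AKI patient above a non-AKI patient in roughly 65% to 90% of such pairs.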
Affiliation(s)
- Michael Joannidis
- Joint Facility for Internal Emergency and Intensive Care Medicine, Department of Internal Medicine, Medical University of Innsbruck, Anichstraße 35, 6020 Innsbruck, Austria
3
Gil-Rojas S, Suárez M, Martínez-Blanco P, Torres AM, Martínez-García N, Blasco P, Torralba M, Mateo J. Application of Machine Learning Techniques to Assess Alpha-Fetoprotein at Diagnosis of Hepatocellular Carcinoma. Int J Mol Sci 2024; 25:1996. [PMID: 38396674 PMCID: PMC10888351 DOI: 10.3390/ijms25041996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 01/29/2024] [Accepted: 02/05/2024] [Indexed: 02/25/2024]
Abstract
Hepatocellular carcinoma (HCC) is the most common primary liver tumor and is associated with high mortality rates. Approximately 80% of cases occur in cirrhotic livers, which poses a significant challenge for appropriate therapeutic management. Adequate screening programs in high-risk groups are essential for early-stage detection. The extent of extrahepatic tumor spread and hepatic functional reserve are recognized as two of the most influential prognostic factors. In this retrospective multicenter study, we used machine learning (ML) methods to analyze predictors of mortality at the time of diagnosis in a total of 208 patients. The eXtreme gradient boosting (XGB) method achieved the highest performance values in identifying key prognostic factors for HCC at diagnosis. The etiology of HCC was the variable most strongly associated with a poorer prognosis. In our setting, the widely used Barcelona Clinic Liver Cancer (BCLC) classification outperformed the TNM classification. Although alpha-fetoprotein (AFP) remains the most commonly used biological marker, elevated levels did not correlate with reduced survival. Our findings suggest the need to explore new prognostic biomarkers for the individualized management of these patients.
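XGB belongs to the gradient-boosting family: trees are added one at a time, each fitted to the errors of the current ensemble. The sketch below shows only that core idea for a binary outcome such as mortality, using one-split decision stumps and the log-loss gradient. It is a simplified stand-in, not XGBoost itself: it omits XGBoost's second-order (Newton) leaf values, regularization and tree-growing heuristics, and the function names, features and patient data are hypothetical.

```python
import math


def fit_stump(X, residuals):
    """Return the one-split regression stump that best fits `residuals` (least squares).

    Each leaf predicts the mean residual on its side of the split.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv = sum(left) / len(left)
            rv = sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every feature is constant: fall back to a constant stump
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, t, lv, rv = best
    return lambda row: lv if row[j] <= t else rv


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for 0/1 labels under log-loss.

    The negative gradient of log-loss with respect to the raw score is y - p,
    so each new stump is fitted to those residuals and added with shrinkage.
    """
    stumps = []

    def raw_score(row):
        return sum(learning_rate * stump(row) for stump in stumps)

    for _ in range(n_rounds):
        probs = [1.0 / (1.0 + math.exp(-raw_score(row))) for row in X]
        residuals = [yi - pi for yi, pi in zip(y, probs)]
        stumps.append(fit_stump(X, residuals))

    return lambda row: 1.0 / (1.0 + math.exp(-raw_score(row)))


# Hypothetical patients: [feature_a, feature_b]; 1 = died during follow-up.
X = [[1, 5], [2, 4], [3, 6], [6, 1], [7, 2], [8, 1]]
y = [0, 0, 0, 1, 1, 1]
predict_proba = fit_boosted_stumps(X, y)
print(round(predict_proba([7, 2]), 3), round(predict_proba([2, 4]), 3))
```

The learning rate shrinks each stump's contribution, trading more rounds for a smoother, less overfitted model; XGBoost exposes the same idea through its own shrinkage parameter.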
Affiliation(s)
- Sergio Gil-Rojas
- Gastroenterology Department, Virgen de la Luz Hospital, 16002 Cuenca, Spain
- Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha, 16071 Cuenca, Spain
- Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
- Miguel Suárez
- Gastroenterology Department, Virgen de la Luz Hospital, 16002 Cuenca, Spain
- Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha, 16071 Cuenca, Spain
- Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
- Pablo Martínez-Blanco
- Gastroenterology Department, Virgen de la Luz Hospital, 16002 Cuenca, Spain
- Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha, 16071 Cuenca, Spain
- Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
- Ana M. Torres
- Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha, 16071 Cuenca, Spain
- Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
- Pilar Blasco
- Department of Pharmacy, General University Hospital, 46014 Valencia, Spain
- Miguel Torralba
- Internal Medicine Unit, University Hospital of Guadalajara, 19002 Guadalajara, Spain
- Faculty of Medicine, Universidad de Alcalá de Henares, 28801 Alcalá de Henares, Spain
- Translational Research Group in Cellular Immunology (GITIC), Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
- Jorge Mateo
- Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha, 16071 Cuenca, Spain
- Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
4
Ahire N, Awale RN, Wagh A. Classification of attention deficit hyperactivity disorder using machine learning on an EEG dataset. APPLIED NEUROPSYCHOLOGY. CHILD 2024:1-11. [PMID: 38163329 DOI: 10.1080/21622965.2023.2300078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]
Abstract
Attention deficit hyperactivity disorder (ADHD), a neurodevelopmental disorder that frequently affects children, is characterized by persistent patterns of inattention, hyperactivity, and impulsivity; its etiology may involve a variety of genetic, environmental, and neurological factors. Electroencephalography (EEG) measures the electrical activity of the brain arising from neuronal activity, which is a function of cognitive processes. In this study, previously recorded EEG signals from 121 children, containing unbiased data from both the ADHD and the control group, were analyzed to classify ADHD patients. The samples were tested under different cognitive conditions, and multiple features were extracted using Euclidean distance, which many machine learning algorithms use as their default metric for comparing two data points. The extracted features were used to train four supervised machine learning algorithms (linear regression, random forest, extreme gradient boosting, and K-nearest neighbors (KNN)), based on the results for various frequency bands. The results suggest that the KNN algorithm achieves higher accuracy than the other machine learning approaches; the results could be further improved with hyperparameter tuning, and the approach could be used to classify subgroups of ADHD to identify the severity of the disorder.
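K-nearest neighbors, the best performer here, classifies a new recording by a majority vote among its closest training samples, and Euclidean distance is its usual default metric. A minimal sketch; the two-dimensional feature vectors and k = 3 are illustrative assumptions, not the study's extracted EEG features or tuned hyperparameters.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors (e.g. band-power summaries), not the study's data.
train_X = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
train_y = ["control", "control", "control", "ADHD", "ADHD", "ADHD"]
print(knn_predict(train_X, train_y, [2.9, 3.1]))  # → ADHD
```

Because Euclidean distance sums squared differences across features, features on larger numeric scales dominate the vote unless they are standardized first; k itself is one of the hyperparameters the abstract suggests tuning.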
Affiliation(s)
- Nitin Ahire
- Xavier Institute of Engineering, Mumbai, India
5
Tong L, Sun Y, Zhu Y, Luo H, Wan W, Wu Y. Prognostic estimation for acute ischemic stroke patients undergoing mechanical thrombectomy within an extended therapeutic window using an interpretable machine learning model. Front Neuroinform 2023; 17:1273827. [PMID: 37901289 PMCID: PMC10603294 DOI: 10.3389/fninf.2023.1273827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 10/02/2023] [Indexed: 10/31/2023]
Abstract
Background Mechanical thrombectomy (MT) is effective for acute ischemic stroke with large vessel occlusion (AIS-LVO) within an extended therapeutic window. However, successful reperfusion does not guarantee a good prognosis: only around 40-50% of cases yield favorable outcomes. Preoperative prediction of patient outcomes is essential to identify those who may benefit from MT. Although machine learning (ML) has shown promise in handling variables with non-linear relationships in prediction models, its "black box" nature and the absence of ML models for extended-window MT prognosis remain limitations. Objective This study aimed to establish and select the optimal model for predicting extended-window MT outcomes, using the Shapley additive explanation (SHAP) approach to enhance the interpretability of the selected model. Methods A retrospective analysis was conducted on 260 AIS-LVO patients undergoing extended-window MT. After the inclusion and exclusion criteria were applied, the selected patients were allocated to training and test sets at a 3:1 ratio. Four ML classifiers and one logistic regression (Logit) model were constructed using pre-treatment variables from the training set. The optimal model was selected through comparative validation, and its key features were interpreted using the SHAP approach. The effectiveness of the chosen model was further evaluated on the test set. Results Of the 212 selected patients, 159 formed the training set and 53 the test set. Extreme gradient boosting (XGBoost) showed the highest discrimination, with an area under the curve (AUC) of 0.93 during validation, and maintained an AUC of 0.77 during testing. SHAP analysis identified ischemic core volume, baseline NIHSS score, ischemic penumbra volume, ASPECTS, and patient age as the top five determinants of outcome prediction. Conclusion XGBoost emerged as the most effective model for predicting the prognosis of AIS-LVO patients undergoing MT within the extended therapeutic window. SHAP interpretation improved clinical confidence in the model, paving the way for ML in clinical decision-making.
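SHAP builds on Shapley values from cooperative game theory: a feature's attribution is its weighted average marginal contribution to the prediction over all subsets of the other features. For a handful of features this can be computed exactly by enumerating subsets, as in the sketch below. The toy linear score, its weights, the feature choice and the baseline (reference) values are all hypothetical, features outside a subset are simply set to their baseline values, and SHAP explainers for tree models such as XGBoost use a much faster tree-specific algorithm rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value (an assumed way to
    'remove' a feature). Cost grows as 2**n, so this is for small n only.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear score on [core volume (mL), baseline NIHSS, age (years)].
def toy_model(features):
    core, nihss, age = features
    return 0.02 * core + 0.1 * nihss + 0.01 * age


x = [70.0, 18.0, 80.0]
baseline = [20.0, 10.0, 65.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])  # → [1.0, 0.8, 0.15]
```

For a linear model each attribution reduces to weight × (value − baseline), and the attributions always sum to model(x) − model(baseline); this additivity is what lets SHAP decompose an individual prediction into per-feature contributions.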
Affiliation(s)
- Lin Tong
- Department of Radiology Intervention, Shanghai Putuo District Liqun Hospital, Shanghai, China
- Yun Sun
- Department of Emergency, Shanghai Putuo District Liqun Hospital, Shanghai, China
- Yueqi Zhu
- Institute of Diagnostic and Interventional Radiology, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, Shanghai, China
- Hui Luo
- Department of Emergency, Shanghai Putuo District Liqun Hospital, Shanghai, China
- Wan Wan
- Department of Radiology Intervention, Shanghai Putuo District Liqun Hospital, Shanghai, China
- Ying Wu
- Department of Emergency, Shanghai Putuo District Liqun Hospital, Shanghai, China
6
Suárez M, Martínez R, Torres AM, Ramón A, Blasco P, Mateo J. A Machine Learning-Based Method for Detecting Liver Fibrosis. Diagnostics (Basel) 2023; 13:2952. [PMID: 37761319 PMCID: PMC10529519 DOI: 10.3390/diagnostics13182952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 09/03/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023]
Abstract
Cholecystectomy and metabolic-associated steatotic liver disease (MASLD) are common in gastroenterology and frequently co-occur in clinical practice. Cholecystectomy has been shown to have metabolic consequences, sharing pathological mechanisms similar to those of MASLD. A database of MASLD patients who underwent cholecystectomy was analysed. This study aimed to develop a tool to identify the risk of liver fibrosis after cholecystectomy. For this purpose, the extreme gradient boosting (XGB) algorithm was used to construct an effective predictive model. The factors contributing most to the model's predictions were platelet level, followed by dyslipidaemia and type 2 diabetes mellitus (T2DM). Compared with other machine learning (ML) methods, our proposed XGB method achieved higher accuracy values. The XGB method had the highest balanced accuracy (93.16%). XGB outperformed k-nearest neighbours (KNN) in accuracy (93.16% vs. 84.45%) and AUC (0.92 vs. 0.84). These results demonstrate that the proposed XGB method can be used as an automatic, machine-learning-based diagnostic aid for MASLD patients.
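Balanced accuracy averages the recall of each class, which matters when one outcome (for example, no significant fibrosis) is much more common than the other: a model can score high plain accuracy simply by favouring the majority class. A minimal sketch with invented labels; it is not the authors' evaluation code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a large majority class cannot dominate the score."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = significant fibrosis, 0 = none.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain, balanced_accuracy(y_true, y_pred))  # → 0.9 0.75
```

Here the model misses half of the fibrosis cases, which plain accuracy (0.9) hides and balanced accuracy (0.75) exposes.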
Affiliation(s)
- Miguel Suárez
- Gastroenterology Department, Virgen de la Luz Hospital, 16002 Cuenca, Spain
- Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha, 16071 Cuenca, Spain
- Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
- Raquel Martínez
- Gastroenterology Department, Virgen de la Luz Hospital, 16002 Cuenca, Spain
- Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
- Ana María Torres
- Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha, 16071 Cuenca, Spain
- Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
- Antonio Ramón
- Department of Pharmacy, General University Hospital, 46014 Valencia, Spain
- Pilar Blasco
- Department of Pharmacy, General University Hospital, 46014 Valencia, Spain
- Jorge Mateo
- Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha, 16071 Cuenca, Spain
- Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
7
Kang BS, Lee SU, Hong S, Choi SK, Shin JE, Wie JH, Jo YS, Kim YH, Kil K, Chung YH, Jung K, Hong H, Park IY, Ko HS. Prediction of gestational diabetes mellitus in Asian women using machine learning algorithms. Sci Rep 2023; 13:13356. [PMID: 37587201 PMCID: PMC10432552 DOI: 10.1038/s41598-023-39680-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/09/2023] [Accepted: 07/28/2023] [Indexed: 08/18/2023]
Abstract
This study developed a machine learning algorithm to predict gestational diabetes mellitus (GDM) using retrospective data from 34,387 pregnancies at multiple centers in South Korea. Variables were collected at baseline, E0 (until 10 weeks' gestation), E1 (11-13 weeks' gestation) and M1 (14-24 weeks' gestation). The data set was randomly divided into training and test sets (7:3 ratio) to compare the performance of the light gradient boosting machine (LGBM) and extreme gradient boosting (XGBoost) algorithms with the full set of variables (original). A prediction model using the whole cohort achieved area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPR) values of 0.711 and 0.246 at baseline, 0.720 and 0.256 at E0, 0.721 and 0.262 at E1, and 0.804 and 0.442 at M1, respectively. Three models with different variable sets were then compared: [a] variables from clinical guidelines; [b] variables selected using Shapley additive explanations (SHAP) values; and [c] variables selected using the Boruta algorithm. Based on model [c], which had the fewest variables and similar or better performance than the other models, simple questionnaires were developed. The combined use of maternal factors and laboratory data could effectively predict individual risk of GDM using a machine learning model.
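AUPR is reported next to AUC because, when positives are rare (as with GDM), the precision-recall curve is often more informative about false positives than the ROC curve. One common estimator of that area is average precision: precision is taken at each rank where a new positive is recovered, weighted by the recall gained there. A minimal sketch with invented scores; the abstract does not state which AUPR estimator the study used, and others (e.g. trapezoidal interpolation) give slightly different values.

```python
def average_precision(scores, labels):
    """Average precision: sum over ranks of (recall gain) * (precision at that rank).

    Assumes no tied scores; ties would need to be grouped to be handled exactly.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            ap += (1 / n_pos) * (tp / rank)  # recall step times current precision
    return ap


# Hypothetical GDM risk scores (1 = developed GDM).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(round(average_precision(scores, labels), 4))  # → 0.8333
```

Unlike AUC, whose chance level is always 0.5, the chance level of AUPR equals the prevalence of the positive class, so values such as 0.246 or 0.442 should be read relative to how common GDM is in the cohort.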
Affiliation(s)
- Byung Soo Kang
- Department of Obstetrics and Gynecology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Seon Ui Lee
- Department of Obstetrics and Gynecology, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Subeen Hong
- Department of Obstetrics and Gynecology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Sae Kyung Choi
- Department of Obstetrics and Gynecology, Incheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Jae Eun Shin
- Department of Obstetrics and Gynecology, Bucheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Jeong Ha Wie
- Department of Obstetrics and Gynecology, Eunpyeong St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Yun Sung Jo
- Department of Obstetrics and Gynecology, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Yeon Hee Kim
- Department of Obstetrics and Gynecology, Uijeongbu St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Kicheol Kil
- Department of Obstetrics and Gynecology, Yeouido St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Yoo Hyun Chung
- Department of Obstetrics and Gynecology, Daejeon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- In Yang Park
- Department of Obstetrics and Gynecology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Hyun Sun Ko
- Department of Obstetrics and Gynecology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
8
Predicting Genetic Disorder and Types of Disorder Using Chain Classifier Approach. Genes (Basel) 2022; 14:genes14010071. [PMID: 36672812 PMCID: PMC9858679 DOI: 10.3390/genes14010071] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/05/2022] [Revised: 12/16/2022] [Accepted: 12/16/2022] [Indexed: 12/28/2022]
Abstract
Genetic disorders result from mutations in the deoxyribonucleic acid (DNA) sequence, which can be acquired or inherited from parents. Such mutations may lead to fatal diseases such as Alzheimer's disease, cancer, and hemochromatosis. Recently, artificial intelligence-based methods have shown considerable success in the prediction and prognosis of different diseases. Such methods could be used to predict genetic disorders at an early stage from genome data, enabling timely treatment. This study focuses on the multi-label multi-class problem and makes two major contributions to genetic disorder prediction. First, a novel feature engineering approach is proposed in which the class probabilities from an extra trees (ET) classifier and a random forest (RF) are combined to form the feature set for model training. Second, the study uses the classifier chain approach, in which multiple classifiers are joined in a chain and the predictions from all preceding classifiers are used by the succeeding classifiers to make the final prediction. Because the data are multi-label and multi-class, macro accuracy, Hamming loss, and the α-evaluation score are used to evaluate performance. Results suggest that extreme gradient boosting (XGB) produces the best scores, with a 92% α-evaluation score and an 84% macro accuracy score. XGB performs much better than state-of-the-art approaches in terms of both performance and computational complexity.
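Because each patient carries more than one target (the title names a genetic disorder and a type of disorder), Hamming loss scores predictions slot by slot: it is the fraction of individual target predictions that are wrong, averaged over all samples. A minimal sketch; the two-slot layout and the label strings are hypothetical placeholders, not the study's classes.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (sample, target) positions where prediction and truth disagree.

    Works for binary indicators or categorical values, since it only tests equality.
    """
    n_samples = len(y_true)
    n_targets = len(y_true[0])
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / (n_samples * n_targets)


# Hypothetical [disorder group, disorder type] per patient.
y_true = [["group_A", "type_1"], ["group_B", "type_3"], ["group_A", "type_2"]]
y_pred = [["group_A", "type_2"], ["group_B", "type_3"], ["group_C", "type_2"]]
print(hamming_loss(y_true, y_pred))  # 2 wrong slots out of 6
```

Unlike exact-match accuracy, which would count patients 1 and 3 as simply wrong, Hamming loss gives partial credit for getting one of the two targets right, which suits a chained setup where later predictions depend on earlier ones.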
9
Casillas N, Torres AM, Moret M, Gómez A, Rius-Peris JM, Mateo J. Mortality predictors in patients with COVID-19 pneumonia: a machine learning approach using eXtreme Gradient Boosting model. Intern Emerg Med 2022; 17:1929-1939. [PMID: 36098861 PMCID: PMC9469825 DOI: 10.1007/s11739-022-03033-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Accepted: 06/12/2022] [Indexed: 12/15/2022]
Abstract
Recently, the COVID-19 pandemic has increased the demand for healthcare assistance worldwide. This has prompted many researchers to conduct studies looking for variables associated with increased clinical risk and to find effective and safe treatments. Many of these studies have been limited by small samples combined with large data sets. Machine learning (ML) techniques can detect parameters that help improve clinical diagnosis, since they can be used to detect patterns in, make predictions from, and process complex data. ML techniques can be valuable for the study of COVID-19, especially because they can uncover complex patterns in large data sets. In this retrospective study of 150 hospitalized adult COVID-19 patients, two groups were established: those who died formed the Case group (n = 53), and the survivors formed the Control group (n = 98). For the analysis, the supervised learning algorithm eXtreme Gradient Boosting (XGBoost) was used because it performs well compared with other methods and is highly efficient, flexible and portable. The response to different treatments was evaluated, and artificial intelligence made it possible to accurately predict which patients have higher mortality, with better results than other ML methods.
Affiliation(s)
- N. Casillas
- Department of Internal Medicine, Hospital Virgen de la Luz, Cuenca, Spain
- Neurobiological Research Group, Institute of Technology, Castilla-La Mancha University, Cuenca, Spain
- A. M. Torres
- Neurobiological Research Group, Institute of Technology, Castilla-La Mancha University, Cuenca, Spain
- M. Moret
- Department of Internal Medicine, Hospital Virgen de la Luz, Cuenca, Spain
- A. Gómez
- Department of Internal Medicine, Hospital Virgen de la Luz, Cuenca, Spain
- J. M. Rius-Peris
- Neurobiological Research Group, Institute of Technology, Castilla-La Mancha University, Cuenca, Spain
- Department of Pediatrics, Hospital Virgen de la Luz, Cuenca, Spain
- J. Mateo
- Neurobiological Research Group, Institute of Technology, Castilla-La Mancha University, Cuenca, Spain
10
Miranda D, Olivares R, Munoz R, Minonzio JG. Improvement of Patient Classification Using Feature Selection Applied to Bidirectional Axial Transmission. IEEE TRANSACTIONS ON ULTRASONICS, FERROELECTRICS, AND FREQUENCY CONTROL 2022; 69:2663-2671. [PMID: 35914050 DOI: 10.1109/tuffc.2022.3195477] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Osteoporosis is still a worldwide problem, particularly because of associated fragility fractures. Patients at risk of fracture are currently detected using the X-ray gold standard, dual-energy X-ray absorptiometry (DXA), which is based on a calibrated 2-D image. Alternatives such as 3-D X-rays, magnetic resonance imaging (MRI) or ultrasound have been proposed, the latter having the advantages of being portable and sensitive to mechanical and geometrical properties. Bidirectional axial transmission (BDAT) has been used to classify patients with or without nontraumatic fractures using "classical" ultrasonic parameters, such as velocities, as well as cortical thickness and porosity obtained by solving an inverse problem. Recently, complementary parameters acquired through structural and textural analysis of guided wave spectrum images (GWSIs) have been introduced. Unlike the inverse-problem parameters, these are not limited by solution ambiguities. The aim of the study is to improve patient classification using a feature selection strategy applied to all available ultrasound features, complemented by clinical parameters. To this end, three classical feature ranking methods were considered: analysis of variance (ANOVA), recursive feature elimination (RFE), and extreme gradient boosting feature importance (XGBI). To evaluate the performance of the feature selection techniques, three classical classification methods were used: logistic regression (LR), support vector machine (SVM), and extreme gradient boosting (XGB). The database was obtained from a previous clinical study [Minonzio et al., 2019]. Results indicate that the best accuracy, 71 [66-76]%, was achieved using RFE and SVM with 22 (out of 43) ultrasonic and clinical features. This value outperformed the accuracy of 68 [64-73]% reached with 2 (out of 6) DXA and clinical features. These values open promising perspectives toward improved and generalizable classification of patients at risk of fracture.
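Of the three ranking methods, ANOVA is the simplest: each feature is scored independently by the one-way F statistic, the ratio of between-group to within-group variance, and features are ranked by that score. A minimal two-group sketch (fracture vs. no fracture); the feature values are invented, the name `velocity_noise` is a hypothetical uninformative feature, and the study's own implementation may differ.

```python
def anova_f(values, groups):
    """One-way ANOVA F statistic of a single feature across the given group labels."""
    labels = sorted(set(groups))
    k, n = len(labels), len(values)
    grand_mean = sum(values) / n
    ss_between = 0.0
    ss_within = 0.0
    for g in labels:
        members = [v for v, lab in zip(values, groups) if lab == g]
        mean_g = sum(members) / len(members)
        ss_between += len(members) * (mean_g - grand_mean) ** 2
        ss_within += sum((v - mean_g) ** 2 for v in members)
    if ss_within == 0:
        return float("inf")  # perfectly separated groups with no spread
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(features, groups):
    """Return feature names sorted from highest to lowest F score."""
    scores = {name: anova_f(vals, groups) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


groups = ["fracture"] * 4 + ["no_fracture"] * 4
features = {
    "cortical_thickness": [2.1, 2.0, 2.3, 2.2, 3.0, 3.1, 2.9, 3.2],
    "velocity_noise": [1.0, 1.2, 0.9, 1.1, 1.1, 0.9, 1.2, 1.0],
}
print(rank_features(features, groups))
```

Because ANOVA scores each feature in isolation, it cannot detect redundancy between features; RFE, which repeatedly refits a model and drops the weakest feature, accounts for feature interactions at a higher computational cost.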
11
Ramón A, Torres AM, Milara J, Cascón J, Blasco P, Mateo J. eXtreme Gradient Boosting-based method to classify patients with COVID-19. J Investig Med 2022; 70:jim-2021-002278. [PMID: 35850970 DOI: 10.1136/jim-2021-002278] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Accepted: 06/15/2022] [Indexed: 01/08/2023]
Abstract
Different demographic, clinical and laboratory variables have been related to severity and mortality following SARS-CoV-2 infection. Most studies applied traditional statistical methods, in some cases combined with a machine learning (ML) method. This is the first study to date to comparatively analyze five ML methods to select the one that most accurately predicts mortality in patients admitted with COVID-19. The aim of this single-center observational study is to classify, based on different types of variables, adult patients with COVID-19 at increased risk of mortality. SARS-CoV-2 infection was defined by a positive reverse transcriptase PCR. A total of 203 patients were admitted to a tertiary hospital between March 15 and June 15, 2020. Data were extracted from the electronic medical record. Four supervised ML algorithms (k-nearest neighbors (KNN), decision tree (DT), Gaussian naïve Bayes (GNB) and support vector machine (SVM)) were compared with the proposed eXtreme Gradient Boosting (XGB) method, which offers excellent scalability and high running speed, among other qualities. The results indicate that the XGB method has the best prediction accuracy (92%), high precision (>0.92) and high recall (>0.92). The KNN, SVM and DT approaches show moderate prediction accuracy (>80%), moderate recall (>0.80) and moderate precision (>0.80). The GNB algorithm shows relatively low classification performance. The variables with the greatest weight in predicting mortality were C-reactive protein, procalcitonin, glutamic oxaloacetic transaminase, glutamic pyruvic transaminase, neutrophils, D-dimer, creatinine, lactic acid, ferritin, days of non-invasive ventilation, septic shock and age. Based on these results, XGB is a solid candidate for correctly classifying patients with COVID-19.
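Accuracy, precision and recall all derive from the same confusion matrix of predicted versus observed deaths. A minimal sketch treating death as the positive class; the function name and the counts are invented and are not the study's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall for the positive class (here: death)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted deaths, how many died
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual deaths, how many were flagged
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical confusion-matrix counts for 100 patients.
print(classification_metrics(tp=18, fp=2, fn=3, tn=77))
```

With few deaths relative to survivors, accuracy alone can look high even when many deaths are missed, which is why recall is reported alongside it.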
Affiliation(s)
- Antonio Ramón
- Pharmacy Department, General University Hospital Consortium of Valencia, Valencia, Spain
- Ana Maria Torres
- Institute of Technology, Universidad de Castilla-La Mancha, Cuenca, Spain
- Javier Milara
- Pharmacy Department, General University Hospital Consortium of Valencia, Valencia, Spain
- Pharmacy Department, University of Valencia, Valencia, Spain
- Joaquín Cascón
- Institute of Technology, Universidad de Castilla-La Mancha, Cuenca, Spain
- Pilar Blasco
- Pharmacy Department, General University Hospital Consortium of Valencia, Valencia, Spain
- Jorge Mateo
- Institute of Technology, Universidad de Castilla-La Mancha, Cuenca, Spain
12
Different Scales of Medical Data Classification Based on Machine Learning Techniques: A Comparative Study. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12020919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
In recent years, the volume of medical data has increased vastly owing to the continuous generation of digital data. The different forms of medical data, such as reports and textual, numerical, monitoring, and laboratory data, make up so-called medical big data. This paper aims to find the algorithm that predicts new medical data with the highest accuracy, since good prediction accuracy is essential in medical fields. To achieve this goal, the most accurate algorithm and the algorithm with the shortest processing time are identified through an experimental comparison of seven algorithms: naïve Bayes, linear model, regression, decision tree, random forest, gradient boosted tree, and J48. The experiments identified the algorithms with the best accuracy and processing time for predicting new medical big data. We find that the most accurate classification algorithm is random forest, with accuracy values of 97.58%, 83.59%, and 90% for the heart disease, M-health, and diabetes datasets, respectively. Naïve Bayes has the lowest processing time, with values of 0.078, 7.683, and 22.374 s for the heart disease, M-health, and diabetes datasets, respectively. In addition, the best result of the experiment is obtained by combining the CFS (correlation-based feature selection) algorithm with the random forest classification algorithm. Applying RF combined with CFS to the heart disease dataset gives an accuracy of 90%, precision of 83.3%, sensitivity of 100%, and processing time of 3 s. On the M-health dataset, this combination gives an accuracy of 83.59%, precision of 74.3%, sensitivity of 93.1%, and processing time of 13.481 s. On the diabetes dataset, it gives an accuracy of 97.58%, precision of 86.39%, sensitivity of 97.14%, and processing time of 56.508 s.
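CFS scores a whole feature subset rather than single features. In Hall's formulation, the merit of a subset of k features rises with the mean feature-class correlation r_cf and falls with the mean feature-feature correlation r_ff: Merit = k * r_cf / sqrt(k + k(k - 1) * r_ff). A minimal sketch of that formula only; the correlation values are hypothetical, and the subset search that CFS wraps around it (e.g. best-first search) is omitted.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Hall's CFS merit for a subset of k features.

    Relevant features (high feature-class correlation) raise the merit;
    redundant features (high feature-feature correlation) lower it.
    """
    return (k * mean_feature_class_corr) / math.sqrt(
        k + k * (k - 1) * mean_feature_feature_corr
    )


# Two hypothetical 3-feature subsets: same relevance, different redundancy.
print(round(cfs_merit(3, 0.5, 0.1), 3))  # low redundancy
print(round(cfs_merit(3, 0.5, 0.8), 3))  # high redundancy
```

This trade-off explains why pairing CFS with random forest can help: the classifier receives fewer, less redundant inputs, which can reduce processing time while preserving accuracy.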
13
Tso CF, Lam C, Calvert J, Mao Q. Machine learning early prediction of respiratory syncytial virus in pediatric hospitalized patients. Front Pediatr 2022; 10:886212. [PMID: 35989982 PMCID: PMC9385995 DOI: 10.3389/fped.2022.886212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 07/04/2022] [Indexed: 11/13/2022]
Abstract
Respiratory syncytial virus (RSV) causes millions of infections among children in the US each year and can cause severe disease or death. Infections that are not promptly detected can cause outbreaks that put other hospitalized patients at risk. Apart from diagnostic testing, no tools are available to rapidly and reliably predict RSV infections among hospitalized patients. We conducted a retrospective study of pediatric electronic health record (EHR) data and built a machine learning model to predict whether a patient will test positive for RSV by nucleic acid amplification test during their stay. Our model demonstrated excellent discrimination, with an area under the receiver operating characteristic curve of 0.919, a sensitivity of 0.802, and a specificity of 0.876. Our model can help clinicians identify patients who may have RSV infections rapidly and cost-effectively. Successfully integrating this model into routine pediatric inpatient care may assist efforts in patient care and infection control.
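Sensitivity and specificity describe one operating point on the ROC curve, so they depend on the probability threshold used to call a patient RSV-positive; the abstract does not say how that threshold was chosen. One common choice, assumed here purely for illustration, is the threshold that maximizes Youden's J = sensitivity + specificity - 1. The function names, scores and labels are invented.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores, labels):
    """Observed score with the largest Youden's J (first one wins on ties)."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical model outputs (1 = tested RSV-positive).
scores = [0.95, 0.80, 0.62, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(youden_threshold(scores, labels))
```

In an infection-control setting, a lower threshold that favours sensitivity may be preferable to the Youden point, since a missed RSV case can seed an outbreak; the right trade-off is a clinical decision rather than a purely statistical one.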
Affiliation(s)
- Carson Lam
- Dascena, Inc., Houston, TX, United States
- Qingqing Mao
- Dascena, Inc., Houston, TX, United States
- Montera Inc., San Francisco, CA, United States