1
Ramón A, Bas A, Herrero S, Blasco P, Suárez M, Mateo J. Personalized Assessment of Mortality Risk and Hospital Stay Duration in Hospitalized Patients with COVID-19 Treated with Remdesivir: A Machine Learning Approach. J Clin Med 2024; 13:1837. [PMID: 38610602 PMCID: PMC11013017 DOI: 10.3390/jcm13071837] [Received: 02/21/2024] [Revised: 03/15/2024] [Accepted: 03/20/2024] [Indexed: 04/14/2024] [Open access]
Abstract
Background: Despite advancements in vaccination, early treatments, and understanding of SARS-CoV-2, its impact remains significant worldwide. Many patients require intensive care due to severe COVID-19. Remdesivir, a key treatment option among viral RNA polymerase inhibitors, lacks comprehensive studies on factors associated with its effectiveness. Methods: We conducted a retrospective study in 2022, analyzing data from 252 hospitalized COVID-19 patients treated with remdesivir. Six machine learning algorithms were compared to predict factors influencing remdesivir's clinical benefits regarding mortality and hospital stay. Results: The extreme gradient boost (XGB) method showed the highest accuracy for both mortality (95.45%) and hospital stay (94.24%). Factors associated with worse outcomes in terms of mortality included limitations in life support, ventilatory support needs, lymphopenia, low albumin and hemoglobin levels, flu and/or coinfection, and cough. For hospital stay, factors included vaccine doses, lung density, pulmonary radiological status, comorbidities, oxygen therapy, troponin, lactate dehydrogenase levels, and asthenia. Conclusions: These findings underscore XGB's effectiveness in accurately categorizing COVID-19 patients undergoing remdesivir treatment.
Affiliation(s)
- Antonio Ramón
  - Department of Pharmacy, University General Hospital, 46014 Valencia, Spain
  - Medical Analysis Expert Group, Institute of Technology, University of Castilla-La Mancha, 16002 Cuenca, Spain
- Andrés Bas
  - Department of Pharmacy, University General Hospital, 46014 Valencia, Spain
- Santiago Herrero
  - Department of Pharmacy, University General Hospital, 46014 Valencia, Spain
- Pilar Blasco
  - Department of Pharmacy, University General Hospital, 46014 Valencia, Spain
  - Medical Analysis Expert Group, Institute of Technology, University of Castilla-La Mancha, 16002 Cuenca, Spain
- Miguel Suárez
  - Medical Analysis Expert Group, Institute of Technology, University of Castilla-La Mancha, 16002 Cuenca, Spain
  - Department of Gastroenterology, Virgen de la Luz Hospital, 16002 Cuenca, Spain
- Jorge Mateo
  - Medical Analysis Expert Group, Institute of Technology, University of Castilla-La Mancha, 16002 Cuenca, Spain
  - Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
2
Zhou CM, Li H, Xue Q, Yang JJ, Zhu Y. Artificial intelligence algorithms for predicting post-operative ileus after laparoscopic surgery. Heliyon 2024; 10:e26580. [PMID: 38439857 PMCID: PMC10909660 DOI: 10.1016/j.heliyon.2024.e26580] [Received: 02/27/2023] [Revised: 02/13/2024] [Accepted: 02/15/2024] [Indexed: 03/06/2024] [Open access]
Abstract
Objective: By constructing a predictive model using machine learning and deep learning technologies, we aim to understand the risk factors for postoperative intestinal obstruction in laparoscopic colorectal cancer patients and to establish an effective artificial intelligence-based predictive model to guide individualized prevention and treatment, thus improving patient outcomes. Methods: We implemented the artificial intelligence algorithms in Python. Subjects were randomly assigned, at a ratio of 7:3, to either a training set for variable identification and model construction or a test set for testing model performance. The model was trained with ten algorithms. We evaluated performance using the AUC values of the ROC curves, as well as accuracy, precision, recall and F1 scores. Results: Feature engineering combined with the GBDT algorithm showed that opioid use, anesthesia duration, and body weight were the top three factors in the development of postoperative ileus (POI). We used ten machine learning and deep learning algorithms to validate the model, with the following results: the three algorithms with the best accuracy were XGB (0.807), Decision Tree (0.807) and Neural Decision Tree (0.807); the two algorithms with the best precision were XGB (0.500) and Decision Tree (0.500); the two algorithms with the best recall were adab (0.243) and Decision Tree (0.135); the two algorithms with the highest F1 score were adab (0.290) and Decision Tree (0.213); and the three algorithms with the best AUC were Gradient Boosting (0.678), XGB (0.638) and LinearSVC (0.633). Conclusion: This study shows that XGB and Decision Tree are the two best algorithms for predicting the risk of developing ileus after laparoscopic colon cancer surgery. It provides new insight and approaches to the field of postoperative intestinal obstruction in colorectal cancer through the application of machine learning techniques, thereby improving our understanding of the disease and offering strong support for clinical decision-making.
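The abstract reports accuracy near 0.8 alongside recall below 0.25, a pattern that is typical when the positive class is rare. The sketch below shows how the four threshold metrics named in the abstract follow from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 100 patients, 20 of whom developed ileus.
metrics = classification_metrics(tp=3, fp=3, fn=17, tn=77)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts the classifier is right for 80 of 100 patients yet finds only 3 of the 20 positive cases, which is why recall and F1 are worth reading next to accuracy.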
Affiliation(s)
- Cheng-Mao Zhou
  - Big Data and Artificial Intelligence Research Group, Department of Anaesthesiology and Nursing, Central People's Hospital of Zhanjiang, Zhanjiang, Guangdong, China
- HuiJuan Li
  - Big Data and Artificial Intelligence Research Group, Department of Anesthesiology, Pain and Perioperative Medicine, First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Qiong Xue
  - Big Data and Artificial Intelligence Research Group, Department of Anesthesiology, Pain and Perioperative Medicine, First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Jian-Jun Yang
  - Big Data and Artificial Intelligence Research Group, Department of Anesthesiology, Pain and Perioperative Medicine, First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Yu Zhu
  - Big Data and Artificial Intelligence Research Group, Department of Anaesthesiology and Nursing, Central People's Hospital of Zhanjiang, Zhanjiang, Guangdong, China
3
Zhao J, Jiang P, Shen T, Zhang R, Zhang D, Zhang N, Ting N, Ding K, Yang B, Tan C, Yu Z. Data-driven assessment of soil total nitrogen on the Qinghai-Tibet Plateau. Sci Total Environ 2024; 914:169993. [PMID: 38215840 DOI: 10.1016/j.scitotenv.2024.169993] [Received: 10/08/2023] [Revised: 01/03/2024] [Accepted: 01/05/2024] [Indexed: 01/14/2024]
Abstract
The investigation of soil total nitrogen (STN) is important for the preservation and sustainability of Earth's ecosystems. The Qinghai-Tibet Plateau (QTP), renowned as the world's most expansive plateau and characterized by its exceptionally delicate ecosystem, demands an in-depth exploration of its STN content. In this study, we use a machine learning approach to extrapolate point-scale measured STN stocks to the entire QTP and calculate STN storage from 0 to 2 m. Our results show that the XGB algorithm performs well in modeling STN despite variations in simulation accuracy for specific depth ranges. The spatial distribution of STN across the QTP exhibits pronounced heterogeneity, especially in the 0-50 cm soil layer, with relatively higher STN stocks in the southeast and lower stocks in the northwest of the QTP. The vertical distribution reveals a gradual decrease in STN storage with increasing depth. The 0-50 cm soil layer holds the highest STN stocks, averaging around 0.78 kg/m², which is almost equal to the sum of the STN stocks in the 50-100 cm and 100-200 cm soil layers. Meanwhile, STN stocks are smaller in the permafrost zone than in the non-permafrost zone. We also investigate the factors that control the spatiotemporal distribution of STN. The results indicate that vegetation, precipitation, temperature, and elevation are the major factors for STN distribution, while the physical properties of the soil have a relatively smaller impact. These findings are crucial for understanding the distribution and evolution of STN on the QTP.
Affiliation(s)
- Jiahui Zhao
  - The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210098, China; College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
- Peng Jiang
  - The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210098, China; College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China; Key Laboratory of Natural Resource Coupling Process and Effects, Beijing 100055, China; The Middle Reaches of Yarlung Zangbo River, Natural Resources, Observation and Research Station of Tibet Autonomous Region, Research Center of Applied Geology of China Geological Survey, Chengdu 610036, China; Joint International Research Laboratory of Global Change and Water Cycle, Hohai University, Nanjing 210098, China
- Tongqing Shen
  - The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210098, China; College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
- Rongrong Zhang
  - The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210098, China; Key Laboratory of Natural Resource Coupling Process and Effects, Beijing 100055, China; Joint International Research Laboratory of Global Change and Water Cycle, Hohai University, Nanjing 210098, China
- Dawei Zhang
  - China Institute of Water Resources and Hydropower Research, Beijing 100038, China
- Nana Zhang
  - College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
- Nie Ting
  - College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
- Kunqi Ding
  - College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
- Bin Yang
  - The Middle Reaches of Yarlung Zangbo River, Natural Resources, Observation and Research Station of Tibet Autonomous Region, Research Center of Applied Geology of China Geological Survey, Chengdu 610036, China
- Changhai Tan
  - Research Center of Applied Geology of China Geological Survey, Chengdu 610036, China
- Zhongbo Yu
  - The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210098, China; College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China; Yangtze Institute for Conservation and Development, Hohai University, Jiangsu 210098, China; Joint International Research Laboratory of Global Change and Water Cycle, Hohai University, Nanjing 210098, China
4
Dhandapani A, Iqbal J, Kumar RN. Application of machine learning (individual vs stacking) models on MERRA-2 data to predict surface PM 2.5 concentrations over India. Chemosphere 2023; 340:139966. [PMID: 37634588 DOI: 10.1016/j.chemosphere.2023.139966] [Received: 04/18/2023] [Revised: 07/31/2023] [Accepted: 08/24/2023] [Indexed: 08/29/2023]
Abstract
The spatial coverage of PM2.5 monitoring is non-uniform across India due to the limited number of ground monitoring stations. As an alternative, the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), an atmospheric reanalysis dataset, is used for estimating PM2.5. MERRA-2 does not explicitly measure PM2.5 but rather follows an empirical model. MERRA-2 data were spatiotemporally collocated with ground observations for validation across India. Significant underestimation in the MERRA-2 prediction of PM2.5 was observed over many monitoring stations, ranging from -20 to 60 μg m⁻³. The utility of machine learning (ML) models to overcome this challenge was assessed. MERRA-2 aerosol and meteorological parameters were the input features used to train and test the individual ML models and to compare them with the stacking technique. Initially, individual model performance was assessed with 10% of randomly selected data to identify the best model. XGBoost (XGB) was the best model (r² = 0.73) compared to Random Forest (RF) and LightGBM (LGBM). Stacking was then applied with XGB as the meta-regressor. The stacked model (r² = 0.77) outperformed the best standalone estimate of XGB. The stacking technique was used to predict hourly and daily PM2.5 in different regions across India and at each monitoring station. The eastern region exhibited the best hourly prediction (r² = 0.80) and a substantial reduction in Mean Bias (MB = -0.03 μg m⁻³), followed by the northern region (r² = 0.63 and MB = -0.10 μg m⁻³), which showed better output due to the frequent observation of PM2.5 >100 μg m⁻³. Due to sparse data availability to train the ML models, the lowest performance was for the central region (r² = 0.46 and MB = -0.60 μg m⁻³). Overall, PM2.5 prediction for India with the ML stacking technique was better on an hourly basis than on a daily basis.
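The abstract reports two complementary scores, r² and Mean Bias (MB). The sketch below shows why both are useful, under two assumptions that the abstract does not confirm: r² is taken as the squared Pearson correlation between predictions and observations, and MB as the mean of prediction minus observation. The function names and the concentration values are hypothetical.

```python
from statistics import mean


def mean_bias(pred, obs):
    """Mean of (prediction - observation); negative values mean underestimation."""
    return mean(p - o for p, o in zip(pred, obs))


def r_squared(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    mp, mo = mean(pred), mean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov ** 2 / (var_p * var_o)


# Hypothetical PM2.5 values in micrograms per cubic metre.
obs = [40.0, 60.0, 80.0, 120.0]
pred = [30.0, 50.0, 70.0, 110.0]  # tracks the observations but sits 10 units low

print("r2:", r_squared(pred, obs))
print("MB:", mean_bias(pred, obs))
```

In this illustration the predictions are perfectly correlated with the observations and still underestimate every value, so a correlation-based score alone would hide the bias.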
Affiliation(s)
- Abisheg Dhandapani
  - Department of Civil and Environmental Engineering, Birla Institute of Technology, Mesra, Ranchi, 835215, Jharkhand, India
- Jawed Iqbal
  - Department of Civil and Environmental Engineering, Birla Institute of Technology, Mesra, Ranchi, 835215, Jharkhand, India
- R Naresh Kumar
  - Department of Civil and Environmental Engineering, Birla Institute of Technology, Mesra, Ranchi, 835215, Jharkhand, India
5
Casillas N, Ramón A, Torres AM, Blasco P, Mateo J. Predictive Model for Mortality in Severe COVID-19 Patients across the Six Pandemic Waves. Viruses 2023; 15:2184. [PMID: 38005862 PMCID: PMC10675561 DOI: 10.3390/v15112184] [Received: 09/28/2023] [Revised: 10/21/2023] [Accepted: 10/28/2023] [Indexed: 11/26/2023] [Open access]
Abstract
The impact of SARS-CoV-2 infection remains substantial on a global scale, despite widespread vaccination efforts, early therapeutic interventions, and an enhanced understanding of the disease's underlying mechanisms. At the same time, a significant number of patients continue to develop severe COVID-19, necessitating admission to intensive care units (ICUs). This study aimed to provide evidence concerning the most influential predictors of mortality among critically ill patients with severe COVID-19, employing machine learning (ML) techniques. To accomplish this, we conducted a retrospective multicenter investigation involving 684 patients with severe COVID-19, spanning from 1 June 2020 to 31 March 2023, wherein we scrutinized sociodemographic, clinical, and analytical data. These data were extracted from electronic health records. Out of the six supervised ML methods scrutinized, the extreme gradient boosting (XGB) method exhibited the highest balanced accuracy at 96.61%. The variables that exerted the greatest influence on mortality prediction encompassed ferritin, fibrinogen, D-dimer, platelet count, C-reactive protein (CRP), prothrombin time (PT), invasive mechanical ventilation (IMV), PaFi (PaO2/FiO2), lactate dehydrogenase (LDH), lymphocyte levels, activated partial thromboplastin time (aPTT), body mass index (BMI), creatinine, and age. These findings underscore XGB as a robust candidate for accurately classifying patients with COVID-19.
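The headline figure in this abstract is balanced accuracy, which is commonly defined for a binary outcome as the average of sensitivity and specificity. The sketch below follows that common definition; the abstract does not state the exact formula the authors used, and the labels are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity for binary labels (1 = died)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Hypothetical cohort: 2 deaths among 10 patients, model predicts "survived" for all.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", plain_accuracy)
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

A model that never predicts death scores 0.8 on plain accuracy here but only 0.5 on balanced accuracy, which is why the balanced form suits cohorts where deaths are the minority class.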
Affiliation(s)
- Nazaret Casillas
  - Department of Internal Medicine, Hospital Virgen De La Luz, 16002 Cuenca, Spain
  - Medical Analysis Expert Group, Institute of Technology, University of Castilla-La Mancha, 16002 Cuenca, Spain
- Antonio Ramón
  - Department of Pharmacy, General University Hospital, 46014 Valencia, Spain
- Ana María Torres
  - Medical Analysis Expert Group, Institute of Technology, University of Castilla-La Mancha, 16002 Cuenca, Spain
  - Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
- Pilar Blasco
  - Department of Pharmacy, General University Hospital, 46014 Valencia, Spain
- Jorge Mateo
  - Medical Analysis Expert Group, Institute of Technology, University of Castilla-La Mancha, 16002 Cuenca, Spain
  - Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
6
Cui S, Song H, Ren H, Wang X, Xie Z, Wen H, Li Y. Prediction of Hemorrhagic Complication after Thrombolytic Therapy Based on Multimodal Data from Multiple Centers: An Approach to Machine Learning and System Implementation. J Pers Med 2022; 12. [PMID: 36556272 DOI: 10.3390/jpm12122052] [Received: 11/06/2022] [Revised: 12/08/2022] [Accepted: 12/09/2022] [Indexed: 12/15/2022] [Open access]
Abstract
Hemorrhagic complication (HC) is the most severe complication of intravenous thrombolysis (IVT) in patients with acute ischemic stroke (AIS). This study aimed to build a machine learning (ML) prediction model and an application system for a personalized analysis of the risk of HC in patients undergoing IVT therapy. We included patients from Chongqing, Hainan and other centers, collecting Computed Tomography (CT) images, demographics, and other data recorded before the occurrence of HC. After feature engineering, a better feature subset was obtained and used to build ML prediction models (Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGB)), which were then evaluated with relevant indicators to select the best-performing model. Based on this model, an application system was built using the Flask framework. A total of 517 patients were included, of which 332 were in the training cohort, 83 were in the internal validation cohort, and 102 were in the external validation cohort. The XGB model performed best, with an AUC of 0.9454 and ACC of 0.8554 on the internal validation cohort, and an AUC of 0.9142 and ACC of 0.8431 on the external validation cohort. A total of 18 features were used to construct the model, including hemoglobin and fasting blood sugar. Furthermore, the validity of the model was demonstrated through decision curves. Subsequently, a system prototype was developed to verify the prediction effect. The clinical decision support system (CDSS) embedded with the XGB model, based on clinical data and image features, can better carry out personalized analysis of the risk of HC in patients receiving intravenous thrombolysis.
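The AUC values in this abstract can be read as the probability that a randomly chosen patient with HC receives a higher risk score than a randomly chosen patient without HC. The sketch below computes AUC directly from that pairwise definition. The function name and the scores are hypothetical and are not data from the study.

```python
def pairwise_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted risks for patients with and without HC.
with_hc = [0.9, 0.8, 0.4]
without_hc = [0.7, 0.3, 0.2, 0.1]

auc = pairwise_auc(with_hc, without_hc)
print(f"AUC = {auc:.4f}")
```

Only one of the twelve pairs is ranked the wrong way round (0.4 against 0.7), so the illustrative AUC is 11/12. The double loop is quadratic in cohort size, which is acceptable for a sketch but not for large data sets.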
7
Suleman MT, Khan YD. m1A-pred: Prediction of Modified 1-methyladenosine Sites in RNA Sequences through Artificial Intelligence. Comb Chem High Throughput Screen 2022; 25:2473-2484. [PMID: 35718969 DOI: 10.2174/1386207325666220617152743] [Received: 01/20/2022] [Revised: 04/06/2022] [Accepted: 04/11/2022] [Indexed: 01/27/2023]
Abstract
BACKGROUND: The modification of nucleotides, such as the addition of methyl groups, is known as post-transcriptional modification (PTM). 1-methyladenosine (m1A) is a type of PTM formed by adding a methyl group to the nitrogen at the 1st position of the adenosine base. Many human disorders are associated with m1A, which is widely found in ribosomal RNA and transfer RNA. OBJECTIVE: Conventional methods such as mass spectrometry and site-directed mutagenesis have proved laborious and burdensome. Systematic identification of modified sites from RNA sequences is gaining much attention. Consequently, an extreme gradient boosting predictor, m1A-pred, is developed in this study for the prediction of modified m1A sites. METHODS: The current study involves the extraction of position- and composition-based properties from nucleotide sequences. The extracted features are used to build the feature vector. Statistical moments were used for dimensionality reduction of the obtained features. RESULTS: A series of experiments using different computational models and evaluation methods showed that the proposed predictor, m1A-pred, was the most robust and accurate model for the identification of modified sites. AVAILABILITY AND IMPLEMENTATION: To support further research on m1A sites, a user-friendly server was also developed as the final phase of this research.
Affiliation(s)
- Muhammad Taseer Suleman
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
8
Casillas N, Torres AM, Moret M, Gómez A, Rius-Peris JM, Mateo J. Mortality predictors in patients with COVID-19 pneumonia: a machine learning approach using eXtreme Gradient Boosting model. Intern Emerg Med 2022; 17:1929-1939. [PMID: 36098861 PMCID: PMC9469825 DOI: 10.1007/s11739-022-03033-6] [Received: 01/10/2022] [Accepted: 06/12/2022] [Indexed: 12/15/2022]
Abstract
Recently, global health has seen an increase in demand for assistance as a result of the COVID-19 pandemic. This has prompted many researchers to conduct studies looking for variables that are associated with increased clinical risk and to find effective and safe treatments. Many of these studies have been limited by presenting small samples and a large data set. Using machine learning (ML) techniques, we can detect parameters that help us to improve clinical diagnosis, since they are a system for the detection, prediction and treatment of complex data. ML techniques can be valuable for the study of COVID-19, especially because they can uncover complex patterns in large data sets. This retrospective study included 150 hospitalized adult COVID-19 patients, divided into two groups: those who died formed the Case group (n = 53) and the survivors formed the Control group (n = 98). For the analysis, the supervised learning algorithm eXtreme Gradient Boosting (XGBoost) was used because it performs well compared to other methods and is highly efficient, flexible and portable. In this study, the response to different treatments was evaluated, and artificial intelligence made it possible to accurately predict which patients have higher mortality, with better results than other ML methods.
Affiliation(s)
- N. Casillas
  - Department of Internal Medicine, Hospital Virgen de la Luz, Cuenca, Spain
  - Neurobiological Research Group, Institute of Technology, Castilla-La Mancha University, Cuenca, Spain
- A. M. Torres
  - Neurobiological Research Group, Institute of Technology, Castilla-La Mancha University, Cuenca, Spain
- M. Moret
  - Department of Internal Medicine, Hospital Virgen de la Luz, Cuenca, Spain
- A. Gómez
  - Department of Internal Medicine, Hospital Virgen de la Luz, Cuenca, Spain
- J. M. Rius-Peris
  - Neurobiological Research Group, Institute of Technology, Castilla-La Mancha University, Cuenca, Spain
  - Department of Pediatrics, Hospital Virgen de la Luz, Cuenca, Spain
- J. Mateo
  - Neurobiological Research Group, Institute of Technology, Castilla-La Mancha University, Cuenca, Spain
9
Abstract
Background: Birth weight is a significant determinant of the likelihood of survival of an infant. Babies born at low birth weight are 25 times more likely to die than those born at normal birth weight. Low birth weight (LBW) affects one out of every seven newborns, accounting for about 14.6 percent of the babies born worldwide. Moreover, the prevalence of LBW varies substantially by region, with 7.2 percent in the developed regions and 13.7 percent in Africa. Ethiopia has a large burden of LBW, around half of Africa. These newborns are more likely to die within the first month of birth or to have long-term complications, including stunted growth, low IQ, overweight or obesity, heart disease, diabetes, and early death. Therefore, the ability to predict LBW is an important preventive measure and indicator of infant health risks. Method: This study implemented predictive LBW models based on data obtained from the Ethiopia Demographic and Health Survey 2016. The study compared Logistic Regression, Decision Tree, Naive Bayes, K-Nearest Neighbor, Random Forest (RF), Support Vector Machine, Gradient Boosting, and Extreme Gradient Boosting to identify the best-suited classifier. Results: Data preprocessing, including data cleaning, was conducted. Normal and LBW are the binary target categories in this study. The study reveals that RF was the best classifier, predicting LBW with 91.60 percent accuracy, 91.60 percent recall, 96.80 percent ROC-AUC, 91.60 percent F1 score, 1.05 percent Hamming loss, and 81.86 percent Jaccard score. Conclusion: RF predicted the occurrence of LBW more accurately and effectively than the other classifiers on the Ethiopia Demographic and Health Survey data. Gender of the child, marriage-to-birth interval, mother's occupation and mother's age were the top four predictors of low birth weight in Ethiopia.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12911-022-01981-9.
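Two of the less familiar scores in this abstract are Hamming loss and the Jaccard score. The sketch below uses their usual binary-classification definitions: Hamming loss as the fraction of labels predicted incorrectly, and the Jaccard score as TP / (TP + FP + FN) for the positive class. How the authors averaged these scores is not stated in the abstract, and the labels are hypothetical.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    mismatches = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mismatches / len(y_true)


def jaccard_score(y_true, y_pred, positive=1):
    """Intersection over union of true and predicted positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    union = tp + fp + fn
    return tp / union if union else 0.0


# Hypothetical labels: 1 = low birth weight, 0 = normal birth weight.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

print("Hamming loss:", hamming_loss(y_true, y_pred))
print("Jaccard score:", jaccard_score(y_true, y_pred))
```

The Jaccard score ignores true negatives, so it is stricter than accuracy when the positive class is small.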
Affiliation(s)
- Wondesen Teshome Bekele
  - Department of Statistics, College of Natural and Computational Sciences, Dire Dawa University, Dire Dawa, Ethiopia
10
Guo CY, Chang KH. A Novel Algorithm to Estimate the Significance Level of a Feature Interaction Using the Extreme Gradient Boosting Machine. Int J Environ Res Public Health 2022; 19:2338. [PMID: 35206527 DOI: 10.3390/ijerph19042338] [Received: 01/04/2022] [Revised: 02/16/2022] [Accepted: 02/17/2022] [Indexed: 02/04/2023]
Abstract
Recent studies have revealed the importance of the interaction effect in cardiac research. An analysis would lead to an erroneous conclusion when the approach failed to tackle a significant interaction. Regression models deal with interaction by adding the product of the two interactive variables. Thus, statistical methods could evaluate the significance and contribution of the interaction term. However, machine learning strategies could not provide the p-value of specific feature interaction. Therefore, we propose a novel machine learning algorithm to assess the p-value of a feature interaction, named the extreme gradient boosting machine for feature interaction (XGB-FI). The first step incorporates the concept of statistical methodology by stratifying the original data into four subgroups according to the two interactive features. The second step builds four XGB machines with cross-validation techniques to avoid overfitting. The third step calculates a newly defined feature interaction ratio (FIR) for all possible combinations of predictors. Finally, we calculate the empirical p-value according to the FIR distribution. Computer simulation studies compared the XGB-FI with the multiple regression model with an interaction term. The results showed that the type I error of XGB-FI is valid under the nominal level of 0.05 when there is no interaction effect. The power of XGB-FI is consistently higher than the multiple regression model in all scenarios we examined. In conclusion, the new machine learning algorithm outperforms the conventional statistical model when searching for an interaction.
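The final step described in this abstract derives an empirical p-value from the distribution of the feature interaction ratio (FIR). The abstract does not give the formula, so the sketch below assumes a common convention: the p-value is the proportion of reference values at least as large as the observed one, with one added to numerator and denominator so the result is never zero. The function name and the FIR values are hypothetical.

```python
def empirical_p_value(observed, reference_values):
    """Share of reference values >= observed, with a +1 correction.

    This is a generic empirical p-value and an assumption here,
    not the formula published for XGB-FI.
    """
    exceed = sum(1 for v in reference_values if v >= observed)
    return (exceed + 1) / (len(reference_values) + 1)


# Hypothetical FIR values for the other feature pairs.
reference_fir = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
observed_fir = 1.55

p = empirical_p_value(observed_fir, reference_fir)
print(f"empirical p-value = {p:.2f}")
```

With only nine reference values the smallest attainable p-value is 0.1, so a test at the 0.05 level would need a larger reference distribution than this illustration uses.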
11
Lapajne J, Knapič M, Žibrat U. Comparison of Selected Dimensionality Reduction Methods for Detection of Root-Knot Nematode Infestations in Potato Tubers Using Hyperspectral Imaging. Sensors (Basel) 2022; 22:s22010367. [PMID: 35009907 PMCID: PMC8749520 DOI: 10.3390/s22010367] [Received: 11/30/2021] [Revised: 12/23/2021] [Accepted: 12/27/2021] [Indexed: 11/29/2022]
Abstract
Hyperspectral imaging is a popular tool used for non-invasive plant disease detection. Data acquired with it usually consist of many correlated features; hence most of the acquired information is redundant. Dimensionality reduction methods are used to transform the data sets from high-dimensional to low-dimensional representations (in this study, to one or a few features). We chose six dimensionality reduction methods (partial least squares, linear discriminant analysis, principal component analysis, RandomForest, ReliefF, and Extreme gradient boosting) and tested their efficacy on a hyperspectral data set of potato tubers. The extracted or selected features were passed to a support vector machine classifier and evaluated. Tubers were divided into two groups, healthy and infested with Meloidogyne luci. The results show that all dimensionality reduction methods enabled successful identification of inoculated tubers. The best and most consistent results were obtained using linear discriminant analysis, with 100% accuracy in both inside and outside images of potato tubers. Classification success was generally higher in the outside data set than in the inside data set. Nevertheless, accuracy was in all cases above 0.6.