1
Yagin FH, Aygun U, Algarni A, Colak C, Al-Hashem F, Ardigò LP. Platelet Metabolites as Candidate Biomarkers in Sepsis Diagnosis and Management Using the Proposed Explainable Artificial Intelligence Approach. J Clin Med 2024; 13:5002. [PMID: 39274215 PMCID: PMC11395774 DOI: 10.3390/jcm13175002] [Received: 08/01/2024] [Revised: 08/16/2024] [Accepted: 08/22/2024] [Indexed: 09/16/2024]
Abstract
Background: Sepsis is characterized by an atypical immune response to infection and is a dangerous health problem leading to significant mortality. Current diagnostic methods exhibit insufficient sensitivity and specificity, so precise biomarkers are needed for the early diagnosis and treatment of sepsis. Platelets, known for their hemostatic abilities, also play an important role in immunological responses. This study aims to develop a model integrating machine learning and explainable artificial intelligence (XAI) to identify novel platelet metabolomics markers of sepsis. Methods: A total of 39 participants, 25 diagnosed with sepsis and 14 control subjects, were included in the study. The profiles of platelet metabolites were analyzed using quantitative 1H-nuclear magnetic resonance (NMR) technology. Data were processed using the synthetic minority oversampling technique (SMOTE)-Tomek method to address the issue of class imbalance. In addition, missing data were filled using a technique based on random forests. Three machine learning models, namely extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and kernel tree boosting (KTBoost), were used for sepsis prediction. The models were validated using cross-validation. Clinical annotations of the optimal sepsis prediction model were analyzed using SHapley Additive exPlanations (SHAP), an XAI technique. Results: The KTBoost model (0.900 accuracy and 0.943 AUC) achieved better performance than the other models in sepsis diagnosis. SHAP results revealed that metabolites such as carnitine, glutamate, and myo-inositol are important biomarkers in sepsis prediction and intuitively explained the prediction decisions of the model. Conclusion: Platelet metabolites identified by the KTBoost model and XAI have significant potential for the early diagnosis and monitoring of sepsis and improving patient outcomes.
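The abstract above uses SHAP to attribute a model's prediction to individual metabolites. As a rough illustration of the idea behind it, the sketch below computes exact Shapley values by enumerating every coalition of features. Everything here is an assumption made for illustration: `toy_score` is a hypothetical scoring function with invented weights, not the authors' KTBoost model, and exact enumeration is not what the SHAP library does for tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all coalitions (only feasible for few features)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            for subset in combinations(others, k):
                coalition = set(subset)
                # Probability that exactly this coalition precedes f in a random ordering
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(coalition | {f}) - value_fn(coalition))
        phi[f] = total
    return phi


def toy_score(present):
    """Hypothetical risk score; the weights are invented for illustration."""
    score = 0.0
    if "carnitine" in present:
        score += 0.30
    if "glutamate" in present:
        score += 0.20
    if "myo-inositol" in present:
        score += 0.10
    if {"carnitine", "glutamate"} <= present:
        score += 0.10  # interaction term, split evenly between the two features
    return score


features = ["carnitine", "glutamate", "myo-inositol"]
phi = shapley_values(toy_score, features)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the score with all features present and the score with none, which is the additivity property that SHAP summary plots rely on.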
Affiliation(s)
- Fatma Hilal Yagin
  - Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Türkiye
- Umran Aygun
  - Department of Anesthesiology and Reanimation, Malatya Yesilyurt Hasan Calık State Hospital, Malatya 44929, Türkiye
- Abdulmohsen Algarni
  - Central Labs, King Khalid University, AlQura'a, Abha, P.O. Box 960, Saudi Arabia
- Cemil Colak
  - Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Türkiye
- Fahaid Al-Hashem
  - Department of Physiology, College of Medicine, King Khalid University, Abha 61421, Saudi Arabia
- Luca Paolo Ardigò
  - Department of Teacher Education, NLA University College, 0166 Oslo, Norway
2
Kassaw AK, Alebachew Muluneh A, Assefa EM, Yimer A. Predictive modeling and socioeconomic determinants of diarrhea in children under five in the Amhara Region, Ethiopia. Front Public Health 2024; 12:1366496. [PMID: 39157521 PMCID: PMC11327862 DOI: 10.3389/fpubh.2024.1366496] [Received: 01/06/2024] [Accepted: 06/19/2024] [Indexed: 08/20/2024]
Abstract
Background Diarrheal disease, characterized by high morbidity and mortality rates, continues to be a serious public health concern, especially in developing nations such as Ethiopia. The significant burden it imposes on these countries underscores the importance of identifying predictors of diarrhea. The use of machine learning techniques to identify significant predictors of diarrhea in children under the age of 5 in Ethiopia's Amhara Region is not well documented. Therefore, this study aimed to clarify these issues. Methods This study's data were extracted from the Ethiopian Population and Health Survey. We applied machine learning classifier models, namely random forests, logistic regression, K-nearest neighbors, decision trees, support vector machines, gradient boosting, and naive Bayes, to predict the determinants of diarrhea in children under the age of 5 in Ethiopia. Finally, Shapley Additive exPlanation (SHAP) value analysis was performed to predict diarrhea. Result Among the seven models used, the random forest algorithm showed the highest accuracy in predicting diarrheal disease, with an accuracy rate of 81.03% and an area under the curve of 86.50%. The following factors showed a significant association with diarrheal disease: families with the richest wealth status (log odds of -0.04), children without a history of Acute Respiratory Infections (ARIs) (log odds of -0.08), mothers who did not have a job (log odds of -0.04), children aged between 23 and 36 months (log odds of -0.03), mothers with higher education (log odds ratio of -0.03), urban dwellers (log odds of -0.01), families using electricity as cooking fuel (log odds of -0.12), children who did not show signs of wasting, and children who, unlike their peers, had not taken medications for intestinal parasites.
Conclusion We recommend implementing programs to reduce the incidence of diarrhea in children under the age of 5 in the Amhara region. These programs should focus on removing socioeconomic barriers that impede mothers' access to wealth, a favorable work environment, cooking fuel, education, and healthcare for their children.
Affiliation(s)
- Abdulaziz Kebede Kassaw
  - Department of Health Informatics, College of Medicine and Health Sciences, Wollo University, Dessie, Ethiopia
- Ayana Alebachew Muluneh
  - Department of Health Informatics, College of Medicine and Health Sciences, Wollo University, Dessie, Ethiopia
- Ebrahim Msaye Assefa
  - Department of Pre-clerkship, College of Medicine and Health Science, Wollo University, Dessie, Ethiopia
- Ali Yimer
  - Department of Public Health, College of Health Sciences, Woldia University, Woldia, Ethiopia
3
Shi Y, Liao Y. A New Integrated Interpolation Method for High Missing Unstable Disease Surveillance Data - 12 Urban Agglomerations, China, 2009-2020. China CDC Wkly 2024; 6:670-676. [PMID: 39027630 PMCID: PMC11252051 DOI: 10.46234/ccdcw2024.124] [Received: 05/06/2024] [Accepted: 07/01/2024] [Indexed: 07/20/2024]
Abstract
Introduction The prevalence of unstable and incomplete monitoring data significantly complicates syndromic analysis. Many data interpolation methods currently available demonstrate inadequate effectiveness in overcoming this issue. Methods To improve the accuracy of interpolation, we propose the integration of the SHapley Additive exPlanation model (SHAP) with the structural equation model (SEM), forming a combined SHAP-SEM approach. A case study is then performed to assess the enhanced performance of this novel model compared to traditional methods. Results The SHAP-SEM model was utilized to develop an interpolation model employing data from the Chinese respiratory syndrome surveillance database. We executed three distinct experiments to establish the model datasets, comprising a total of 100 replicates. The performance of the model was evaluated using the root mean square error (RMSE), correlation coefficient (r), and F-score. The findings demonstrate that the SHAP-SEM model consistently achieves superior accuracy in data interpolation, which is evident across different seasons and in overall performance. Discussion We conclude that the SHAP-SEM model demonstrates an exceptional capacity for accurately interpolating volatile and incomplete data. This capability is crucial for developing a comprehensive database that is essential for conducting risk assessments related to syndromes.
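Two of the evaluation metrics named in this abstract, the root mean square error (RMSE) and the correlation coefficient (r), can be written down in a few lines. The sketch below assumes r means the Pearson correlation, which the abstract does not state, and the observed and interpolated values are invented for illustration rather than taken from the surveillance database.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(x, y):
    """Pearson correlation coefficient (assumes neither sequence is constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical held-out counts and the values an interpolation model filled in
observed = [10.0, 12.0, 9.0, 15.0]
interpolated = [11.0, 11.0, 10.0, 14.0]

print(f"RMSE = {rmse(observed, interpolated):.3f}")
print(f"r    = {pearson_r(observed, interpolated):.3f}")
```

A lower RMSE and an r closer to 1 both indicate that the interpolated series tracks the withheld observations more closely.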
Affiliation(s)
- Yuanhao Shi
  - The State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Science, Beijing, China
- Yilan Liao
  - The State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
4
Hosny M, Zhu M, Gao W, Elshenhab AM. STN localization using local field potentials based on wavelet packet features and stacking ensemble learning. J Neurosci Methods 2024; 407:110156. [PMID: 38703796 DOI: 10.1016/j.jneumeth.2024.110156] [Received: 12/01/2023] [Revised: 02/20/2024] [Accepted: 04/27/2024] [Indexed: 05/06/2024]
Abstract
BACKGROUND Deep brain stimulation (DBS) entails the insertion of an electrode into the patient's brain, enabling subthalamic nucleus (STN) stimulation. Accurate delineation of STN borders is a critical but time-consuming task, traditionally reliant on the neurosurgeon's experience in deciphering the intricacies of microelectrode recording (MER). While clinical outcomes of MER have been satisfactory, they involve certain risks to patient safety. Recently, there has been a growing interest in exploring the potential of local field potentials (LFP) due to their correlation with the STN motor territory. METHOD A novel STN detection system, integrating LFP and wavelet packet transform (WPT) with stacking ensemble learning, is developed. Initial steps involve the inclusion of soft thresholding to increase robustness to LFP variability. Subsequently, non-linear WPT features are extracted. Finally, a unique ensemble model, comprising a dual-layer structure, is developed for STN localization. We harnessed the capabilities of support vector machine, decision tree, and k-nearest neighbor classifiers in conjunction with a long short-term memory (LSTM) network. The LSTM is pivotal for assigning adequate weights to every base model. RESULTS Results reveal that the proposed model achieved an accuracy and F1-score of 89.49% and 91.63%, respectively. COMPARISON WITH EXISTING METHODS The ensemble model demonstrated superior performance when compared to standalone base models and existing meta techniques. CONCLUSION This framework is envisioned to enhance the efficiency of DBS surgery and reduce the reliance on clinician experience for precise STN detection. It could serve as a tool for refining the electrode trajectory, potentially replacing the current methodology based on MER.
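The method description mentions soft thresholding as a robustness step. A minimal sketch of the standard soft-thresholding operator follows; it shrinks every value toward zero by a fixed amount and zeroes anything smaller than that amount. The abstract does not say where in the pipeline the operator is applied or how the threshold is chosen, so the coefficient values and the threshold of 0.25 are hypothetical.

```python
def soft_threshold(values, threshold):
    """Shrink each value toward zero by `threshold`; values within the threshold become 0."""
    shrunk = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)
        else:
            # Keep the original sign, reduce the magnitude
            shrunk.append(magnitude if v > 0 else -magnitude)
    return shrunk


# Hypothetical wavelet-like coefficients; small ones are treated as noise
coefficients = [0.125, -0.5, 1.5, -0.25]
denoised = soft_threshold(coefficients, threshold=0.25)
print(denoised)
```

Compared with hard thresholding, which keeps surviving values unchanged, this operator is continuous in its input, which is one common reason it is preferred for noisy signals.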
Affiliation(s)
- Mohamed Hosny
  - Department of Electrical Engineering, Benha Faculty of Engineering, Benha University, Benha, Egypt
- Minwei Zhu
  - First Affiliated Hospital of Harbin Medical University, Harbin, 150001, China
- Wenpeng Gao
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Ahmed M Elshenhab
  - Department of Mathematics, Faculty of Science, Mansoura University, Mansoura, 35516, Egypt
5
Shon S, Lim K, Chae M, Lee H, Choi J. Predicting Sudden Sensorineural Hearing Loss Recovery with Patient-Personalized Seigel's Criteria Using Machine Learning. Diagnostics (Basel) 2024; 14:1296. [PMID: 38928711 PMCID: PMC11202901 DOI: 10.3390/diagnostics14121296] [Received: 04/13/2024] [Revised: 06/04/2024] [Accepted: 06/15/2024] [Indexed: 06/28/2024]
Abstract
BACKGROUND Accurate prognostic prediction is crucial for managing Idiopathic Sudden Sensorineural Hearing Loss (ISSHL). Previous studies developing ISSHL prognosis models often overlooked individual variability in hearing damage by relying on fixed frequency domains. This study aims to develop models predicting ISSHL prognosis one month after treatment, focusing on patient-specific hearing impairments. METHODS Patient-Personalized Seigel's Criteria (PPSC) were developed considering patient-specific hearing impairment related to ISSHL criteria. We performed a statistical test to assess the shift in the recovery assessment when applying PPSC. The utilized dataset of 581 patients comprised demographic information, health records, laboratory testing, onset and treatment, and hearing levels. To reduce the model's reliance on hearing level features, we used only the averages of hearing levels of the impaired frequencies. Then, model development, evaluation, and interpretation proceeded. RESULTS The chi-square test (p-value: 0.106) indicated that the shift in recovery assessment is not statistically significant. The soft-voting ensemble model was most effective, achieving an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.864 (95% CI: 0.801-0.927), with model interpretation based on the SHapley Additive exPlanations value. CONCLUSIONS With PPSC, providing a hearing assessment comparable to traditional Seigel's criteria, the developed models successfully predicted ISSHL recovery one month post-treatment by considering patient-specific impairments.
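The best-performing model in this abstract is a soft-voting ensemble. In soft voting, each base model outputs a probability for the positive class and the ensemble averages those probabilities before applying a decision threshold. The sketch below shows that averaging step only; the three base models, their probabilities, the equal weights, and the 0.5 cutoff are all hypothetical and not taken from the paper.

```python
def soft_vote(model_probs, weights=None):
    """Weighted average of per-model positive-class probabilities, one value per patient."""
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models  # equal weighting by default
    total_weight = sum(weights)
    n_patients = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total_weight
        for i in range(n_patients)
    ]


# Hypothetical recovery probabilities from three base models for three patients
model_a = [0.9, 0.4, 0.2]
model_b = [0.7, 0.6, 0.1]
model_c = [0.8, 0.2, 0.3]

ensemble_probs = soft_vote([model_a, model_b, model_c])
predicted_recovery = [int(p >= 0.5) for p in ensemble_probs]
print(ensemble_probs, predicted_recovery)
```

Because the average keeps each model's confidence, a single confident model can outweigh two lukewarm ones, which hard (majority) voting would not allow.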
Affiliation(s)
- Sanghyun Shon
  - Department of Biomedical Informatics, Korea University College of Medicine, Seoul 02708, Republic of Korea
- Kanghyeon Lim
  - Department of Otorhinolaryngology-Head and Neck Surgery, Korea University Ansan Hospital, Ansan-si 15355, Republic of Korea
- Minsu Chae
  - Department of Biomedical Informatics, Korea University College of Medicine, Seoul 02708, Republic of Korea
- Hwamin Lee
  - Department of Biomedical Informatics, Korea University College of Medicine, Seoul 02708, Republic of Korea
- June Choi
  - Department of Biomedical Informatics, Korea University College of Medicine, Seoul 02708, Republic of Korea
  - Department of Otorhinolaryngology-Head and Neck Surgery, Korea University Ansan Hospital, Ansan-si 15355, Republic of Korea
6
Jovanovic L, Damaševičius R, Matic R, Kabiljo M, Simic V, Kunjadic G, Antonijevic M, Zivkovic M, Bacanin N. Detecting Parkinson's disease from shoe-mounted accelerometer sensors using convolutional neural networks optimized with modified metaheuristics. PeerJ Comput Sci 2024; 10:e2031. [PMID: 38855236 PMCID: PMC11157549 DOI: 10.7717/peerj-cs.2031] [Received: 01/19/2024] [Accepted: 04/09/2024] [Indexed: 06/11/2024]
Abstract
Neurodegenerative conditions significantly impact patient quality of life. Many conditions do not have a cure, but with appropriate and timely treatment the progression of the disease can be slowed. However, many patients only seek a diagnosis once the condition progresses to a point at which the quality of life is significantly impacted. Effective non-invasive and readily accessible methods for early diagnosis can considerably enhance the quality of life of patients affected by neurodegenerative conditions. This work explores the potential of convolutional neural networks (CNNs) for detecting the gait freezing associated with Parkinson's disease. Sensor data collected from wearable gyroscopes located at the sole of the patient's shoe record walking patterns. These patterns are further analyzed using convolutional networks to accurately detect abnormal walking patterns. The suggested method is assessed on a public real-world dataset collected from patients affected by Parkinson's as well as individuals from a control group. To improve the accuracy of the classification, an altered variant of the recent crayfish optimization algorithm is introduced and compared to contemporary optimization metaheuristics. Our findings reveal that the modified algorithm (MSCHO) significantly outperforms other methods in accuracy, demonstrated by low error rates and high Cohen's Kappa, precision, sensitivity, and F1-measures across three datasets. These results suggest the potential of CNNs, combined with advanced optimization techniques, for early, non-invasive diagnosis of neurodegenerative conditions, offering a path to improve patient quality of life.
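Cohen's Kappa, one of the metrics reported above, measures agreement between predicted and true labels after discounting the agreement expected by chance. A minimal sketch follows; the label vectors are invented for illustration (1 could stand for an abnormal walking pattern, 0 for control) and are not drawn from the paper's datasets.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).

    Undefined when chance agreement equals 1 (both sequences contain a single, identical class).
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(y_true) | set(y_pred)
    # Chance agreement from the marginal label frequencies
    expected = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight recordings
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
kappa = cohens_kappa(y_true, y_pred)
print(f"kappa = {kappa:.2f}")
```

Here raw accuracy is 0.75, but with balanced classes half of that agreement is expected by chance, so kappa is lower than accuracy.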
Affiliation(s)
- Luka Jovanovic
  - Faculty of Technical Sciences, Singidunum University, Belgrade, Serbia
- Rade Matic
  - Department for Information Systems and Technologies, Belgrade Academy for Business and Arts Applied Studies, Belgrade, Serbia
- Milos Kabiljo
  - Department for Information Systems and Technologies, Belgrade Academy for Business and Arts Applied Studies, Belgrade, Serbia
- Vladimir Simic
  - Faculty of Transport and Traffic Engineering, University of Belgrade, Belgrade, Serbia
  - College of Engineering, Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan City, Taiwan
- Goran Kunjadic
  - Higher Colleges of Technology, Abu Dhabi, United Arab Emirates
- Milos Antonijevic
  - Faculty of Informatics and Computing, Singidunum University, Belgrade, Serbia
- Miodrag Zivkovic
  - Faculty of Informatics and Computing, Singidunum University, Belgrade, Serbia
- Nebojsa Bacanin
  - Faculty of Informatics and Computing, Singidunum University, Belgrade, Serbia
  - MEU Research Unit, Middle East University, Amman, Jordan
7
Eshel YD, Sharaha U, Beck G, Cohen-Logasi G, Lapidot I, Huleihel M, Mordechai S, Kapelushnik J, Salman A. Monitoring the efficacy of antibiotic therapy in febrile pediatric oncology patients with bacteremia using infrared spectroscopy of white blood cells-based machine learning. Talanta 2024; 270:125619. [PMID: 38199122 DOI: 10.1016/j.talanta.2023.125619] [Received: 10/15/2023] [Revised: 12/29/2023] [Accepted: 12/30/2023] [Indexed: 01/12/2024]
Abstract
Bacteremia refers to the presence of bacteria in the bloodstream, which can lead to a serious and potentially life-threatening condition. Oncology patients undergoing cancer treatment have a higher risk of developing bacteremia because their immune system is weakened by the disease itself or by the treatments they receive. Prompt and accurate detection of bacterial infections and monitoring the effectiveness of antibiotic therapy are essential for enhancing patient outcomes and preventing the development and dissemination of multidrug-resistant bacteria. Traditional methods of infection monitoring, such as blood cultures and clinical observations, are time-consuming, labor-intensive, and often subject to limitations. This manuscript presents an innovative application of infrared spectroscopy of leucocytes combined with machine learning to diagnose the etiology of infection as bacterial and simultaneously monitor the efficacy of antibiotic therapy in febrile pediatric oncology patients with bacteremia. Effective monitoring makes it possible to promptly identify any indications of treatment failure, which in turn indirectly serves to limit the progression of antibiotic resistance. The logistic regression (LR) classifier was able to differentiate the samples as bacterial or control within an hour of receiving the blood samples, with a success rate of over 95%. Additionally, initial findings indicate that employing infrared spectroscopy of white blood cells (WBCs) along with machine learning is viable for monitoring the success of antibiotic therapy. Our follow-up results demonstrate an accuracy of 87.5% in assessing the effectiveness of the antibiotic treatment.
Affiliation(s)
- Yotam D Eshel
  - Department of Hematology and Oncology, Saban Pediatric Medical Center Soroka University Medical Center and Faculty of Health Sciences, Beer-Sheva, 84105, Israel
- Uraib Sharaha
  - Department of Microbiology, Immunology, and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
  - Department of Biology, Science and Technology College, Hebron University, Hebron, P760, Palestine
- Guy Beck
  - Department of Hematology and Oncology, Saban Pediatric Medical Center Soroka University Medical Center and Faculty of Health Sciences, Beer-Sheva, 84105, Israel
- Gal Cohen-Logasi
  - Department of Green Engineering, SCE-Sami Shamoon College of Engineering, Beer-Sheva, 84100, Israel
- Itshak Lapidot
  - Department of Electrical and Electronics Engineering, ACLP-Afeka Center for Language Processing, Afeka Tel-Aviv Academic College of Engineering, Tel-Aviv, 69107, Israel
  - LIA Avignon Université, 339 Chemin des Meinajaries, Avignon, 84000, France
- Mahmoud Huleihel
  - Department of Microbiology, Immunology, and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
- Shaul Mordechai
  - Department of Physics, Ben-Gurion University, Beer-Sheva, 84105, Israel
- Joseph Kapelushnik
  - Department of Hematology and Oncology, Saban Pediatric Medical Center Soroka University Medical Center and Faculty of Health Sciences, Beer-Sheva, 84105, Israel
- Ahmad Salman
  - Department of Physics, SCE-Sami Shamoon College of Engineering, Beer-Sheva, 84100, Israel
8
Ejiyi CJ, Qin Z, Ukwuoma CC, Nneji GU, Monday HN, Ejiyi MB, Ejiyi TU, Okechukwu U, Bamisile OO. Comparative performance analysis of Boruta, SHAP, and Borutashap for disease diagnosis: A study with multiple machine learning algorithms. Network (Bristol, England) 2024:1-38. [PMID: 38511557 DOI: 10.1080/0954898x.2024.2331506] [Received: 08/12/2023] [Accepted: 03/13/2024] [Indexed: 03/22/2024]
Abstract
Interpretable machine learning models are instrumental in disease diagnosis and clinical decision-making, shedding light on relevant features. In this study, Boruta, SHAP (SHapley Additive exPlanations), and BorutaShap were employed for feature selection, each contributing to the identification of crucial features. The selected features were then used to train six machine learning algorithms (LR, SVM, ETC, AdaBoost, RF, and LGBM) on diverse medical datasets obtained from public sources after rigorous preprocessing. The performance of each feature selection technique was evaluated across multiple ML models, assessing accuracy, precision, recall, and F1-score metrics. Among these, SHAP showed superior performance, achieving average accuracies of 80.17%, 85.13%, 90.00%, and 99.55% across diabetes, cardiovascular, statlog, and thyroid disease datasets, respectively. Notably, LGBM emerged as the most effective algorithm, with an average accuracy of 91.00% for most disease states. Moreover, SHAP enhanced the interpretability of the models, providing valuable insights into the underlying mechanisms driving disease diagnosis. This comprehensive study contributes significant insights into feature selection techniques and machine learning algorithms for disease diagnosis, benefiting researchers and practitioners in the medical field. Further exploration of feature selection methods and algorithms holds promise for advancing disease diagnosis methodologies, paving the way for more accurate and interpretable diagnostic models.
Affiliation(s)
- Chukwuebuka Joseph Ejiyi
  - School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Zhen Qin
  - School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Chiagoziem Chima Ukwuoma
  - School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Grace Ugochi Nneji
  - Software Engineering Department, Sino-British Collaborative Education, Chengdu University of Technology, Oxford Brookes University, Chengdu, China
- Happy Nkanta Monday
  - Software Engineering Department, Sino-British Collaborative Education, Chengdu University of Technology, Oxford Brookes University, Chengdu, China
- Thomas Ugochukwu Ejiyi
  - Department of Pure and Industrial Chemistry, University of Nigeria Nsukka, Enugu, Nigeria
- Olusola O Bamisile
  - Sichuan Industrial Internet Intelligent Monitoring and Application Engineering Technology Research Centre, Chengdu University of Technology, Chengdu, China
9
Zhao J, Jiang P, Shen T, Zhang R, Zhang D, Zhang N, Ting N, Ding K, Yang B, Tan C, Yu Z. Data-driven assessment of soil total nitrogen on the Qinghai-Tibet Plateau. The Science of the Total Environment 2024; 914:169993. [PMID: 38215840 DOI: 10.1016/j.scitotenv.2024.169993] [Received: 10/08/2023] [Revised: 01/03/2024] [Accepted: 01/05/2024] [Indexed: 01/14/2024]
Abstract
The investigation of soil total nitrogen (STN) holds significant importance in the preservation and sustainability of Earth's ecosystems. The Qinghai-Tibet Plateau (QTP), renowned as the world's most expansive plateau and characterized by its exceptionally delicate ecosystem, demands an in-depth exploration of its STN content. In this study, we use a machine learning approach to extrapolate point-scale measured STN stocks to the entire QTP and calculate STN storage from 0 to 2 m. Our results show that the XGB algorithm performs well in modeling STN despite variations in simulation accuracy for specific depth ranges. The spatial distribution of STN across the QTP exhibits pronounced heterogeneity, especially for the 0-50 cm soil layer, with relatively higher STN stocks in the southeast and lower stocks in the northwest of the QTP. The vertical distribution reveals a gradual decrease in STN storage with increasing depth. The 0-50 cm soil layer holds the highest STN stocks, averaging around 0.78 kg/m², which is almost equal to the sum of the STN stocks in the 50-100 cm and 100-200 cm soil layers. Meanwhile, STN stocks are smaller in the permafrost zone than in the non-permafrost zone. We also investigate the factors that control the spatiotemporal distribution of STN. The results indicate that vegetation, precipitation, temperature, and elevation are the major factors for STN distribution, while physical properties of the soil have a relatively smaller impact. These findings are crucial for understanding the distribution and evolution of STN on the QTP.
Affiliation(s)
- Jiahui Zhao
  - The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210098, China
  - College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
- Peng Jiang
  - The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210098, China
  - College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
  - Key Laboratory of Natural Resource Coupling Process and Effects, Beijing 100055, China
  - The Middle Reaches of Yarlung Zangbo River, Natural Resources, Observation and Research Station of Tibet Autonomous Region, Research Center of Applied Geology of China Geological Survey, Chengdu 610036, China
  - Joint International Research Laboratory of Global Change and Water Cycle, Hohai University, Nanjing 210098, China
- Tongqing Shen
  - The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210098, China
  - College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
- Rongrong Zhang
  - The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210098, China
  - Key Laboratory of Natural Resource Coupling Process and Effects, Beijing 100055, China
  - Joint International Research Laboratory of Global Change and Water Cycle, Hohai University, Nanjing 210098, China
- Dawei Zhang
  - China Institute of Water Resources and Hydropower Research, Beijing 100038, China
- Nana Zhang
  - College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
- Nie Ting
  - College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
- Kunqi Ding
  - College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
- Bin Yang
  - The Middle Reaches of Yarlung Zangbo River, Natural Resources, Observation and Research Station of Tibet Autonomous Region, Research Center of Applied Geology of China Geological Survey, Chengdu 610036, China
- Changhai Tan
  - Research Center of Applied Geology of China Geological Survey, Chengdu 610036, China
- Zhongbo Yu
  - The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing 210098, China
  - College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
  - Yangtze Institute for Conservation and Development, Hohai University, Jiangsu 210098, China
  - Joint International Research Laboratory of Global Change and Water Cycle, Hohai University, Nanjing 210098, China
10
Chadaga K, Prabhu S, Sampathila N, Chadaga R, Umakanth S, Bhat D, G S SK. Explainable artificial intelligence approaches for COVID-19 prognosis prediction using clinical markers. Sci Rep 2024; 14:1783. [PMID: 38245638 PMCID: PMC10799946 DOI: 10.1038/s41598-024-52428-2] [Received: 08/23/2023] [Accepted: 01/18/2024] [Indexed: 01/22/2024]
Abstract
The COVID-19 pandemic emerged and proved to be fatal, causing millions of deaths worldwide. Vaccines were eventually developed, effectively preventing the severe symptoms caused by the disease. However, some of the population (the elderly and patients with comorbidities) are still vulnerable to severe symptoms such as breathlessness and chest pain. Identifying these patients in advance is imperative to prevent a bad prognosis. Hence, machine learning and deep learning algorithms have been used for early COVID-19 severity prediction using clinical and laboratory markers. The COVID-19 data were collected from two Manipal hospitals after obtaining ethical clearance. Multiple nature-inspired feature selection algorithms are used to choose the most crucial markers. A maximum testing accuracy of 95% was achieved by the classifiers. The predictions obtained by the classifiers have been interpreted using five explainable artificial intelligence (XAI) techniques. According to XAI, the most important markers are C-reactive protein, basophils, lymphocytes, albumin, D-dimer, and neutrophils. The models could be deployed in various healthcare facilities to predict COVID-19 severity in advance so that appropriate treatments could be provided to mitigate a severe prognosis. The computer-aided diagnostic method can also aid healthcare professionals and ease the burden on already strained healthcare infrastructure.
Affiliation(s)
- Krishnaraj Chadaga
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Srikanth Prabhu
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Niranjana Sampathila
- Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Rajagopala Chadaga
- Department of Mechanical and Industrial Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Shashikiran Umakanth
- Department of Medicine, Dr. TMA Hospital, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Devadas Bhat
- Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Shashi Kumar G S
- Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
11
Takallou MA, Fallahtafti F, Hassan M, Al-Ramini A, Qolomany B, Pipinos I, Myers S, Alsaleem F. Diagnosis of disease affecting gait with a body acceleration-based model using reflected marker data for training and a wearable accelerometer for implementation. Sci Rep 2024; 14:1075. [PMID: 38212467 PMCID: PMC10784467 DOI: 10.1038/s41598-023-50727-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 12/23/2023] [Indexed: 01/13/2024] Open
Abstract
This paper demonstrates the value of a framework for processing body acceleration data as a tool for the early diagnosis of diseases that affect gait. As a case study, we used this framework to identify individuals with peripheral artery disease (PAD) and distinguish them from those without PAD. The framework uses acceleration data extracted from anatomical reflective markers placed at different body locations to train the diagnostic models and a wearable accelerometer carried at the waist for validation. Reflective marker data have been used for decades in studies evaluating and monitoring human gait. They are widely available for many body parts but are obtained in specialized laboratories. Wearable accelerometers, on the other hand, enable diagnostics outside lab conditions. Models trained on raw marker data at the sacrum achieve an accuracy of 92% in distinguishing PAD patients from non-PAD controls. This accuracy drops to 28% when the model is validated with data from a wearable accelerometer at the waist. The model was enhanced by using features extracted from the acceleration rather than the raw acceleration, with the marker model accuracy only dropping from 86% to 60% when validated with the wearable accelerometer data.
Affiliation(s)
- Mohammad Ali Takallou
- Architectural Engineering Department, University of Nebraska-Lincoln, Omaha, NE, 68182, USA
- Farahnaz Fallahtafti
- Department of Biomechanics, University of Nebraska at Omaha, Omaha, NE, 6160, USA
- Department of Surgery and VA Research Service, VA Nebraska-Western Iowa Health Care System, Omaha, NE, 68105, USA
- Mahdi Hassan
- Department of Biomechanics, University of Nebraska at Omaha, Omaha, NE, 6160, USA
- Department of Surgery and VA Research Service, VA Nebraska-Western Iowa Health Care System, Omaha, NE, 68105, USA
- Ali Al-Ramini
- Mechanical Engineering Department, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Basheer Qolomany
- Cyber Systems Department, University of Nebraska at Kearney, Kearney, NE, 68849, USA
- Iraklis Pipinos
- Department of Surgery and VA Research Service, VA Nebraska-Western Iowa Health Care System, Omaha, NE, 68105, USA
- Department of Surgery, University of Nebraska Medical Center, Omaha, NE, 68105, USA
- Sara Myers
- Department of Biomechanics, University of Nebraska at Omaha, Omaha, NE, 6160, USA
- Department of Surgery and VA Research Service, VA Nebraska-Western Iowa Health Care System, Omaha, NE, 68105, USA
- Fadi Alsaleem
- Architectural Engineering Department, University of Nebraska-Lincoln, Omaha, NE, 68182, USA
12
Chadaga K, Prabhu S, Bhat V, Sampathila N, Umakanth S, Upadya P S. COVID-19 diagnosis using clinical markers and multiple explainable artificial intelligence approaches: A case study from Ecuador. SLAS Technol 2023; 28:393-410. [PMID: 37689365 DOI: 10.1016/j.slast.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 08/16/2023] [Accepted: 09/06/2023] [Indexed: 09/11/2023]
Abstract
The COVID-19 pandemic erupted at the beginning of 2020 and proved fatal, causing many casualties worldwide. Immediate and precise screening of affected patients is critical for disease control. COVID-19 is often confused with various other respiratory disorders since the symptoms are similar. As of today, the reverse transcription-polymerase chain reaction (RT-PCR) test is used for diagnosing COVID-19. However, this approach is sometimes prone to producing erroneous and false negative results. Hence, finding a reliable diagnostic method that can validate the RT-PCR test results is crucial. Artificial intelligence (AI) and machine learning (ML) applications in COVID-19 diagnosis have proven to be beneficial. In this study, clinical markers were therefore used for COVID-19 diagnosis with the help of several classifiers. Further, five different explainable artificial intelligence techniques were used to interpret the predictions. Among all the algorithms, the k-nearest neighbor classifier obtained the best performance, with an accuracy, precision, recall and F1-score of 84%, 85%, 84% and 84%, respectively. According to this study, the combination of clinical markers such as eosinophils, lymphocytes, red blood cells and leukocytes was significant in differentiating COVID-19. The classifiers can be used alongside the standard RT-PCR procedure, making diagnosis more reliable and efficient.
Affiliation(s)
- Krishnaraj Chadaga
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Srikanth Prabhu
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Vivekananda Bhat
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Niranjana Sampathila
- Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Shashikiran Umakanth
- Department of Medicine, Dr. TMA Hospital, Manipal Academy of Higher Education, Manipal, India
- Sudhakara Upadya P
- Manipal School of Information Sciences, Manipal Academy of Higher Education, Manipal, India
13
Yoon HS, Oh J, Kim YC. Assessing Machine Learning Models for Predicting Age with Intracranial Vessel Tortuosity and Thickness Information. Brain Sci 2023; 13:1512. [PMID: 38002472 PMCID: PMC10669197 DOI: 10.3390/brainsci13111512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/19/2023] [Accepted: 10/23/2023] [Indexed: 11/26/2023] Open
Abstract
This study aimed to develop and validate machine learning (ML) models that predict age using intracranial vessels' tortuosity and diameter features derived from magnetic resonance angiography (MRA) data. A total of 171 subjects' three-dimensional (3D) time-of-flight MRA image data were considered for analysis. After annotation of two endpoints in each arterial segment, tortuosity features such as the sum of the angle metrics, triangular index, relative length, and product of the angle distance, as well as the vessels' diameter features, were extracted and used to train and validate the ML models for age prediction. Features extracted from the right and left internal carotid arteries (ICA) and basilar arteries were used as the inputs to train and validate six ML regression models with four-fold cross-validation. The random forest regression model resulted in the lowest root mean square error of 14.9 years and the highest average coefficient of determination of 0.186. The linear regression model showed the lowest average mean absolute percentage error (MAPE) and the highest average Pearson correlation coefficient (0.532). The mean diameter of the right ICA vessel segment was the most important feature contributing to prediction of age in two out of the four regression models considered. ML analysis of tortuosity descriptors and diameter features extracted from MRA data showed a modest correlation between real age and ML-predicted age. Further studies are warranted to assess the model's age predictions in patients with intracranial vessel diseases.
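The abstract above reports four regression metrics: root mean square error, coefficient of determination, MAPE, and Pearson correlation. As a minimal sketch, assuming the standard textbook definitions (the paper's exact computation is not stated here), they can be computed as follows. The function name `regression_metrics` and the toy ages are illustrative assumptions, not data from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, R^2, MAPE (%) and Pearson r for paired true/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n

    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    rmse = math.sqrt(ss_res / n)
    r2 = 1 - ss_res / ss_tot
    # MAPE is undefined when a true value is zero; ages are assumed positive
    mape = 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

    # Pearson correlation between true and predicted values
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    pearson = cov / math.sqrt(ss_tot * var_pred)

    return {"rmse": rmse, "r2": r2, "mape": mape, "pearson": pearson}

# Hypothetical ages in years, for illustration only
true_ages = [20, 40, 60, 80]
predicted_ages = [30, 40, 50, 70]
metrics = regression_metrics(true_ages, predicted_ages)
print(metrics)
```

Pearson correlation measures linear association only, so the squared correlation is an upper bound on the coefficient of determination for the same predictions. That is consistent with a correlation of 0.532 appearing alongside a coefficient of determination of 0.186, although the two figures above come from different models.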
Affiliation(s)
- Yoon-Chul Kim
- Division of Digital Healthcare, College of Software and Digital Healthcare Convergence, Yonsei University, Wonju 26493, Republic of Korea; (H.-S.Y.); (J.O.)
14
Kebede SD, Mamo DN, Adem JB, Semagn BE, Walle AD. Machine learning modeling for identifying predictors of unmet need for family planning among married/in-union women in Ethiopia: Evidence from performance monitoring and accountability (PMA) survey 2019 dataset. PLOS Digit Health 2023; 2:e0000345. [PMID: 37847670 PMCID: PMC10581455 DOI: 10.1371/journal.pdig.0000345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 08/11/2023] [Indexed: 10/19/2023]
Abstract
Unmet need for contraceptives is a global public health issue that affects maternal and child health. Reducing unmet need reduces the risk of abortion or childbearing by preventing unintended pregnancy. The unmet need for family planning is a frequently used indicator for monitoring family planning programs. This study aimed to identify predictors of unmet need for family planning using advanced machine learning modeling on recent PMA 2019 survey data. The study was conducted using secondary data from the PMA Ethiopia 2019 cross-sectional household and female survey, which was carried out from September 2019 to December 2019. Eight machine learning classifiers were applied to a total weighted sample of 5819 women using Python 3.10 and evaluated with performance metrics to predict and identify important predictors of unmet need for family planning. Data preparation techniques such as removing outliers, handling missing values, handling unbalanced categories, feature engineering, and data splitting were applied to prepare the data for further analysis. Finally, Shapley Additive exPlanations (SHAP) analysis was used to identify the top predictors of unmet need and explain the contribution of the predictors to the model's output. Random Forest was the best predictive model, with 85% accuracy and 0.93 area under the curve on balanced training data through tenfold cross-validation. The SHAP analysis based on the random forest model revealed that husband/partner disapproval of family planning use, number of household members, women having primary education, being from the Amhara region, and having previously delivered in a health facility were the top predictors of unmet need for family planning in Ethiopia. Findings from this study suggest that various sociocultural and economic factors might be considered when implementing health policies intended to decrease unmet need for family planning in Ethiopia. In particular, the husband's/partner's involvement in family planning sessions should be emphasized, as it has a significant impact on women's demand for contraceptives.
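The tenfold cross-validation mentioned in this abstract can be illustrated with a short sketch. It assumes a plain, unstratified split with shuffling; the study's actual splitting and balancing procedure is not described here. The function name `k_fold_indices`, the seed, and the sample size of 25 are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride of k gives folds whose sizes differ by at most one
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Training indices are all samples that are not in the held-out fold
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

# Hypothetical sample size, for illustration only
splits = list(k_fold_indices(25, k=10))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each sample is held out exactly once, so averaging a metric over the ten folds uses every observation for both training and evaluation without overlap inside any single fold.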
Affiliation(s)
- Shimels Derso Kebede
- Department of Health Informatics, School of Public Health, College of Medicine and Health Science, Wollo University, Dessie, Ethiopia
- Daniel Niguse Mamo
- Department of Health Informatics, College of Medicine and Health Sciences, Arba Minch University, Arba Minch, Ethiopia
- Jibril Bashir Adem
- Department of Public Health, College of Medicine and Health Science, Arsi University, Asella, Ethiopia
- Birhan Ewunu Semagn
- Department of Public Health, School of Public Health, Asrat Woldeyes Health Science College, Debre Berhan University, Debre Berhan, Ethiopia
- Agmasie Damtew Walle
- Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, Mettu University, Mettu, Ethiopia
15
Roy J, Pore S, Roy K. Prediction of cytotoxicity of heavy metals adsorbed on nano-TiO2 with periodic table descriptors using machine learning approaches. Beilstein J Nanotechnol 2023; 14:939-950. [PMID: 37736658 PMCID: PMC10509545 DOI: 10.3762/bjnano.14.77] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Accepted: 08/30/2023] [Indexed: 09/23/2023]
Abstract
Nanoparticles, with their unique features, have attracted researchers over the past decades. Heavy metals, upon release and emission, may interact with different environmental components, which may lead to co-exposure of living organisms. Nanoscale titanium dioxide (nano-TiO2) can adsorb heavy metals. The current idea is that nanoparticles (NPs) may act as carriers and facilitate the entry of heavy metals into organisms. Thus, the present study reports nanoscale quantitative structure-activity relationship (nano-QSAR) models, which are based on an ensemble learning approach, for predicting the cytotoxicity of heavy metals adsorbed on nano-TiO2 to human renal cortex proximal tubule epithelial (HK-2) cells. The ensemble learning approach implements gradient boosting and bagging algorithms; that is, random forest, AdaBoost, Gradient Boost, and Extreme Gradient Boost models were constructed and used to establish statistically significant relationships between the structural properties of NPs and the cause of cytotoxicity. To demonstrate the predictive ability of the developed nano-QSAR models, simple periodic table descriptors requiring low computational resources were used. The nano-QSAR models generated good R² values (0.99-0.89), Q² values (0.64-0.77), and Q²F1 values (0.99-0.71). Thus, the present work shows that ML in conjunction with periodic table descriptors can be used to explore the features and predict unknown compounds with similar properties.
Affiliation(s)
- Joyita Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India
- Souvik Pore
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India
- Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India
16
Wang H, Doumard E, Soule-Dupuy C, Kemoun P, Aligon J, Monsarrat P. Explanations as a New Metric for Feature Selection: A Systematic Approach. IEEE J Biomed Health Inform 2023; 27:4131-4142. [PMID: 37220033 DOI: 10.1109/jbhi.2023.3279340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
With the extensive use of Machine Learning (ML) in the biomedical field, there is an increasing need for Explainable Artificial Intelligence (XAI) to improve transparency and reveal complex hidden relationships between variables for medical practitioners, while meeting regulatory requirements. Feature Selection (FS) is widely used as part of a biomedical ML pipeline to significantly reduce the number of variables while preserving as much information as possible. However, the choice of FS method affects the entire pipeline, including the final prediction explanations, and very few works have investigated the relationship between FS and model explanations. Through a systematic workflow performed on 145 datasets and an illustration on medical data, the present work demonstrates the promising complementarity of two explanation-based metrics (using ranking and influence changes), in addition to accuracy and retention rate, for selecting the most appropriate FS/ML models. Measuring how much explanations differ with and without FS is particularly promising for recommending FS methods. While reliefF generally performs best on average, the optimal choice may vary for each dataset. Positioning FS methods in a three-dimensional space that integrates explanation-based metrics, accuracy and retention rate would allow the user to choose the priority given to each dimension. In biomedical applications, where each medical condition may have its own preferences, this framework will make it possible to offer the healthcare professional the appropriate FS technique and to select the variables that have an important explainable impact, even if this comes at the expense of a limited drop in accuracy.
17
Gao J, He S, Hu J, Chen G. A hybrid system to understand the relations between assessments and plans in progress notes. J Biomed Inform 2023; 141:104363. [PMID: 37054961 PMCID: PMC11449588 DOI: 10.1016/j.jbi.2023.104363] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/06/2023] [Revised: 04/05/2023] [Accepted: 04/07/2023] [Indexed: 04/15/2023]
Abstract
OBJECTIVE: The paper presents a novel solution to the 2022 National NLP Clinical Challenges (n2c2) Track 3, which aims to predict the relations between assessment and plan subsections in progress notes. METHODS: Our approach goes beyond standard transformer models and incorporates external information such as medical ontology and order information to comprehend the semantics of progress notes. We fine-tuned transformers to understand the textual data and incorporated medical ontology concepts and their relationships to enhance the model's accuracy. We also captured order information that regular transformers cannot by taking into account the position of the assessment and plan subsections in progress notes. RESULTS: Our submission earned third place in the challenge phase with a macro-F1 score of 0.811. After refining our pipeline further, we achieved a macro-F1 of 0.826, outperforming the top-performing system during the challenge phase. CONCLUSION: Our approach, which combines fine-tuned transformers, medical ontology, and order information, outperformed other systems in predicting the relationships between assessment and plan subsections in progress notes. This highlights the importance of incorporating external information beyond textual data in natural language processing (NLP) tasks related to medical documentation. Our work could potentially improve the efficiency and accuracy of progress note analysis.
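The macro-F1 score used to rank systems in this abstract is the unweighted mean of per-class F1 scores, so rare relation classes count as much as frequent ones. A minimal sketch under that standard definition follows; the function name `macro_f1` and the relation labels in the toy example are hypothetical and are not taken from the challenge data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical relation labels, for illustration only
gold = ["direct", "direct", "indirect", "neither"]
pred = ["direct", "indirect", "indirect", "neither"]
score = macro_f1(gold, pred)
print(round(score, 4))
```

Because every class contributes equally to the average, a system that ignores a minority relation type is penalized more heavily than it would be under plain accuracy.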
Affiliation(s)
- Jifan Gao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 610 Walnut St, Madison, 53726, WI, USA
- Shilu He
- Department of Mathematics, University of Wisconsin-Madison, 480 Lincoln Dr, Madison, 53706, WI, USA
- Junjie Hu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 610 Walnut St, Madison, 53726, WI, USA
- Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 610 Walnut St, Madison, 53726, WI, USA
18
Zheng D, Hao X, Khan M, Wang L, Li F, Xiang N, Kang F, Hamalainen T, Cong F, Song K, Qiao C. Comparison of machine learning and logistic regression as predictive models for adverse maternal and neonatal outcomes of preeclampsia: A retrospective study. Front Cardiovasc Med 2022; 9:959649. [PMID: 36312231 PMCID: PMC9596815 DOI: 10.3389/fcvm.2022.959649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 09/12/2022] [Indexed: 12/05/2022] Open
Abstract
Introduction: Preeclampsia, one of the leading causes of maternal and fetal morbidity and mortality, demands accurate predictive models given the lack of effective treatment. Predictive models based on machine learning algorithms demonstrate promising potential, but it remains controversial whether machine learning methods should be preferred over traditional statistical models. Methods: We employed both logistic regression and six machine learning methods as binary predictive models for a dataset containing 733 women diagnosed with preeclampsia. Participants were grouped by four different pregnancy outcomes. After the imputation of missing values, statistical description and comparison were conducted as a preliminary step to explore the characteristics of the 73 documented variables. Subsequently, correlation analysis and feature selection were performed as preprocessing steps to filter contributing variables for developing models. The models were evaluated by multiple criteria. Results: First, we found that the influential variables screened by the preprocessing steps did not overlap with those determined by statistical differences. Second, the most accurate imputation method was K-Nearest Neighbor, and the imputation process did not affect the performance of the developed models much. Finally, the performance of the models was investigated. The random forest classifier, multi-layer perceptron, and support vector machine demonstrated better discriminative power for prediction, as evaluated by the area under the receiver operating characteristic curve, while the decision tree classifier, random forest, and logistic regression yielded better calibration ability, as verified by the calibration curve. Conclusion: Machine learning algorithms can accomplish prediction modeling and demonstrate superior discrimination, while logistic regression can be calibrated well. Statistical analysis and machine learning are two scientific domains sharing similar themes. The predictive abilities of such developed models vary according to the characteristics of datasets, which still need larger sample sizes and more influential predictors to accumulate evidence.
Affiliation(s)
- Dongying Zheng
- State Key Laboratory of Fine Chemicals, Dalian R&D Center for Stem Cell and Tissue Engineering, Dalian University of Technology, Dalian, China
- Department of Obstetrics and Gynecology, Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Faculty of Information Technology, University of Jyvaskyla, Jyväskylä, Finland
- Xinyu Hao
- Faculty of Information Technology, University of Jyvaskyla, Jyväskylä, Finland
- School of Biomedical Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Muhanmmad Khan
- Institute of Zoology, University of Punjab, Lahore, Pakistan
- Lixia Wang
- Department of Obstetrics and Gynecology, Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Fan Li
- Department of Obstetrics and Gynecology, Shengjing Hospital, China Medical University, Shenyang, China
- Ning Xiang
- Department of Obstetrics and Gynecology, Jingzhou Hospital Affiliated to Yangtze University, Jingzhou, China
- Fuli Kang
- Department of Obstetrics and Gynecology, Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Timo Hamalainen
- Faculty of Information Technology, University of Jyvaskyla, Jyväskylä, Finland
- Fengyu Cong
- Faculty of Information Technology, University of Jyvaskyla, Jyväskylä, Finland
- School of Biomedical Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- School of Artificial Intelligence, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Key Laboratory of Integrated Circuit and Biomedical Electronic System, Dalian University of Technology, Dalian, China
- Kedong Song
- State Key Laboratory of Fine Chemicals, Dalian R&D Center for Stem Cell and Tissue Engineering, Dalian University of Technology, Dalian, China
- *Correspondence: Kedong Song
- Chong Qiao
- Department of Obstetrics and Gynecology, Shengjing Hospital, China Medical University, Shenyang, China