1
Zhang J, Long Z, Ren Z, Xu W, Sun Z, Zhao H, Zhang G, Gao W. Application of machine learning in ultrasonic pretreatment of sewage sludge: Prediction and optimization. Environmental Research 2024; 263:120108. [PMID: 39369781 DOI: 10.1016/j.envres.2024.120108] [Received: 07/14/2024] [Revised: 09/26/2024] [Accepted: 10/03/2024] [Indexed: 10/08/2024]
Abstract
In this research, typical industrial scenarios were analyzed and optimized using machine learning algorithms, helping to bridge the gap between large data volumes and industrial requirements in ultrasonic sludge treatment. Principal component analysis showed that ultrasonic density and ultrasonic time were positively correlated with soluble chemical oxygen demand (SCOD), total nitrogen (TN), and total phosphorus (TP). Among five machine learning models, the best model for SCOD prediction was XGBoost (R² = 0.855), while random forest (RF) was the best for TN and TP (R² = 0.974 and 0.957, respectively). In addition, SHAP analysis indicated that the most important features for SCOD, TN, and TP were ultrasonic time and sludge concentration. Finally, typical industrial scenarios of ultrasonic sludge pretreatment were analyzed. For secondary sludge, a treatment volume of 0.6 L, a pH of 7.0, and an ultrasonic time of 20 min were best for improving SCOD. For primary sludge, a treatment volume of 0.3 L, a pH of 7.0, and an ultrasonic time of 15 min were best for improving SCOD. Furthermore, an ultrasonic power of 700 W and an ultrasonic time of 20 min were best for improving C/N and C/P in secondary sludge, whereas an ultrasonic power of 600 W and an ultrasonic time of 15 min were best in primary sludge. This study lays a foundation for the practical application of ultrasonic pretreatment of sludge and provides basic information for typical industrial scenarios.
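The model comparison in this abstract rests on the coefficient of determination. The following is a minimal sketch of how R² is commonly computed and used to pick the best model for one target; the function names and the toy numbers are hypothetical illustrations and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def best_model(y_true, predictions):
    """Return (name, scores) for the model with the highest R^2.

    `predictions` maps a hypothetical model name to its predicted values.
    """
    scores = {name: r_squared(y_true, pred) for name, pred in predictions.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy values only; they do not reproduce any result from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    candidates = {
        "model_a": [1.1, 1.9, 3.2, 3.8],
        "model_b": [2.5, 2.5, 2.5, 2.5],  # always predicts the mean
    }
    print(best_model(observed, candidates))
```

A model that always predicts the mean of the observations scores R² = 0, which is why values such as 0.855 or 0.974 are read as the fraction of variance explained beyond that baseline.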
Affiliation(s)
- Jie Zhang
  - School of Energy and Environmental Engineering, Hebei University of Technology, Tianjin, 300401, China
- Zeqing Long
  - Department of Public Health and Preventive Medicine, Changzhi Medical College, Changzhi, 046000, China
- Zhijun Ren
  - School of Energy and Environmental Engineering, Hebei University of Technology, Tianjin, 300401, China
- Weichao Xu
  - National Key Laboratory of Biochemical Engineering, Beijing Engineering Research Centre of Process Pollution Control, Institute of Process Engineering, Innovation Academy for Green Manufacture, Chinese Academy of Sciences, Beijing, 100190, China
- Zhi Sun
  - National Key Laboratory of Biochemical Engineering, Beijing Engineering Research Centre of Process Pollution Control, Institute of Process Engineering, Innovation Academy for Green Manufacture, Chinese Academy of Sciences, Beijing, 100190, China
- He Zhao
  - National Key Laboratory of Biochemical Engineering, Beijing Engineering Research Centre of Process Pollution Control, Institute of Process Engineering, Innovation Academy for Green Manufacture, Chinese Academy of Sciences, Beijing, 100190, China
- Guangming Zhang
  - School of Energy and Environmental Engineering, Hebei University of Technology, Tianjin, 300401, China
- Wenfang Gao
  - School of Energy and Environmental Engineering, Hebei University of Technology, Tianjin, 300401, China
2
Retkute R, Thurston W, Cressman K, Gilligan CA. A framework for modelling desert locust population dynamics and large-scale dispersal. PLoS Comput Biol 2024; 20:e1012562. [PMID: 39700069 DOI: 10.1371/journal.pcbi.1012562] [Received: 10/04/2023] [Accepted: 10/16/2024] [Indexed: 12/21/2024]
Abstract
There is an urgent need for mathematical models that can be used to inform the deployment of surveillance, early warning and management systems for transboundary pest invasions. This is especially important for desert locust, one of the most dangerous migratory pests for smallholder farmers. During periods of desert locust upsurges and plagues, gregarious adult locusts form into swarms that are capable of long-range dispersal. Here we introduce a novel integrated modelling framework for use in predicting gregarious locust populations. The framework integrates the selection of breeding sites, maturation through egg, hopper and adult stages and swarm dispersal in search of areas suitable for feeding and breeding. Using a combination of concepts from epidemiological modelling, weather and environment data, together with an atmospheric transport model for swarm movement we provide a tool to forecast short- and long-term swarm movements. A principal aim of the framework is to provide a practical starting point for use in the next upsurge.
Affiliation(s)
- Renata Retkute
  - Epidemiology and Modelling Group, Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge, United Kingdom
- Keith Cressman
  - Food and Agriculture Organization of the United Nations, Viale delle Terme di Caracalla, Rome, Italy
- Christopher A Gilligan
  - Epidemiology and Modelling Group, Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge, United Kingdom
3
Cho HN, Ahn I, Gwon H, Kang HJ, Kim Y, Seo H, Choi H, Kim M, Han J, Kee G, Park S, Jun TJ, Kim YH. Explainable predictions of a machine learning model to forecast the postoperative length of stay for severe patients: machine learning model development and evaluation. BMC Med Inform Decis Mak 2024; 24:350. [PMID: 39563368 DOI: 10.1186/s12911-024-02755-1] [Received: 08/02/2023] [Accepted: 11/07/2024] [Indexed: 11/21/2024]
Abstract
BACKGROUND: Predicting the length of stay in advance not only benefits hospitals clinically and financially but also enables healthcare providers to make better decisions for improved quality of care. More importantly, understanding the length of stay of severe patients who require general anesthesia is key to enhancing health outcomes. OBJECTIVE: Here, we aim to discover how machine learning can support resource allocation management and decision-making based on length of stay prediction. METHODS: A retrospective cohort study was conducted from January 2018 to October 2020. Medical records from a total cohort of 240,000 patients were collected. Only preoperative variables were collected, so that the predictive factors affecting the postoperative length of stay could be analyzed accurately. The main outcome of this study is the length of stay (in days) from surgery until discharge. Prediction was performed with ridge regression, random forest, XGBoost, and multi-layer perceptron neural network models. RESULTS: XGBoost achieved the best performance, with an average error within 3 days. Moreover, we explain each feature's contribution in the XGBoost model and further display the distinct predictors affecting the overall prediction outcome at the patient level. The risk factors that contributed most to the length of stay after surgery were a direct bilirubin laboratory test, department change, calcium chloride medication, gender, and diagnosis with the removal of other organs. Our results suggest that healthcare providers should take into account risk factors such as laboratory blood tests, patient distribution, and medication prescribed prior to surgery. CONCLUSION: We successfully predicted the length of stay after surgery and provide explainable models with supporting analyses. In summary, we demonstrate interpretation with the XGBoost model, presenting insights on preoperative features and identifying higher-risk predictors of the length of stay outcome. Our explainable models support future length of stay prediction on electronic medical records, aiding decision-making and facilitation in the operation department.
Affiliation(s)
- Ha Na Cho
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43Gil, Songpagu, Seoul, 05505, Republic of Korea
- Imjin Ahn
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Sonpagu, 05505, Seoul, Republic of Korea
- Hansle Gwon
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Sonpagu, 05505, Seoul, Republic of Korea
- Hee Jun Kang
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43Gil, Songpagu, Seoul, 05505, Republic of Korea
- Yunha Kim
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Sonpagu, 05505, Seoul, Republic of Korea
- Hyeram Seo
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Sonpagu, 05505, Seoul, Republic of Korea
- Heejung Choi
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Sonpagu, 05505, Seoul, Republic of Korea
- Minkyoung Kim
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Sonpagu, 05505, Seoul, Republic of Korea
- Jiye Han
  - Department of Medical Science, Asan Medical Center, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Sonpagu, 05505, Seoul, Republic of Korea
- Gaeun Kee
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43Gil, Songpagu, Seoul, 05505, Republic of Korea
- Seohyun Park
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43Gil, Songpagu, Seoul, 05505, Republic of Korea
- Tae Joon Jun
  - Big Data Research Center, Asan Institute for Life Sciences, Asan Medical Center, 88, Olympicro 43Gil, Songpagu, Seoul, 05505, Republic of Korea
- Young-Hak Kim
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43Gil, Songpagu, Seoul, 05505, Republic of Korea
4
Lara-Ramírez EE, Rivera G, Oliva-Hernández AA, Bocanegra-Garcia V, López JA, Guo X. Unsupervised learning analysis on the proteomes of Zika virus. PeerJ Comput Sci 2024; 10:e2443. [PMID: 39650519 PMCID: PMC11623125 DOI: 10.7717/peerj-cs.2443] [Received: 05/18/2024] [Accepted: 10/01/2024] [Indexed: 12/11/2024]
Abstract
Background: The Zika virus (ZIKV), which is transmitted by mosquito vectors to nonhuman primates and humans, causes devastating outbreaks in the poorest tropical regions of the world. Molecular epidemiology, supported by gold-standard phylogenetic clustering studies using sequence data, has provided valuable information for tracking and controlling the spread of ZIKV. Unsupervised learning (UL), a class of machine learning algorithms, can be applied to datasets without the need for known information for training. Methods: In this work, unsupervised Random Forest (URF), followed by dimensionality reduction algorithms such as principal component analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), t-distributed stochastic neighbor embedding (t-SNE), and autoencoders, was used to uncover hidden patterns in polymorphic amino acid sites extracted from ZIKV proteome multiple alignments, without the need for an underlying evolutionary model. Results: The four UL algorithms revealed specific host and geographical clustering patterns for ZIKV. Among the four dimensionality reduction (DR) algorithms, UMAP performed best. The four algorithms allowed the identification of imported viruses for specific geographical clusters. The UL dimension coordinates showed a significant correlation with phylogenetic tree branch lengths and significant phylogenetic dependence in Abouheif's Cmean and Pagel's Lambda tests (p value < 0.01), indicating performance comparable to the phylogenetic method. This analytical strategy generalized to a large external dengue type 2 dataset. Conclusion: These UL algorithms could be practical evolutionary analytical techniques to track the dispersal of viral pathogens.
Affiliation(s)
- Edgar E. Lara-Ramírez
  - Laboratorio de Biotecnología Farmacéutica, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Reynosa, Tamaulipas, México
- Gildardo Rivera
  - Laboratorio de Biotecnología Farmacéutica, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Reynosa, Tamaulipas, México
- Amanda Alejandra Oliva-Hernández
  - Laboratorio de Biotecnología Experimental, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Reynosa, Tamaulipas, México
- Virgilio Bocanegra-Garcia
  - Laboratorio de Interacción Ambiente Microorganismo, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Reynosa, Tamaulipas, México
- Jesús Adrián López
  - Laboratorio de microRNAs y Cáncer, Unidad Académica de Ciencias Biológicas, Universidad Autónoma de Zacatecas, Zacatecas, Zacatecas, México
- Xianwu Guo
  - Laboratorio de Biotecnología Genómica, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Reynosa, Tamaulipas, México
5
Lu J, Zhu K, Yang N, Chen Q, Liu L, Liu Y, Yang Y, Li J. Radiomics and Clinical Features for Distinguishing Kidney Stone-Associated Urinary Tract Infection: A Comprehensive Analysis of Machine Learning Classification. Open Forum Infect Dis 2024; 11:ofae581. [PMID: 39435322 PMCID: PMC11493090 DOI: 10.1093/ofid/ofae581] [Received: 07/29/2024] [Accepted: 10/02/2024] [Indexed: 10/23/2024]
Abstract
Background: This study investigated the abilities of radiomics and clinical feature models to distinguish kidney stone-associated urinary tract infections (KS-UTIs) using computed tomography. Methods: A retrospective analysis was conducted on a single-center dataset comprising computed tomography (CT) scans and corresponding clinical information from 461 patients with kidney stones. Radiomics features were extracted from CT images and underwent dimensionality reduction and selection. Multiple machine learning algorithms (three types of shallow learning and four types of deep learning) were employed to construct radiomics and clinical models. Performance evaluation and optimal model selection were done using receiver operating characteristic (ROC) curve analysis and the DeLong test. Univariate and multivariate logistic regression were used to analyze clinical and radiomics features, identify significant variables, and develop a clinical model. A combined model integrating radiomics and clinical features was established. Model performance was assessed by ROC curve analysis, clinical utility was evaluated through decision curve analysis, and the accuracy of the model was analyzed via calibration curve. Results: Multilayer perceptron (MLP) showed higher classification accuracy than the other classifiers (area under the curve (AUC) for radiomics model: train 0.96, test 0.94; AUC for clinical model: train 0.95, test 0.91). The combined radiomics-clinical model performed best (AUC for combined model: train 0.98, test 0.95). Decision curve and calibration curve analyses confirmed the model's clinical efficacy and calibration. Conclusions: This study showed the effectiveness of combining radiomics and clinical features from CT scans to identify KS-UTIs. A combined model using MLP exhibited strong classification abilities.
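Decision curve analysis, mentioned above as the measure of clinical utility, is usually based on net benefit at a chosen threshold probability. The following is a minimal sketch of the standard net benefit formula, assuming binary labels and predicted probabilities; the function names and toy data are hypothetical and do not come from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * (pt / (1 - pt)), where pt is the threshold probability.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    if n == 0 or n != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


if __name__ == "__main__":
    # Toy labels and probabilities, for illustration only.
    labels = [1, 1, 0, 0]
    probs = [0.9, 0.8, 0.2, 0.1]
    for pt in (0.25, 0.5, 0.75):
        print(pt, net_benefit(labels, probs, pt), net_benefit_treat_all(labels, pt))
```

A decision curve plots these values over a range of thresholds; a model is considered useful where its net benefit exceeds both the treat-all line and the treat-none line (net benefit 0).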
Affiliation(s)
- Jianjuan Lu
  - Department of Infectious Diseases, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Kun Zhu
  - Department of Orthopedics, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Ning Yang
  - Department of Infectious Diseases, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Qiang Chen
  - Department of Infectious Diseases, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Lingrui Liu
  - Department of Infectious Diseases, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Yanyan Liu
  - Department of Infectious Diseases, The First Affiliated Hospital of Anhui Medical University, Hefei, China
  - Anhui Province Key Laboratory of Infectious Diseases, Anhui Medical University, Hefei, China
  - Institute of Infectious Diseases, Anhui Medical University, Hefei, China
  - Institute of Bacterial Resistance, Anhui Medical University, Hefei, China
  - Anhui Center for Surveillance of Bacterial Resistance, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Yi Yang
  - Department of Infectious Diseases, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Jiabin Li
  - Department of Infectious Diseases, The First Affiliated Hospital of Anhui Medical University, Hefei, China
  - Anhui Province Key Laboratory of Infectious Diseases, Anhui Medical University, Hefei, China
  - Institute of Infectious Diseases, Anhui Medical University, Hefei, China
  - Institute of Bacterial Resistance, Anhui Medical University, Hefei, China
  - Anhui Center for Surveillance of Bacterial Resistance, The First Affiliated Hospital of Anhui Medical University, Hefei, China
6
Barreñada L, Dhiman P, Timmerman D, Boulesteix AL, Van Calster B. Understanding overfitting in random forest for probability estimation: a visualization and simulation study. Diagn Progn Res 2024; 8:14. [PMID: 39334348 PMCID: PMC11437774 DOI: 10.1186/s41512-024-00177-1] [Received: 01/16/2024] [Accepted: 09/17/2024] [Indexed: 09/30/2024]
Abstract
BACKGROUND: Random forests have become popular for clinical risk prediction modeling. In a case study on predicting ovarian malignancy, we observed training AUCs close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behavior of random forests for probability estimation by (1) visualizing data space in three real-world case studies and (2) a simulation study. METHODS: For the case studies, multinomial risk estimates were visualized using heatmaps in a 2-dimensional subspace. The simulation study included 48 logistic data-generating mechanisms (DGM), varying the predictor distribution, the number of predictors, the correlation between predictors, the true AUC, and the strength of true predictors. For each DGM, 1000 training datasets of size 200 or 4000 with binary outcomes were simulated, and random forest models were trained with minimum node size 2 or 20 using the ranger R package, resulting in 192 scenarios in total. Model performance was evaluated on large test datasets (N = 100,000). RESULTS: The visualizations suggested that the model learned "spikes of probability" around events in the training set. A cluster of events created a bigger peak or plateau (signal), whereas isolated events created local peaks (noise). In the simulation study, median training AUCs were between 0.97 and 1 unless there were 4 binary predictors or 16 binary predictors with a minimum node size of 20. The median discrimination loss, i.e., the difference between the median test AUC and the true AUC, was 0.025 (range 0.00 to 0.13). Median training AUCs had Spearman correlations of around 0.70 with discrimination loss. Median test AUCs were higher with higher events per variable, higher minimum node size, and binary predictors. Median training calibration slopes were always above 1 and were not correlated with median test slopes across scenarios (Spearman correlation −0.11). Median test slopes were higher with higher true AUC, higher minimum node size, and higher sample size. CONCLUSIONS: Random forests learn local probability peaks that often yield near perfect training AUCs without strongly affecting AUCs on test data. When the aim is probability estimation, the simulation results go against the common recommendation to use fully grown trees in random forest models.
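The AUC and discrimination loss quantities discussed in this abstract can be illustrated with a minimal sketch. It assumes the rank-based (Mann-Whitney) definition of AUC and takes discrimination loss as the true AUC minus the median test AUC; that sign convention, the function names, and the toy numbers are assumptions for illustration, not the authors' code.

```python
from statistics import median


def auc(y_true, scores):
    """Rank-based AUC: probability that a random event outranks a random non-event.

    Ties between an event and a non-event count as half a concordant pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def discrimination_loss(true_auc, test_aucs):
    """True AUC minus the median test AUC across simulated datasets (assumed sign)."""
    return true_auc - median(test_aucs)


if __name__ == "__main__":
    # Toy values for illustration only.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(discrimination_loss(0.90, [0.85, 0.88, 0.87]))
```

Comparing `auc` on training data with `auc` on a large held-out set is the kind of check that exposes the pattern described above: a near-perfect training value alongside a test value only modestly below the true AUC.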
Affiliation(s)
- Lasai Barreñada
  - Department of Development and Regeneration, KU Leuven, Belgium
  - Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Belgium
- Paula Dhiman
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Dirk Timmerman
  - Department of Development and Regeneration, KU Leuven, Belgium
  - Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
- Ben Van Calster
  - Department of Development and Regeneration, KU Leuven, Belgium
  - Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Belgium
  - Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, the Netherlands
7
Nikouline A, Feng J, Rudzicz F, Nathens A, Nolan B. Machine learning in the prediction of massive transfusion in trauma: a retrospective analysis as a proof-of-concept. Eur J Trauma Emerg Surg 2024; 50:1073-1081. [PMID: 38265444 DOI: 10.1007/s00068-023-02423-5] [Received: 08/12/2023] [Accepted: 12/04/2023] [Indexed: 01/25/2024]
Abstract
PURPOSE: Early administration and protocolization of massive hemorrhage protocols (MHP) has been associated with decreases in mortality, multiorgan system failure, and number of blood products used. Various prediction tools have been developed for the initiation of MHP, but no single tool has demonstrated strong prediction with early clinical data. We sought to develop a massive transfusion prediction model using machine learning and early clinical data. METHODS: Using the National Trauma Data Bank from 2013 to 2018, we included severely injured trauma patients and extracted clinical features available from the pre-hospital and emergency department settings. We subsequently balanced our dataset and used the Boruta algorithm for feature selection. Massive transfusion was defined as five units at 4 h and ten units at 24 h. Six machine learning models were trained on the balanced dataset and tested on the original. RESULTS: A total of 326,758 patients met our inclusion criteria, with 18,871 (5.8%) requiring massive transfusion. Emergency department models demonstrated strong performance characteristics, with mean areas under the receiver-operating characteristic curve of 0.83. Extreme gradient boost modeling slightly outperformed the other models and demonstrated adequate predictive performance with pre-hospital data only, as well as with 4-h transfusion thresholds. CONCLUSIONS: We demonstrate the use of machine learning in developing an accurate prediction model for massive transfusion in trauma patients using early clinical data. This research demonstrates the potential utility of artificial intelligence as a clinical decision support tool.
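The abstract states that the dataset was balanced before training but does not say how. The following is a minimal sketch of one common option, random undersampling of the majority class, offered purely as an assumption; the function name, the `massive_transfusion` field, and the toy records are hypothetical.

```python
import random


def undersample(records, label_key="massive_transfusion", seed=0):
    """Return a class-balanced copy of `records` by dropping majority-class rows.

    This is one possible balancing strategy, not necessarily the one used in
    the study. Evaluation should still use the original, unbalanced data.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    positives = [r for r in records if r[label_key]]
    negatives = [r for r in records if not r[label_key]]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    sampled = rng.sample(majority, len(minority))
    balanced = minority + sampled
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Toy cohort with roughly 5% positives, for illustration only.
    cohort = [{"id": i, "massive_transfusion": i < 5} for i in range(100)]
    train = undersample(cohort)
    print(len(train), sum(r["massive_transfusion"] for r in train))
```

Training on the balanced sample while testing on the original cohort, as the abstract describes, keeps the reported performance tied to the real 5.8% prevalence rather than an artificial 50%.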
Affiliation(s)
- Anton Nikouline
  - Department of Emergency Medicine, London Health Sciences Centre, 800 Commissioners Road E, London, ON, N6A 5W9, Canada
  - Division of Critical Care and Emergency Medicine, Department of Medicine, Western University, London, ON, Canada
- Jinyue Feng
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Frank Rudzicz
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
  - Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
- Avery Nathens
  - Department of Surgery, Sunnybrook Health Sciences Center, Toronto, ON, Canada
  - American College of Surgeons, Chicago, IL, USA
- Brodie Nolan
  - Division of Emergency Medicine, Department of Medicine, University of Toronto, Toronto, ON, Canada
  - International Centre for Surgical Safety, St. Michael's Hospital, Toronto, ON, Canada
  - Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, ON, Canada
  - Department of Emergency Medicine, St. Michael's Hospital, Toronto, ON, Canada
8
Möller M, Recinos B, Rastner P, Marzeion B. Heterogeneous impacts of ocean thermal forcing on ice discharge from Greenland's peripheral tidewater glaciers over 2000-2021. Sci Rep 2024; 14:11316. [PMID: 38760481 PMCID: PMC11101662 DOI: 10.1038/s41598-024-61930-6] [Received: 01/22/2024] [Accepted: 05/11/2024] [Indexed: 05/19/2024]
Abstract
The Greenland Ice Sheet is losing mass at increasing rates. Substantial amounts of this mass loss occur by ice discharge, which is influenced by ocean thermal forcing. The ice sheet is surrounded by thousands of peripheral, dynamically decoupled glaciers. The mass loss from these glaciers is disproportionately high considering their negligible share in Greenland's overall ice mass. We study the relevance of ocean thermal forcing for ice discharge evolution in the context of this contrasting behaviour. Our estimate of ice discharge from the peripheral tidewater glaciers yields a rather stable Greenland-wide mean of 5.40 ± 3.54 Gt a⁻¹ over 2000-2021. The evolutions of ice discharge and ocean thermal forcing are heterogeneous around Greenland. We observe a significant sector-wide increase of ice discharge in the East and a significant sector-wide decrease in the Northeast. Ocean thermal forcing shows significant increases along the northern/eastern coast, while otherwise unchanged conditions or decreases prevail. For East Greenland, this implies a clear influence of ocean thermal forcing on ice discharge. Similarly, we find clear influences at peripheral tidewater glaciers with thick termini that are similar to ice sheet outlet glaciers. At the peripheral glaciers in Northeast Greenland, ice discharge evolution opposes ocean thermal forcing for unknown reasons.
Affiliation(s)
- Marco Möller
  - Institute of Geography, University of Bremen, Bremen, Germany
  - MARUM-Center for Marine Environmental Sciences, University of Bremen, Bremen, Germany
  - Geodesy and Glaciology, Bavarian Academy of Sciences and Humanities, Munich, Germany
- Beatriz Recinos
  - School of GeoSciences, University of Edinburgh, Edinburgh, Scotland, UK
- Philipp Rastner
  - Department of Geography, University of Zurich, Zurich, Switzerland
- Ben Marzeion
  - Institute of Geography, University of Bremen, Bremen, Germany
  - MARUM-Center for Marine Environmental Sciences, University of Bremen, Bremen, Germany
9
Cleman J, Romain G, Callegari S, Scierka L, Jacque F, Smolderen KG, Mena-Hurtado C. Evaluation of short-term mortality in patients with Medicare undergoing endovascular interventions for chronic limb-threatening ischemia. Vasc Med 2024; 29:172-181. [PMID: 38334045 DOI: 10.1177/1358863x231224335] [Indexed: 02/10/2024]
Abstract
INTRODUCTION: Patients with chronic limb-threatening ischemia (CLTI) have high mortality rates after revascularization. Risk stratification for short-term outcomes is challenging. We aimed to develop machine-learning models to rank predictive variables for 30-day and 90-day all-cause mortality after peripheral vascular intervention (PVI). METHODS: Patients undergoing PVI for CLTI in the Medicare-linked Vascular Quality Initiative were included. Sixty-six preprocedural variables were included. Random survival forest (RSF) models were constructed for 30-day and 90-day all-cause mortality in the training sample and evaluated in the testing sample. Predictive variables were ranked based on the frequency with which they caused branch splitting nearest the root node, using importance-weighted relative importance plots. Model performance was assessed by the Brier score, continuous ranked probability score, out-of-bag error rate, and Harrell's C-index. RESULTS: A total of 10,114 patients were included. The crude mortality rate was 4.4% at 30 days and 10.6% at 90 days. RSF models commonly identified stage 5 chronic kidney disease (CKD), dementia, congestive heart failure (CHF), age, urgent procedures, and need for assisted care as the most predictive variables. For both models, eight of the top 10 variables were either medical comorbidities or functional status variables. Models showed good discrimination (C-statistic 0.72 and 0.73) and calibration (Brier score 0.03 and 0.10). CONCLUSION: RSF models for 30-day and 90-day all-cause mortality commonly identified CKD, dementia, CHF, need for assisted care at home, urgent procedures, and age as the most predictive variables in CLTI. Results may help guide individualized risk-benefit treatment conversations regarding PVI.
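Two of the performance measures named in this abstract, the Brier score and Harrell's C-index, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the Brier score is shown for a binary outcome at a fixed horizon without censoring weights, and the C-index skips pairs with tied times. The function names and toy data are hypothetical and are not the authors' implementation.

```python
def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if not outcomes or len(outcomes) != len(probabilities):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


def harrell_c(times, events, risks):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event strictly
    before subject j's time. It is concordant when i also has the higher
    predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy follow-up data (days, event indicator, predicted risk), illustration only.
    days = [1, 2, 3, 4]
    died = [1, 1, 0, 1]
    risk = [0.9, 0.7, 0.5, 0.1]
    print(harrell_c(days, died, risk))
    print(brier_score([1, 0], [0.8, 0.2]))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, while a lower Brier score indicates predicted probabilities closer to the observed outcomes; note that the Brier score also depends on the event rate, which differs between the 30-day and 90-day models.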
Affiliation(s)
- Jacob Cleman
  - Vascular Medicine Outcomes Program, Section of Cardiovascular Medicine, Yale University, New Haven, CT, USA
- Gaëlle Romain
  - Vascular Medicine Outcomes Program, Section of Cardiovascular Medicine, Yale University, New Haven, CT, USA
- Santiago Callegari
  - Vascular Medicine Outcomes Program, Section of Cardiovascular Medicine, Yale University, New Haven, CT, USA
- Lindsey Scierka
  - Vascular Medicine Outcomes Program, Section of Cardiovascular Medicine, Yale University, New Haven, CT, USA
- Francky Jacque
  - Vascular Medicine Outcomes Program, Section of Cardiovascular Medicine, Yale University, New Haven, CT, USA
- Kim G Smolderen
  - Vascular Medicine Outcomes Program, Section of Cardiovascular Medicine, Yale University, New Haven, CT, USA
  - Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- Carlos Mena-Hurtado
  - Vascular Medicine Outcomes Program, Section of Cardiovascular Medicine, Yale University, New Haven, CT, USA
10
Donnelly JP, Collins DP, Knetter JM, Gammonley JH, Boggie MA, Grisham BA, Nowak MC, Naugle DE. Flood-irrigated agriculture mediates climate-induced wetland scarcity for summering sandhill cranes in western North America. Ecol Evol 2024; 14:e10998. [PMID: 38450315 PMCID: PMC10915483 DOI: 10.1002/ece3.10998] [Received: 11/06/2023] [Revised: 12/14/2023] [Accepted: 12/22/2023] [Indexed: 03/08/2024]
Abstract
Information about species distributions is lacking in many regions of the world, forcing resource managers to answer complex ecological questions with incomplete data. Information gaps are compounded by climate change, driving ecological bottlenecks that can act as new demographic constraints on fauna. Here, we construct greater sandhill crane (Antigone canadensis tabida) summering range in western North America using movement data from 120 GPS-tagged individuals to determine how landscape composition shaped their distributions. Landscape variables developed from remotely sensed data were combined with bird locations to model distribution probabilities. Additionally, land-use and ownership were summarized within summer range as a measure of general bird use. Wetland variables identified as important predictors of bird distributions were evaluated in a post hoc analysis to measure long-term (1984-2022) effects of climate-driven surface water drying. Wetlands and associated agricultural practices accounted for 1.2% of summer range but were key predictors of occurrence. Bird distributions were structured by riparian floodplains that concentrated wetlands, and flood-irrigated agriculture in otherwise arid and semi-arid landscapes. Findings highlighted the role of private lands in greater sandhill crane ecology as they accounted for 78% of predicted distributions. Wetland drying observed in portions of the range from 1984 to 2022 represented an emerging ecological bottleneck that could limit future greater sandhill crane summer range. Study outcomes provide novel insight into the significance of ecosystem services provided by flood-irrigated agriculture that supported nearly 60% of wetland resources used by birds. 
Findings suggest greater sandhill cranes function as a surrogate species for agroecology and climate change adaptation strategies seeking to reduce agricultural water use through improved efficiency while also maintaining distinct flood-irrigation practices supporting greater sandhill cranes and other wetland-dependent wildlife. We make our wetland and sandhill crane summering distributions available as interactive web-based mapping tools to inform conservation design.
Affiliation(s)
- J. Patrick Donnelly
  - Intermountain West Joint Venture, U.S. Fish and Wildlife Service Migratory Bird Program, Missoula, Montana, USA
- Daniel P. Collins
  - W.A. Franke College of Forestry and Conservation, University of Montana, Missoula, Montana, USA
- Matthew A. Boggie
  - Intermountain West Joint Venture, U.S. Fish and Wildlife Service Migratory Bird Program, Missoula, Montana, USA
- Blake A. Grisham
  - Department of Natural Resources Management, Texas Tech University, Lubbock, Texas, USA
- M. Cathy Nowak
  - Oregon Department of Fish and Wildlife, Ladd Marsh Wildlife Area, La Grande, Oregon, USA
  - U.S. Fish and Wildlife Service, Southwest Region Migratory Bird Program, Albuquerque, New Mexico, USA
- David E. Naugle
  - W.A. Franke College of Forestry and Conservation, University of Montana, Missoula, Montana, USA
11
Tappan I, Lindbeck EM, Nichols JA, Harley JB. Explainable AI Elucidates Musculoskeletal Biomechanics: A Case Study Using Wrist Surgeries. Ann Biomed Eng 2024; 52:498-509. [PMID: 37943340 PMCID: PMC11293275 DOI: 10.1007/s10439-023-03394-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 10/20/2023] [Indexed: 11/10/2023]
Abstract
As datasets increase in size and complexity, biomechanists have turned to artificial intelligence (AI) to aid their analyses. This paper explores how explainable AI (XAI) can enhance the interpretability of biomechanics data derived from musculoskeletal simulations. We use machine learning to classify the simulated lateral pinch data as belonging to models with healthy or one of two types of surgically altered wrists. This simulation-based classification task is analogous to using biomechanical movement and force data to clinically diagnose a pathological state. The XAI describes which musculoskeletal features best explain the classifications and, in turn, the pathological states, at both the local (individual decision) level and global (entire algorithm) level. We demonstrate that these descriptions agree with assessments in the literature and additionally identify the blind spots that can be missed with traditional statistical techniques.
Affiliation(s)
- Isaly Tappan
  - Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, 32611, USA
- Erica M Lindbeck
  - Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, 32611, USA
- Jennifer A Nichols
  - J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, Gainesville, FL, 32611, USA
- Joel B Harley
  - Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, 32611, USA
12
Javeed A, Anderberg P, Ghazi AN, Noor A, Elmståhl S, Berglund JS. Breaking barriers: a statistical and machine learning-based hybrid system for predicting dementia. Front Bioeng Biotechnol 2024; 11:1336255. [PMID: 38260734 PMCID: PMC10801181 DOI: 10.3389/fbioe.2023.1336255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 12/05/2023] [Indexed: 01/24/2024] Open
Abstract
Introduction: Dementia is a condition (a collection of related signs and symptoms) that causes a continuing deterioration in cognitive function, and millions of people are impacted by dementia every year as the world population continues to rise. Conventional approaches for determining dementia rely primarily on clinical examinations, analyzing medical records, and administering cognitive and neuropsychological testing. However, these methods are time-consuming and costly in terms of treatment. Therefore, this study aims to present a noninvasive method for the early prediction of dementia so that preventive steps can be taken to avoid dementia. Methods: We developed a hybrid diagnostic system based on statistical and machine learning (ML) methods that used patient electronic health records to predict dementia. The dataset used for this study was obtained from the Swedish National Study on Aging and Care (SNAC), with a sample size of 43040 and 75 features. The newly constructed diagnostic system extracts a subset of useful features from the dataset through a statistical method (F-score). For the classification, we developed an ensemble voting classifier based on five different ML models: decision tree (DT), naive Bayes (NB), logistic regression (LR), support vector machines (SVM), and random forest (RF). To address the problem of ML model overfitting, we used a cross-validation approach to evaluate the performance of the proposed diagnostic system. Various assessment measures, such as accuracy, sensitivity, specificity, receiver operating characteristic (ROC) curve, and Matthews correlation coefficient (MCC), were used to thoroughly validate the devised diagnostic system's efficiency. Results: According to the experimental results, the proposed diagnostic method achieved the best accuracy of 98.25%, as well as sensitivity of 97.44%, specificity of 95.744%, and MCC of 0.7535.
Discussion: The effectiveness of the proposed diagnostic approach is compared to various cutting-edge feature selection techniques and baseline ML models. From experimental results, it is evident that the proposed diagnostic system outperformed the prior feature selection strategies and baseline ML models regarding accuracy.
Affiliation(s)
- Ashir Javeed
  - Department of Health, Blekinge Institute of Technology, Karlskrona, Sweden
- Peter Anderberg
  - Department of Health, Blekinge Institute of Technology, Karlskrona, Sweden
  - School of Health Sciences, University of Skövde, Skövde, Sweden
- Ahmad Nauman Ghazi
  - Department of Software Engineering, Blekinge Institute of Technology, Karlskrona, Sweden
- Adeeb Noor
  - Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Sölve Elmståhl
  - EpiHealth: Epidemiology for Health, Lund University, SUS Malmö, Malmö, Sweden
13
Moumen A, Shafqat A, Alraqad T, Alshawarbeh ES, Saber H, Shafqat R. Divorce prediction using machine learning algorithms in Ha'il region, KSA. Sci Rep 2024; 14:502. [PMID: 38177210 PMCID: PMC10766631 DOI: 10.1038/s41598-023-50839-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Accepted: 12/26/2023] [Indexed: 01/06/2024] Open
Abstract
The application of artificial intelligence (AI) in predictive analytics is growing in popularity. It has the power to offer ground-breaking solutions for a range of social problems and real-world societal difficulties. It is helpful in addressing some of the social issues that today's world seems incapable of solving. One of the most significant phenomena affecting people's lives is divorce. The goal of this paper is to study the use of machine learning algorithms to determine the effectiveness of the divorce predictor scale (DPS) and identify the reasons that usually lead to divorce in the Ha'il region, KSA. For this purpose, the DPS, based on Gottman couples therapy, was used to predict divorce by applying different machine learning algorithms. Fifty-four items of the DPS were used as features or attributes for data collection. In addition to the DPS, a personal information form was utilized to gather participants' personal data in order to conduct this study in a more structured and traditional manner. Of the 148 participants, 116 were married and 32 were divorced. The effectiveness of the DPS was examined using three algorithms: artificial neural network (ANN), naive Bayes (NB), and random forest (RF). The correlation-based feature selection method was used to identify the top six features from the same dataset, and the highest accuracy rate was 91.66% with RF. The results show that the DPS can predict divorce. This scale can help family counselors and therapists in case formulation and the intervention plan development process. Additionally, it may be argued that the Ha'il region, KSA sample confirmed the Gottman couples therapy predictors.
Affiliation(s)
- Abdelkader Moumen
  - Department of Mathematics, College of Science, University of Ha'il, Ha'il, 55473, Saudi Arabia
- Ayesha Shafqat
  - Department of Education, The University of Lahore, Sargodha, 40100, Pakistan
- Tariq Alraqad
  - Department of Mathematics, College of Science, University of Ha'il, Ha'il, 55473, Saudi Arabia
- Etaf Saleh Alshawarbeh
  - Department of Mathematics, College of Science, University of Ha'il, Ha'il, 55473, Saudi Arabia
- Hicham Saber
  - Department of Mathematics, College of Science, University of Ha'il, Ha'il, 55473, Saudi Arabia
- Ramsha Shafqat
  - Department of Mathematics and Statistics, The University of Lahore, Sargodha, 40100, Pakistan
14
|
Du JH, Patil P, Roeder K, Kuchibhotla AK. Extrapolated cross-validation for randomized ensembles. J Comput Graph Stat 2024; 33:1061-1072. [PMID: 39439808 PMCID: PMC11492369 DOI: 10.1080/10618600.2023.2288194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 11/15/2023] [Indexed: 10/25/2024]
Abstract
Ensemble methods such as bagging and random forests are ubiquitous in various fields, from finance to genomics. Despite their prevalence, the question of the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial estimators for small ensemble sizes using out-of-bag errors and a novel risk extrapolation technique that leverages the structure of prediction risk decomposition. By establishing uniform consistency of our risk extrapolation technique over ensemble and subsample sizes, we show that ECV yields δ-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general predictors, only requires mild moment assumptions, and allows for high-dimensional regimes where the feature dimension grows with the sample size. As a practical case study, we employ ECV to predict surface protein abundances from gene expressions in single-cell multiomics using random forests under a computational constraint on the maximum ensemble size. Compared to sample-split and K-fold cross-validation, ECV achieves higher accuracy by avoiding sample splitting. Meanwhile, its computational cost is considerably lower owing to the use of the risk extrapolation technique.
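The abstract above describes estimating risk at small ensemble sizes and extrapolating to larger ones. The sketch below illustrates the general idea under an explicit assumption that is not stated in the abstract: that the squared prediction risk of an ensemble of size M has the form R(M) = a + b/M, so that two risk estimates (at M = 1 and M = 2) determine the whole curve. The function names and input values are hypothetical, and this is a simplified reading rather than the paper's estimator.

```python
def extrapolate_risk(r1, r2, m):
    """Two-point fit of R(M) = a + b / M through (1, r1) and (2, r2).

    r1, r2 : risk estimates for ensemble sizes 1 and 2 (for example,
             from out-of-bag errors); assumed inputs in this sketch.
    m      : ensemble size at which to extrapolate.
    """
    b = 2.0 * (r1 - r2)   # from r1 - r2 = b * (1 - 1/2)
    a = 2.0 * r2 - r1     # limiting risk as M grows without bound
    return a + b / m


def smallest_ensemble_within(r1, r2, delta, m_max):
    """Smallest M (up to m_max) whose extrapolated risk is within
    delta of the limiting risk a."""
    b = 2.0 * (r1 - r2)
    for m in range(1, m_max + 1):
        if b / m <= delta:
            return m
    return m_max


# Hypothetical risk estimates, for illustration only.
risk_at_4 = extrapolate_risk(r1=1.0, r2=0.75, m=4)
chosen_m = smallest_ensemble_within(r1=1.0, r2=0.75, delta=0.0625, m_max=100)
```

With these toy inputs the fitted curve is R(M) = 0.5 + 0.5/M, so the extrapolated risk at M = 4 is 0.625 and the smallest ensemble within 0.0625 of the limit has 8 members. The appeal of this kind of extrapolation is that only very small ensembles need to be fitted before a size is chosen.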
Affiliation(s)
- Jin-Hong Du
  - Department of Statistics and Data Science, Carnegie Mellon University
  - Machine Learning Department, Carnegie Mellon University
- Pratik Patil
  - Department of Statistics, University of California, Berkeley
- Kathryn Roeder
  - Department of Statistics and Data Science, Carnegie Mellon University
15
Wang S, Liu Y, Liu Y, Zhang Y, Zhu X. BERT-5mC: an interpretable model for predicting 5-methylcytosine sites of DNA based on BERT. PeerJ 2023; 11:e16600. [PMID: 38089911 PMCID: PMC10712318 DOI: 10.7717/peerj.16600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 11/15/2023] [Indexed: 12/18/2023] Open
Abstract
DNA 5-methylcytosine (5mC) is widely present in multicellular eukaryotes and plays important roles in various developmental and physiological processes and a wide range of human diseases. Thus, it is essential to accurately detect the 5mC sites. Although current sequencing technologies can map genome-wide 5mC sites, these experimental methods are both costly and time-consuming. To achieve a fast and accurate prediction of 5mC sites, we propose a new computational approach, BERT-5mC. First, we pre-trained a domain-specific BERT (bidirectional encoder representations from transformers) model by using human promoter sequences as the language corpus. BERT is a deep bidirectional language representation model based on the Transformer. Second, we fine-tuned the domain-specific BERT model on the 5mC training dataset to build the model. The cross-validation results show that our model achieves an AUROC of 0.966, which is higher than other state-of-the-art methods such as iPromoter-5mC, 5mC_Pred, and BiLSTM-5mC. Furthermore, our model was evaluated on the independent test set, where it achieves an AUROC of 0.966, which is also higher than other state-of-the-art methods. Moreover, we analyzed the attention weights generated by BERT to identify a number of nucleotide distributions that are closely associated with 5mC modifications. To facilitate the use of our model, we built a webserver which can be freely accessed at: http://5mc-pred.zhulab.org.cn.
Affiliation(s)
- Shuyu Wang
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui, China
- Yinbo Liu
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui, China
- Yufeng Liu
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui, China
- Yong Zhang
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui, China
- Xiaolei Zhu
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui, China
16
Jackson H, Bowen S, Jaki T. Using biomarkers to allocate patients in a response-adaptive clinical trial. COMMUN STAT-SIMUL C 2023; 52:5946-5965. [PMID: 38045870 PMCID: PMC7615340 DOI: 10.1080/03610918.2021.2004420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2021] [Accepted: 11/05/2021] [Indexed: 10/19/2022]
Abstract
In this paper, we discuss a response-adaptive randomization method, and why it should be used in clinical trials for rare diseases instead of a randomized controlled trial with equal fixed randomization. The developed method uses a patient's biomarkers to alter the allocation probability to each treatment, in order to emphasize the benefit to the trial population. The method starts with an initial burn-in period of a small number of patients, who are allocated to each treatment with equal probability. We then use a regression method to predict the best outcome of the next patient, using their biomarkers and the information from the previous patients. This estimated best treatment is assigned to the next patient with high probability. A completed clinical trial for the effect of catumaxomab on the survival of cancer patients is used as an example to demonstrate the use of the method and the differences from a controlled trial with equal allocation. Different regression procedures are investigated and compared to a randomized controlled trial, using efficacy and ethical measures.
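The allocation scheme described above (equal-probability burn-in, then regression-based prediction of the best arm, assigned with high probability) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single biomarker, a separate ordinary least squares line per arm, treats a higher outcome as better, and assumes at least two arms. The names `fit_line`, `allocation_probabilities`, `allocate`, and the parameters `burn_in` and `p_best` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0  # no spread in x: fall back to the mean outcome
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def allocation_probabilities(history, biomarker, arms, burn_in, p_best):
    """Allocation probabilities for the next patient.

    history : list of (arm, biomarker, outcome) for previous patients
    Assumes len(arms) >= 2 and that a higher outcome is better.
    """
    equal = {arm: 1.0 / len(arms) for arm in arms}
    if len(history) < burn_in:
        return equal  # burn-in period: equal allocation
    predicted = {}
    for arm in arms:
        xs = [x for a, x, _ in history if a == arm]
        ys = [y for a, _, y in history if a == arm]
        if not xs:
            return equal  # an arm without data cannot be compared yet
        intercept, slope = fit_line(xs, ys)
        predicted[arm] = intercept + slope * biomarker
    best = max(arms, key=lambda arm: predicted[arm])
    rest = (1.0 - p_best) / (len(arms) - 1)
    return {arm: (p_best if arm == best else rest) for arm in arms}


def allocate(probs, rng):
    """Draw one arm according to the allocation probabilities."""
    arms = list(probs)
    return rng.choices(arms, weights=[probs[a] for a in arms], k=1)[0]


# Hypothetical history: arm A helps high-biomarker patients, arm B low ones.
history = [("A", 0.0, 0.0), ("A", 1.0, 1.0), ("B", 0.0, 1.0), ("B", 1.0, 0.0)]
probs_high = allocation_probabilities(history, 0.9, ["A", "B"], burn_in=4, p_best=0.8)
probs_low = allocation_probabilities(history, 0.1, ["A", "B"], burn_in=4, p_best=0.8)
next_arm = allocate(probs_high, random.Random(0))
```

In this toy history, a patient with a high biomarker value is steered toward arm A and a patient with a low value toward arm B, each with probability 0.8. Keeping `p_best` below 1 preserves some randomization, so every arm continues to receive patients.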
Affiliation(s)
- T Jaki
  - Lancaster University, Lancaster, UK
  - University of Cambridge, Cambridge, UK
17
Tan F, Hu C, Hu Y, Yen K, Wei Z, Pappu A, Park S, Li K. MGEL: Multigrained Representation Analysis and Ensemble Learning for Text Moderation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:7014-7023. [PMID: 35113788 DOI: 10.1109/tnnls.2021.3137045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this work, we describe our efforts in addressing two typical challenges involved in the popular text classification methods when they are applied to text moderation: the representation of multibyte characters and word obfuscations. Specifically, a multihot byte-level scheme is developed to significantly reduce the dimension of one-hot character-level encoding caused by the multiplicity of instance-scarce non-ASCII characters. In addition, we introduce a simple yet effective weighting approach for fusing n-gram features to empower the classical logistic regression. Surprisingly, it outperforms well-tuned representative neural networks greatly. As a continual effort toward text moderation, we endeavor to analyze the current state-of-the-art (SOTA) algorithm bidirectional encoder representations from transformers (BERT), which works well in context understanding but performs poorly on intentional word obfuscations. To resolve this crux, we then develop an enhanced variant and remedy this drawback by integrating byte and character decomposition. It advances the SOTA performance on the largest abusive language datasets as demonstrated by our comprehensive experiments. Our work offers a feasible and effective framework to tackle word obfuscations.
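The abstract above mentions a multihot byte-level scheme that reduces the dimension of one-hot character-level encoding. The exact scheme is not given in the abstract, so the following is only one plausible reading, offered as an assumption: each character is mapped to a 256-dimensional vector with a 1 at every byte value that appears in its UTF-8 encoding. The function name `multihot_bytes` is hypothetical.

```python
def multihot_bytes(char):
    """Encode one character as a 256-dim multi-hot vector over its UTF-8 bytes.

    Assumption of this sketch: byte order within the character is ignored.
    """
    vector = [0] * 256
    for byte in char.encode("utf-8"):
        vector[byte] = 1
    return vector


ascii_vec = multihot_bytes("a")   # one byte: 0x61
latin_vec = multihot_bytes("é")   # two bytes: 0xC3 0xA9
cjk_vec = multihot_bytes("你")    # three bytes: 0xE4 0xBD 0xA0
```

The design point is dimensionality: a one-hot encoding over all Unicode code points would need up to 1,114,112 dimensions, most of them for rarely seen characters, while a byte-level representation stays at 256 regardless of script.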
18
Wassan JT, Wang H, Zheng H. Developing a New Phylogeny-Driven Random Forest Model for Functional Metagenomics. IEEE Trans Nanobioscience 2023; 22:763-770. [PMID: 37279136 DOI: 10.1109/tnb.2023.3283462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Metagenomics is an unobtrusive science linking microbial genes to biological functions or environmental states. Classifying microbial genes into their functional repertoire is an important task in the downstream analysis of Metagenomic studies. The task involves Machine Learning (ML) based supervised methods to achieve good classification performance. Random Forest (RF) has been applied rigorously to microbial gene abundance profiles, mapping them to functional phenotypes. The current research targets tuning RF by the evolutionary ancestry of microbial phylogeny, developing a Phylogeny-RF model for functional classification of metagenomes. This method facilitates capturing the effects of phylogenetic relatedness in an ML classifier itself rather than just applying a supervised classifier over the raw abundances of microbial genes. The idea is rooted in the fact that closely related microbes by phylogeny are highly correlated and tend to have similar genetic and phenotypic traits. Such microbes behave similarly; and hence tend to be selected together, or one of these could be dropped from the analysis, to improve the ML process. The proposed Phylogeny-RF algorithm has been compared with state-of-the-art classification methods including RF and the phylogeny-aware methods of MetaPhyl and PhILR, using three real-world 16S rRNA metagenomic datasets. It has been observed that the proposed method not only achieved significantly better performance than the traditional RF model but also performed better than the other phylogeny-driven benchmarks (p < 0.05). For example, Phylogeny-RF attained a highest AUC of 0.949 and Kappa of 0.891 over soil microbiomes in comparison to other benchmarks.
19
Gaida M, Cain CN, Synovec RE, Focant JF, Stefanuto PH. Tile-Based Random Forest Analysis for Analyte Discovery in Balanced and Unbalanced GC × GC-TOFMS Data Sets. Anal Chem 2023; 95:13519-13527. [PMID: 37647642 DOI: 10.1021/acs.analchem.3c01872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
In this study, we introduce a new nontargeted tile-based supervised analysis method that combines the four-grid tiling scheme previously established for the Fisher ratio (F-ratio) analysis (FRA) with the estimation of tile hit importance using the machine learning (ML) algorithm Random Forest (RF). This approach is termed tile-based RF analysis. As opposed to the standard tile-based F-ratio analysis, the RF approach can be extended to the analysis of unbalanced data sets, i.e., different numbers of samples per class. Tile-based RF computes out-of-bag (oob) tile hit importance estimates for every summed chromatographic signal within each tile on a per-mass channel basis (m/z). These estimates are then used to rank tile hits in a descending order of importance. In the present investigation, the RF approach was applied for a two-class comparison of stool samples collected from omnivore (O) subjects and stored using two different storage conditions: liquid (Liq) and lyophilized (Lyo). Two final hit lists were generated using balanced (8 vs 8 comparison) and unbalanced (8 vs 9 comparison) data sets and compared to the hit list generated by the standard F-ratio analysis. Similar class-distinguishing analytes (p < 0.01) were discovered by both methods. However, while the FRA discovered a more comprehensive hit list (65 hits), the RF approach strictly discovered hits (31 hits for the balanced data set comparison and 29 hits for the unbalanced data set comparison) with concentration ratios, [OLiq]/[OLyo], greater than 2 (or less than 0.5). This difference is attributed to the more stringent feature selection process used by the RF algorithm. Moreover, our findings suggest that the RF approach is a promising method for identifying class-distinguishing analytes in settings characterized by both high between-class variance and high within-class variance, making it an advantageous method in the study of complex biological matrices.
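The abstract above contrasts the RF approach with Fisher ratio (F-ratio) analysis and refers to between-class and within-class variance. As background, the sketch below computes a plain one-way F-ratio for a single variable, defined as the between-class variance divided by the pooled within-class variance. It is not the tile-based pipeline from the study; the function name `fisher_ratio` and the toy signal values are hypothetical.

```python
def fisher_ratio(groups):
    """One-way F-ratio: between-class variance over within-class variance.

    groups : list of lists, one list of measurements per class
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical summed signals for one tile and m/z channel in two classes.
f_ratio = fisher_ratio([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

For the toy data the class means are 2 and 5 with a within-class variance of 1, giving an F-ratio of 13.5. A large ratio marks a variable whose class means differ by much more than the scatter within each class.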
Affiliation(s)
- Meriem Gaida
  - Organic and Biological Analytical Chemistry Group, Molecular Systems Research Unit, University of Liège, 4000 Liège, Belgium
- Caitlin N Cain
  - Department of Chemistry, University of Washington, Seattle, Washington 98195-1700, United States
- Robert E Synovec
  - Department of Chemistry, University of Washington, Seattle, Washington 98195-1700, United States
- Jean-François Focant
  - Organic and Biological Analytical Chemistry Group, Molecular Systems Research Unit, University of Liège, 4000 Liège, Belgium
- Pierre-Hugues Stefanuto
  - Organic and Biological Analytical Chemistry Group, Molecular Systems Research Unit, University of Liège, 4000 Liège, Belgium
20
Williams K, Michalska S, Cohen E, Szomszor M, Grant J. Exploring the application of machine learning to expert evaluation of research impact. PLoS One 2023; 18:e0288469. [PMID: 37535633 PMCID: PMC10399885 DOI: 10.1371/journal.pone.0288469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 06/27/2023] [Indexed: 08/05/2023] Open
Abstract
The objective of this study is to investigate the application of machine learning techniques to the large-scale human expert evaluation of the impact of academic research. Using publicly available impact case study data from the UK's Research Excellence Framework (2014), we trained five machine learning models on a range of qualitative and quantitative features, including institution, discipline, narrative style (explicit and implicit), and bibliometric and policy indicators. Our work makes two key contributions. Based on the accuracy metric in predicting high- and low-scoring impact case studies, it shows that machine learning models are able to process information to make decisions that resemble those of expert evaluators. It also provides insights into the characteristics of impact case studies that would be favoured if a machine learning approach was applied for their automated assessment. The results of the experiments showed strong influence of institutional context, selected metrics of narrative style, as well as the uptake of research by policy and academic audiences. Overall, the study demonstrates promise for a shift from descriptive to predictive analysis, but suggests caution around the use of machine learning for the assessment of impact case studies.
Affiliation(s)
- Kate Williams
  - School of Social and Political Sciences, University of Melbourne, Melbourne, Victoria, Australia
- Sandra Michalska
  - Policy Institute, King’s College London, London, Greater London, United Kingdom
- Eliel Cohen
  - Policy Institute, King’s College London, London, Greater London, United Kingdom
- Martin Szomszor
  - Electric Data Solutions, London, Greater London, United Kingdom
- Jonathan Grant
  - Different Angles, Cambridge, Cambridgeshire, United Kingdom
21
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, because big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), Generative Adversarial Network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
22
Tang FH, Xue C, Law MYY, Wong CY, Cho TH, Lai CK. Prognostic Prediction of Cancer Based on Radiomics Features of Diagnostic Imaging: The Performance of Machine Learning Strategies. J Digit Imaging 2023; 36:1081-1090. [PMID: 36781589 PMCID: PMC10287586 DOI: 10.1007/s10278-022-00770-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/18/2022] [Revised: 12/16/2022] [Accepted: 12/20/2022] [Indexed: 02/15/2023] Open
Abstract
Tumor phenotypes can be characterized by radiomics features extracted from images. However, prediction accuracy is challenged by difficulties such as small sample size and data imbalance. The purpose of the study was to evaluate the performance of machine learning strategies for the prediction of cancer prognosis. A total of 422 patients diagnosed with non-small cell lung carcinoma (NSCLC) were selected from The Cancer Imaging Archive (TCIA). The gross tumor volume (GTV) of each case was delineated from the respective CT images for radiomic feature extraction. The samples were divided into 4 groups with survival endpoints of 1 year, 3 years, 5 years, and 7 years. The radiomic image features were analyzed with 6 different machine learning methods: decision tree (DT), boosted tree (BT), random forests (RF), support vector machine (SVM), generalized linear model (GLM), and deep learning artificial neural networks (DL-ANNs) with 70:30 cross-validation. The overall average AUCs of BT, RF, DT, SVM, GLM, and DL-ANNs were 0.912, 0.938, 0.793, 0.746, 0.789, and 0.705, respectively. RF and BT gave the best and second-best prediction performance. The DL-ANNs did not show a significant improvement over traditional machine learning methods such as random forest and boosted trees. On the whole, accurate outcome prediction using radiomics serves as a supportive reference for formulating treatment strategies for cancer patients.
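The AUC figures quoted in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that rank-based definition; the function name and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auc_from_scores(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event observed, 0 = no event.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_from_scores(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for an illustration but not for large cohorts.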
Affiliation(s)
- Fuk-hay Tang
- School of Medical and Health Sciences, Tung Wah College, Hong Kong, China
| | - Cheng Xue
- Department of Computer Science and Engineering, Southeast University, Nanjing, China
| | - Maria YY Law
- School of Medical and Health Sciences, Tung Wah College, Hong Kong, China
| | - Chui-ying Wong
- School of Medical and Health Sciences, Tung Wah College, Hong Kong, China
- Department of Radiotherapy, Hong Kong Sanatorium Hospital, Hong Kong, China
| | - Tze-hei Cho
- School of Medical and Health Sciences, Tung Wah College, Hong Kong, China
| | - Chun-kit Lai
- School of Medical and Health Sciences, Tung Wah College, Hong Kong, China
- Department of Oncology, Prince of Wales Hospital, Hong Kong, China
|
23
|
Walter M, Mondal P. Mapping of Phragmites in estuarine wetlands using high-resolution aerial imagery. ENVIRONMENTAL MONITORING AND ASSESSMENT 2023; 195:478. [PMID: 36928355 DOI: 10.1007/s10661-023-11071-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 02/28/2023] [Indexed: 06/18/2023]
Abstract
Phragmites australis is a widespread invasive plant species in the USA that greatly impacts estuarine wetlands by creating dense patches and outcompeting other plants. The invasion of Phragmites into wetland ecosystems is known to decrease biodiversity, destroy the habitat of threatened and endangered bird species, and alter biogeochemistry. While the impact of Phragmites is known, the spatial extent of this species is challenging to document due to its fragmented occurrence. Using high-resolution imagery from the National Agriculture Imagery Program (NAIP) from 2017, we evaluated a geospatial method of mapping the spatial extent of Phragmites across the state of DE. Normalized difference vegetation index (NDVI) and principal component analysis (PCA) bands are generated from the NAIP data and used as inputs in a random forest classifier to achieve a high overall accuracy for the Phragmites classification of around 95%. The classified gridded dataset has a spatial resolution of 1 m and documents the spatial distribution of Phragmites throughout the state's estuarine wetlands (around 11%). Such detailed classification could aid in monitoring the spread of this invasive species over space and time and would inform the decision-making process for landscape managers.
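For readers unfamiliar with the NDVI input mentioned in this abstract, the index is the standard ratio (NIR - Red) / (NIR + Red). The sketch below computes it per pixel in plain Python; the function names, the zero-denominator convention, and the reflectance values are illustrative assumptions and do not reproduce the authors' NAIP processing chain.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    Returns 0.0 when both bands are zero (an assumed convention for no-data pixels).
    """
    denom = nir + red
    if denom == 0:
        return 0.0
    return (nir - red) / denom


def ndvi_band(nir_band, red_band):
    """Apply ndvi() element-wise to two equally sized 2-D lists of band values."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 2x2 reflectance tiles.
nir_tile = [[0.5, 0.4], [0.0, 0.3]]
red_tile = [[0.1, 0.4], [0.0, 0.1]]
ndvi_tile = ndvi_band(nir_tile, red_tile)
```

Dense vegetation tends toward values near 1, while bare ground and water fall near or below 0, which is why the index is a useful classifier feature.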
Affiliation(s)
- Matthew Walter
- Department of Geography and Spatial Sciences, University of Delaware, Newark, DE, 19716, USA.
| | - Pinki Mondal
- Department of Geography and Spatial Sciences, University of Delaware, Newark, DE, 19716, USA
- Department of Plant and Soil Sciences, University of Delaware, Newark, DE, 19716, USA
|
24
|
Joint leaf-refinement and ensemble pruning through $L_1$ regularization. Data Min Knowl Discov 2023. [DOI: 10.1007/s10618-023-00921-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Abstract
Ensembles are among the state-of-the-art in many machine learning applications. With the ongoing integration of ML models into everyday life, e.g., in the form of the Internet of Things, the deployment and continuous application of models become more and more an important issue. Therefore, small models that offer good predictive performance and use small amounts of memory are required. Ensemble pruning is a standard technique for removing unnecessary classifiers from a large ensemble that reduces the overall resource consumption and sometimes improves the performance of the original ensemble. Similarly, leaf-refinement is a technique that improves the performance of a tree ensemble by jointly re-learning the probability estimates in the leaf nodes of the trees, thereby allowing for smaller ensembles while preserving their predictive performance. In this paper, we develop a new method that combines both approaches into a single algorithm. To do so, we introduce $L_1$ regularization into the leaf-refinement objective, which allows us to jointly prune and refine trees at the same time. In an extensive experimental evaluation, we show that our approach not only offers statistically significantly better performance than the state-of-the-art but also offers a better accuracy-memory trade-off. We conclude our experimental evaluation with a case study showing the effectiveness of our method in a real-world setting.
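One common way an $L_1$ penalty leads to pruning is through the soft-thresholding (proximal) step, which shrinks small weights exactly to zero. The sketch below illustrates that generic mechanism on a list of per-tree weights; it is not the authors' algorithm, and the function names, the weights, and the threshold value are assumptions made for illustration.

```python
def soft_threshold(w, lam):
    """Proximal operator of lam * |w|: shrink w toward zero by lam, clipping at zero."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def prune_ensemble(weights, lam):
    """Shrink every tree weight and report which trees keep a non-zero weight."""
    shrunk = [soft_threshold(w, lam) for w in weights]
    kept = [i for i, w in enumerate(shrunk) if w != 0.0]
    return shrunk, kept


# Hypothetical per-tree weights after a gradient step.
tree_weights = [0.5, 0.05, -0.2, 0.0]
shrunk_weights, kept_trees = prune_ensemble(tree_weights, lam=0.1)
```

Trees whose weight falls to zero can be dropped from the ensemble, which is the source of the memory savings an $L_1$ penalty can provide.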
|
25
|
Ismaiel M, Gouda M, Li Y, Chen Y. Airtightness evaluation of Canadian dwellings and influencing factors based on measured data and predictive models. INDOOR + BUILT ENVIRONMENT : THE JOURNAL OF THE INTERNATIONAL SOCIETY OF THE BUILT ENVIRONMENT 2023; 32:553-573. [PMID: 36820005 PMCID: PMC9936450 DOI: 10.1177/1420326x221121519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
The airtightness of buildings has a significant impact on buildings' energy efficiency, maintenance and occupant comfort. The main goal of this study is to provide an evaluation of the air leakage characteristics of dwellings in different regions in Canada. This study evaluated the key influencing factors on airtightness performance based on a large set of measured data (73,450 dwellings located in Canada with 11 measurement parameters for each). Machine learning models based on multivariate regression (MVR) and Random Forest Ensemble (RFE) were developed to predict the air leakage value. The RFE model, which shows better results than MVR, was used to evaluate the effect of the ageing of buildings. Results showed that the maximum increase in air leakage occurs during the first year after construction - approximately 25%, and then 3.7% in the second year, after which the increase rate becomes insignificant and relatively constant - approximately 0.3% per year. The findings from this study can provide significant information for building designs, building performance simulations and strengthening standards and guidelines policies on indoor environmental quality.
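As a worked illustration of the ageing figures quoted in this abstract (about 25% in year one, 3.7% in year two, then about 0.3% per year), the sketch below accumulates them into a relative leakage factor. It assumes the yearly increases compound multiplicatively on the previous year's value, which the abstract does not state; the function name and default parameters are illustrative.

```python
def relative_leakage(years, first=0.25, second=0.037, later=0.003):
    """Air leakage relative to the as-built value after a number of whole years.

    Assumption: each yearly increase compounds on the previous year's leakage.
    """
    factor = 1.0
    for year in range(1, years + 1):
        if year == 1:
            rate = first
        elif year == 2:
            rate = second
        else:
            rate = later
        factor *= 1.0 + rate
    return factor


# Relative leakage at a few building ages (1.0 means as-built).
leakage_by_age = {age: relative_leakage(age) for age in (0, 1, 2, 10, 30)}
```

Under this assumption, two years of ageing give a factor of 1.25 x 1.037 = 1.29625, and later years add comparatively little.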
Affiliation(s)
- Maysoun Ismaiel
- Maysoun Ismaiel, University of Alberta, 116 St NW, Edmonton, AB T6G 2E1, Canada.
|
26
|
Poodendan C, Suwannakhan A, Chawalchitiporn T, Kasai Y, Nantasenamat C, Yurasakpong L, Iamsaard S, Chaiyamoon A. Morphometric analysis of dry atlas vertebrae in a northeastern Thai population and possible correlation with sex. SURGICAL AND RADIOLOGIC ANATOMY : SRA 2023; 45:175-181. [PMID: 36602583 DOI: 10.1007/s00276-022-03076-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Accepted: 12/30/2022] [Indexed: 01/06/2023]
Abstract
PURPOSE The uppermost segment of the cervical vertebra or atlas (C1) is a critically important anatomical structure, housing the medulla oblongata and containing the grooves for the C1 spinal nerve and the vertebral vessels. Variations of the C1 vertebra can affect upper spine stability, and morphometric parameters have been reported to differ by population. However, there are few data regarding these parameters in Thais. The use of this bone to predict sex and age has never been reported. METHODS This study aimed to examine C1 morphometry and determine its ability to predict sex. Twelve diameter parameters were taken from the C1 vertebrae of identified skeletons (n = 104; males, n = 54; females, n = 50). Correlation analysis was also performed for sex and age, which were predicted using machine learning algorithms. RESULTS The results showed that 8 of the 12 measured parameters were significantly longer in the male atlas (p < 0.05), while the remaining 4 (distance between both medial-most edges of the transverse foramen, transverse dimension of the superior articular surface, frontal plane passing through the canal's midpoint, and anteroposterior dimension of the inferior articular surface) did not differ significantly by sex. There was no statistically significant difference in these parameters on the lateral side. The decision stump classifier was trained on C1 parameters, and the resulting model could predict sex with 82.6% accuracy (root mean square error = 0.38). CONCLUSION Ascertaining the morphometric parameters of the atlas is important for preoperative assessment, especially for the treatment of atlas dislocation. Our findings also highlighted the potential use of atlas measurements for sex prediction.
Affiliation(s)
- Chanasorn Poodendan
- Department of Anatomy, Faculty of Medical Science, Naresuan University, Phitsanulok, 65000, Thailand
| | - Athikhun Suwannakhan
- Department of Anatomy, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand.,In Silico and Clinical Anatomy Research Group (iSCAN), Bangkok, 10400, Thailand
| | - Tidarat Chawalchitiporn
- Department of Anatomy, Faculty of Medicine, Khon Kaen University, Khon Kaen, 40002, Thailand
| | - Yuichi Kasai
- Department of Orthopedics, Faculty of Medicine, Khon Kaen University, Mitraparp Road, Khon Kaen, Thailand
| | | | - Laphatrada Yurasakpong
- Princess Srisavangavadhana College of Medicine, Chulabhorn Royal Academy, Bangkok, 10210, Thailand
| | - Sitthichai Iamsaard
- Department of Anatomy, Faculty of Medicine, Khon Kaen University, Khon Kaen, 40002, Thailand.
| | - Arada Chaiyamoon
- Department of Anatomy, Faculty of Medicine, Khon Kaen University, Khon Kaen, 40002, Thailand.
|
27
|
Vaughan L, Zhang M, Gu H, Rose JB, Naughton CC, Medema G, Allan V, Roiko A, Blackall L, Zamyadi A. An exploration of challenges associated with machine learning for time series forecasting of COVID-19 community spread using wastewater-based epidemiological data. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 858:159748. [PMID: 36306840 PMCID: PMC9597519 DOI: 10.1016/j.scitotenv.2022.159748] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/09/2022] [Revised: 10/22/2022] [Accepted: 10/22/2022] [Indexed: 05/19/2023]
Abstract
Wastewater-based epidemiology (WBE) has gained increasing attention as a complementary tool to conventional surveillance methods with potential for significant resource and labour savings when used for public health monitoring. Using WBE datasets to train machine learning algorithms and develop predictive models may also facilitate early warnings for the spread of outbreaks. The challenges associated with using machine learning for the analysis of WBE datasets and time series forecasting of COVID-19 were explored by running Random Forest (RF) algorithms on WBE datasets across 108 sites in five regions: Scotland, Catalonia, Ohio, the Netherlands, and Switzerland. This method uses measurements of SARS-CoV-2 RNA fragment concentration in samples taken at the inlets of wastewater treatment plants, providing insight into the prevalence of infection in upstream wastewater catchment populations. RF's forecasting performance at each site was quantitatively evaluated by determining mean absolute percentage error (MAPE) values, which were used to highlight challenges affecting future implementations of RF for WBE forecasting efforts. Performance was generally poor using WBE datasets from Catalonia, Scotland, and Ohio, with 'reasonable' or better forecasts constituting 0 %, 5 %, and 0 % of these regions' forecasts, respectively. RF's performance was much stronger with WBE data from the Netherlands and Switzerland, which provided 55 % and 45 % 'reasonable' or better forecasts, respectively. Sampling frequency and training set size were identified as key factors contributing to accuracy, while inclusion of too many unnecessary variables (e.g., flow data) was identified as a contributing factor to poor performance. The contribution of catchment population to forecast accuracy was more ambiguous. This study determined that the factors governing RF's forecast performance are complicated and interrelated, which presents challenges for further work in this space. A sufficiently accurate further iteration of the tool discussed within this study would provide significant but varying value for public health departments for monitoring future or ongoing outbreaks, assisting the implementation of on-time health response measures.
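The MAPE metric used in this abstract averages the absolute forecast error as a percentage of each observed value. The following is a minimal sketch; the function name, the choice to skip zero observations, and the toy case counts are assumptions, and the abstract does not give the threshold that defines a 'reasonable' forecast.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Observations equal to zero are skipped (an assumed convention), because
    the percentage error is undefined for them.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no non-zero observations to score")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


# Hypothetical weekly case counts and forecasts for one site.
observed_cases = [100, 200]
forecast_cases = [110, 180]
site_mape = mape(observed_cases, forecast_cases)
```

Because each error is divided by the observed value, sites with very low case counts can produce large MAPE values even for small absolute errors.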
Affiliation(s)
- Liam Vaughan
- Chemical Engineering Department, Faculty of Engineering and Information Technology, The University of Melbourne, Melbourne, Australia; Water Research Australia, Melbourne Based Team, Melbourne, Australia
| | - Muyang Zhang
- Chemical Engineering Department, Faculty of Engineering and Information Technology, The University of Melbourne, Melbourne, Australia
| | - Haoran Gu
- Chemical Engineering Department, Faculty of Engineering and Information Technology, The University of Melbourne, Melbourne, Australia
| | - Joan B Rose
- Department of Plant, Soil and Microbial Sciences, and Department of Fisheries and Wildlife, Michigan State University, East Lansing, United States of America
| | - Colleen C Naughton
- Civil and Environmental Engineering, University of California Merced, Merced, United States of America
| | - Gertjan Medema
- KWR Water Research Institute, Nieuwegein, the Netherlands
| | | | - Anne Roiko
- School of Pharmacy and Medical Sciences, and Cities Research Institute, Griffith University, Gold Coast, Australia
| | - Linda Blackall
- School of BioSciences, The University of Melbourne, Melbourne, Australia
| | - Arash Zamyadi
- Chemical Engineering Department, Faculty of Engineering and Information Technology, The University of Melbourne, Melbourne, Australia; Water Research Australia, Melbourne Based Team, Melbourne, Australia.
|
28
|
Remote sensing inversion and prediction of land use land cover in the middle reaches of the Yangtze River basin, China. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:46306-46320. [PMID: 36720789 DOI: 10.1007/s11356-023-25424-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Accepted: 01/16/2023] [Indexed: 02/02/2023]
Abstract
Land use and land cover (LULC) changes are dynamic and have been extensively studied; the change in LULC has become a crucial factor in decision making for planners and conservationists owing to its impact on natural ecosystems. Deriving accurate LULC data and analyzing their changes are important for assessing the energy balance, carbon balance, and hydrological cycle in a region. Therefore, we identified the best of four classification methods and analyzed the change in LULC in the middle Yangtze River basin (MYRB) from 2001 to 2020 using the Google Earth Engine (GEE). The results suggest that (1) the GEE platform enables rapid acquisition and processing of remote sensing images for deriving LULC, and the random forest (RF) algorithm achieved the highest overall accuracy and kappa coefficient (KC), 87.7% and 0.84, respectively; (2) forestland occupied the largest area from 2001 to 2020, followed by water bodies and buildings. During the study period, there was a significant change in the area occupied by both water bodies (overall increase of 46.2%) and buildings (decrease of 14.3% from 2001 to 2005); and (3) the simulation of LULC in the MYRB area was based on the primary drivers in the area, of which elevation changes had the largest effect on LULC changes. The patch-generating land use simulation (PLUS) model was used to produce the simulation, with an overall accuracy and KC of 89.6% and 0.82, respectively. This study was not only useful for understanding the spatial and temporal characteristics of LULC in the MYRB, but also offered a basis for the simulation of ecological quality in this region.
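Overall accuracy and the kappa coefficient reported in this abstract are both derived from a confusion matrix: kappa corrects the observed agreement for the agreement expected by chance. The sketch below shows the standard Cohen's kappa calculation; the function name and the 2x2 matrix are illustrative assumptions, not the study's results.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected if predictions were independent of the true classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class validation result.
toy_matrix = [[40, 10], [10, 40]]
toy_accuracy, toy_kappa = accuracy_and_kappa(toy_matrix)
```

In this toy case the accuracy is 0.8 but kappa is only 0.6, because half of the agreement would be expected by chance with balanced classes.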
|
29
|
Chawla P, Rana SB, Kaur H, Singh K, Yuvaraj R, Murugappan M. A decision support system for automated diagnosis of Parkinson’s disease from EEG using FAWT and entropy features. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
|
30
|
Walls FN, McGarvey DJ. Building a macrosystems ecology framework to identify links between environmental and human health: A random forest modelling approach. PEOPLE AND NATURE 2022. [DOI: 10.1002/pan3.10427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
|
31
|
Barker J, Li X, Khavandi S, Koeckerling D, Mavilakandy A, Pepper C, Bountziouka V, Chen L, Kotb A, Antoun I, Mansir J, Smith-Byrne K, Schlindwein FS, Dhutia H, Tyukin I, Nicolson WB, Ng GA. Machine learning in sudden cardiac death risk prediction: a systematic review. Europace 2022; 24:1777-1787. [PMID: 36201237 DOI: 10.1093/europace/euac135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 07/07/2022] [Indexed: 11/23/2022]
Abstract
AIMS Most patients who receive implantable cardioverter defibrillators (ICDs) for primary prevention do not receive therapy during the lifespan of the ICD, whilst up to 50% of sudden cardiac death (SCD) occur in individuals who are considered low risk by conventional criteria. Machine learning offers a novel approach to risk stratification for ICD assignment. METHODS AND RESULTS Systematic search was performed in MEDLINE, Embase, Emcare, CINAHL, Cochrane Library, OpenGrey, MedrXiv, arXiv, Scopus, and Web of Science. Studies modelling SCD risk prediction within days to years using machine learning were eligible for inclusion. Transparency and quality of reporting (TRIPOD) and risk of bias (PROBAST) were assessed. A total of 4356 studies were screened with 11 meeting the inclusion criteria with heterogeneous populations, methods, and outcome measures preventing meta-analysis. The study size ranged from 122 to 124 097 participants. Input data sources included demographic, clinical, electrocardiogram, electrophysiological, imaging, and genetic data ranging from 4 to 72 variables per model. The most common outcome metric reported was the area under the receiver operator characteristic (n = 7) ranging between 0.71 and 0.96. In six studies comparing machine learning models and regression, machine learning improved performance in five. No studies adhered to a reporting standard. Five of the papers were at high risk of bias. CONCLUSION Machine learning for SCD prediction has been under-applied and incorrectly implemented but is ripe for future investigation. It may have some incremental utility in predicting SCD over traditional models. The development of reporting standards for machine learning is required to improve the quality of evidence reporting in the field.
Affiliation(s)
- Joseph Barker
- Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
- Cardiology Department, Glenfield Hospital, University Hospitals Leicester, Leicester, UK
| | - Xin Li
- Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
- School of Engineering, University of Leicester, Leicester, UK
| | - Sarah Khavandi
- Faculty of Medicine, Imperial College School of Medicine, Imperial College London, London, UK
| | - David Koeckerling
- Division of Angiology, Swiss Cardiovascular Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Akash Mavilakandy
- Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
| | - Coral Pepper
- Library and Information Service, University Hospitals of Leicester NHS Trust, Leicester, UK
| | | | - Long Chen
- School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
| | - Ahmed Kotb
- Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
- Cardiology Department, Glenfield Hospital, University Hospitals Leicester, Leicester, UK
| | - Ibrahim Antoun
- Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
| | | | - Karl Smith-Byrne
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Fernando S Schlindwein
- Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
- School of Engineering, University of Leicester, Leicester, UK
| | - Harshil Dhutia
- Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
- Cardiology Department, Glenfield Hospital, University Hospitals Leicester, Leicester, UK
| | - Ivan Tyukin
- Department of Mathematics, University of Leicester, Leicester, UK
| | - William B Nicolson
- Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
- Cardiology Department, Glenfield Hospital, University Hospitals Leicester, Leicester, UK
| | - G Andre Ng
- Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
- Cardiology Department, Glenfield Hospital, University Hospitals Leicester, Leicester, UK
- Cardiovascular Theme, National Institute for Health Research, Leicester Biomedical Research Centre, Leicester, UK
|
32
|
Phan TTH, Nguyen-Doan D, Nguyen-Huu D, Nguyen-Van H, Pham-Hong T. Investigation on new Mel frequency cepstral coefficients features and hyper-parameters tuning technique for bee sound recognition. Soft comput 2022. [DOI: 10.1007/s00500-022-07596-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
33
|
Moridian P, Ghassemi N, Jafari M, Salloum-Asfar S, Sadeghi D, Khodatars M, Shoeibi A, Khosravi A, Ling SH, Subasi A, Alizadehsani R, Gorriz JM, Abdulla SA, Acharya UR. Automatic autism spectrum disorder detection using artificial intelligence methods with MRI neuroimaging: A review. Front Mol Neurosci 2022; 15:999605. [PMID: 36267703 PMCID: PMC9577321 DOI: 10.3389/fnmol.2022.999605] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/21/2022] [Accepted: 08/09/2022] [Indexed: 12/04/2022]
Abstract
Autism spectrum disorder (ASD) is a brain condition characterized by diverse signs and symptoms that appear in early childhood. ASD is also associated with communication deficits and repetitive behavior in affected individuals. Various ASD detection methods have been developed, including neuroimaging modalities and psychological tests. Among these methods, magnetic resonance imaging (MRI) imaging modalities are of paramount importance to physicians. Clinicians rely on MRI modalities to diagnose ASD accurately. The MRI modalities are non-invasive methods that include functional (fMRI) and structural (sMRI) neuroimaging methods. However, diagnosing ASD with fMRI and sMRI for specialists is often laborious and time-consuming; therefore, several computer-aided design systems (CADS) based on artificial intelligence (AI) have been developed to assist specialist physicians. Conventional machine learning (ML) and deep learning (DL) are the most popular schemes of AI used for diagnosing ASD. This study aims to review the automated detection of ASD using AI. We review several CADS that have been developed using ML techniques for the automated diagnosis of ASD using MRI modalities. There has been very limited work on the use of DL techniques to develop automated diagnostic models for ASD. A summary of the studies developed using DL is provided in the Supplementary Appendix. Then, the challenges encountered during the automated diagnosis of ASD using MRI and AI techniques are described in detail. Additionally, a graphical comparison of studies using ML and DL to diagnose ASD automatically is discussed. We suggest future approaches to detecting ASDs using AI techniques and MRI neuroimaging.
Affiliation(s)
- Parisa Moridian
- Faculty of Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
| | - Navid Ghassemi
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Mahboobeh Jafari
- Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran
| | - Salam Salloum-Asfar
- Neurological Disorders Research Center, Qatar Biomedical Research Institute, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
| | - Delaram Sadeghi
- Department of Medical Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran
| | - Marjane Khodatars
- Department of Medical Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran
| | - Afshin Shoeibi
- Data Science and Computational Intelligence Institute, University of Granada, Granada, Spain
| | - Abbas Khosravi
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Geelong, VIC, Australia
| | - Sai Ho Ling
- Faculty of Engineering and IT, University of Technology Sydney (UTS), Ultimo, NSW, Australia
| | - Abdulhamit Subasi
- Faculty of Medicine, Institute of Biomedicine, University of Turku, Turku, Finland
- Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
| | - Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Geelong, VIC, Australia
| | - Juan M. Gorriz
- Data Science and Computational Intelligence Institute, University of Granada, Granada, Spain
| | - Sara A. Abdulla
- Neurological Disorders Research Center, Qatar Biomedical Research Institute, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
| | - U. Rajendra Acharya
- Ngee Ann Polytechnic, Singapore, Singapore
- Department of Biomedical Informatics and Medical Engineering, Asia University, Taichung, Taiwan
- Department of Biomedical Engineering, School of Science and Technology, Singapore University of Social Sciences, Singapore, Singapore
|
34
|
Pantavou K, Delibasis KK, Nikolopoulos GK. Machine learning and features for the prediction of thermal sensation and comfort using data from field surveys in Cyprus. INTERNATIONAL JOURNAL OF BIOMETEOROLOGY 2022; 66:1973-1984. [PMID: 35895145 DOI: 10.1007/s00484-022-02333-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/09/2022] [Accepted: 07/13/2022] [Indexed: 06/15/2023]
Abstract
Perception can influence individuals' behaviour and attitude, affecting responses and compliance to precautionary measures. This study aims to investigate the performance of methods for thermal sensation and comfort prediction. Four machine learning algorithms (MLA), namely artificial neural networks, random forest (RF), support vector machines, and linear discriminant analysis, were examined and compared to the physiologically equivalent temperature (PET). Data were collected in field surveys conducted in outdoor sites in Cyprus. The seven- and nine-point assessment scales of thermal sensation and a two-point scale of thermal comfort were considered. The models of MLA included meteorological and physiological features. The results indicate RF as the best MLA applied to the data. All MLA outperformed PET. For thermal sensation, the lowest prediction error (1.32 points) and the highest accuracy (30%) were found in the seven-point scale for the feature vector consisting of air temperature, relative humidity, wind speed, grey globe temperature, clothing insulation, activity, age, sex, and body mass index. The accuracy increased to 63.8% when predictions within one point of the correct thermal sensation category were counted as correct. The best-performing feature vector for thermal sensation also produced one of the best models for thermal comfort, yielding an accuracy of 71% and an F-score of 0.81.
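The two accuracy figures in this abstract differ only in how strictly a prediction must match the reported vote: exactly, or within one scale point. The sketch below shows that tolerance-based scoring; the function name and the toy votes on a -3 to +3 scale are illustrative assumptions, not survey data from the study.

```python
def accuracy_within(true_votes, predicted_votes, tolerance=0):
    """Share of predictions whose distance from the true vote is at most `tolerance` points."""
    if len(true_votes) != len(predicted_votes) or not true_votes:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for t, p in zip(true_votes, predicted_votes) if abs(t - p) <= tolerance
    )
    return hits / len(true_votes)


# Hypothetical votes on a seven-point scale coded -3 (cold) to +3 (hot).
true_votes = [0, 1, -2, 3]
predicted_votes = [0, 2, 0, 3]
exact_accuracy = accuracy_within(true_votes, predicted_votes)
relaxed_accuracy = accuracy_within(true_votes, predicted_votes, tolerance=1)
```

Relaxing the tolerance can only raise the score, which is consistent with the jump from exact to within-one-point accuracy reported above.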
Affiliation(s)
- Katerina Pantavou
- Medical School, University of Cyprus, P.O.Box 20537, 1678, Nicosia, Cyprus.
| | - Konstantinos K Delibasis
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Papasiopoulou 2-4, 35131, Lamia, Greece
|
35
|
Saadatmand S, Salimifard K, Mohammadi R, Kuiper A, Marzban M, Farhadi A. Using machine learning in prediction of ICU admission, mortality, and length of stay in the early stage of admission of COVID-19 patients. ANNALS OF OPERATIONS RESEARCH 2022; 328:1-29. [PMID: 36196268 PMCID: PMC9521862 DOI: 10.1007/s10479-022-04984-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Accepted: 09/06/2022] [Indexed: 05/19/2023]
Abstract
The recent COVID-19 pandemic has affected health systems across the world. In particular, Intensive Care Units (ICUs) have played a pivotal role in the treatment of critically ill patients. At the same time, however, the increasing number of admissions due to the high prevalence of the virus has caused several problems for ICU wards, such as overburdening of staff and shortages of medical resources. These issues might have affected the quality of healthcare services provided, directly impacting a patient's survival. The objective of this research is to leverage Machine Learning (ML) on hospital data in order to support hospital managers and practitioners with the treatment of COVID-19 patients. This is accomplished by providing more detailed inference about a patient's likelihood of ICU admission, mortality, and, in case of hospitalization, the length of stay (LOS). In this pursuit, the outcome variables are predicted in three separate models by five different ML algorithms: eXtreme Gradient Boosting (XGB), K-Nearest Neighbor (KNN), Random Forest (RF), bagged-CART (b-CART), and LogitBoost (LB). With the exception of KNN, the studied models show good predictive capabilities when evaluating relevant accuracy scores, such as area under the curve. By implementing an ensemble stacking approach (either a Neural Net or a General Linear Model) on top of the aforementioned ML algorithms, the performance is further boosted. Ultimately, for the prediction of admission to the ICU, the ensemble stacking via a Neural Net achieved the best result with an accuracy of over 95%. For mortality at the ICU, the vanilla XGB performed slightly better (1% difference with the meta-model). To predict long lengths of stay, both ensemble stacking approaches yield comparable results. Besides its direct implications for managing COVID-19 patients, the approach presented serves as an example of how data can be employed in future pandemics or crises.
Affiliation(s)
- Sara Saadatmand
  - Computational Intelligence and Intelligent Optimization Research Group, Persian Gulf University, Bushehr, 75169 Iran
- Khodakaram Salimifard
  - Computational Intelligence and Intelligent Optimization Research Group, Persian Gulf University, Bushehr, 75169 Iran
- Reza Mohammadi
  - Section Business Analytics, Amsterdam Business School, University of Amsterdam, Amsterdam, The Netherlands
- Alex Kuiper
  - Section Business Analytics, Amsterdam Business School, University of Amsterdam, Amsterdam, The Netherlands
- Maryam Marzban
  - Department of Public Health, School of Public Health, Bushehr University of Medical Science, Bushehr, Iran
- Akram Farhadi
  - The Persian Gulf Tropical Medicine Research Center, The Persian Gulf Biomedical Science Research Institute, Bushehr University of Medical Science, Bushehr, Iran
|
36
|
Dental Caries Risk Assessment in Children 5 Years Old and under via Machine Learning. Dent J (Basel) 2022; 10:dj10090164. [PMID: 36135159 PMCID: PMC9497737 DOI: 10.3390/dj10090164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Revised: 08/21/2022] [Accepted: 08/29/2022] [Indexed: 11/16/2022] Open
Abstract
Background: Dental caries is a prevalent, complex, chronic illness that is avoidable. Better dental health outcomes are achieved as a result of accurate and early caries risk prediction in children, which also helps to avoid additional expenses and repercussions. In recent years, artificial intelligence (AI) has been employed in the medical field to aid in the diagnosis and treatment of medical diseases. This technology is a critical tool for the early prediction of the risk of developing caries. Aim: Through the development of computational models and the use of machine learning classification techniques, we investigated the potential for dental caries factors and lifestyle among children under the age of five. Design: A total of 780 parents and their children under the age of five made up the sample. To build a classification model with high accuracy to predict caries risk in 0–5-year-old children, ten different machine learning modelling techniques (DT, XGBoost, KNN, LR, MLP, RF, SVM (linear, rbf, poly, sigmoid)) and two assessment methods (Leave-One-Out and K-fold) were utilised. The best classification model for caries risk prediction was chosen by analysing each classification model’s accuracy, specificity, and sensitivity. Results: Machine learning helped with the creation of computer algorithms that could take a variety of parameters into account, as well as the identification of risk factors for childhood caries. The performance of the classifier is almost unbiased, making it generalizable. Among all applied machine learning algorithms, Multilayer Perceptron and Random Forest had the best accuracy, with 97.4%. Support Vector Machine with RBF Kernel (with an accuracy of 97.4%) was better than Extreme Gradient Boosting (with 94.9% accuracy). Conclusion: The outcomes of this study show the potential of regular screening of children for caries risk by experts and finding the risk scores of dental caries for any individual. Therefore, in order to avoid dental caries, it is possible to concentrate on each individual by utilizing machine learning modelling.
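The abstract above names two assessment methods (Leave-One-Out and K-fold) and three selection metrics (accuracy, specificity, sensitivity). The following is a minimal Python sketch of how such splits and metrics can be computed. It is not the authors' code: the function names, the contiguous fold layout, and the toy labels are assumptions made for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def leave_one_out_indices(n_samples):
    """Leave-one-out is k-fold with exactly one sample per fold."""
    return k_fold_indices(n_samples, n_samples)


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on 1) and specificity (recall on 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels (1 = high caries risk), for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
folds = list(k_fold_indices(5, 2))
```

In practice each `(train, test)` pair would be used to fit a classifier on the training indices and score it on the held-out indices; the metrics are then averaged across folds.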
|
37
|
Lao C, Lane J, Suominen H. Analyzing Suicide Risk From Linguistic Features in Social Media: Evaluation Study. JMIR Form Res 2022; 6:e35563. [PMID: 36040781 PMCID: PMC9472054 DOI: 10.2196/35563] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2021] [Revised: 06/28/2022] [Accepted: 07/21/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Effective suicide risk assessments and interventions are vital for suicide prevention. Although assessing such risks is best done by health care professionals, people experiencing suicidal ideation may not seek help. Hence, machine learning (ML) and computational linguistics can provide analytical tools for understanding and analyzing risks. This, therefore, facilitates suicide intervention and prevention. Objective: This study aims to explore, using statistical analyses and ML, whether computerized language analysis could be applied to assess and better understand a person’s suicide risk on social media. Methods: We used the University of Maryland Suicidality Dataset comprising text posts written by users (N=866) of mental health–related forums on Reddit. Each user was classified with a suicide risk rating (no, low, moderate, or severe) by either medical experts or crowdsourced annotators, denoting their estimated likelihood of dying by suicide. In language analysis, the Linguistic Inquiry and Word Count lexicon assessed sentiment, thinking styles, and part of speech, whereas readability was explored using the TextStat library. The Mann-Whitney U test identified differences between at-risk (low, moderate, and severe risk) and no-risk users. Meanwhile, the Kruskal-Wallis test and Spearman correlation coefficient were used for granular analysis between risk levels and to identify redundancy, respectively. In the ML experiments, gradient boost, random forest, and support vector machine models were trained using 10-fold cross validation. The area under the receiver operator curve and F1-score were the primary measures. Finally, permutation importance uncovered the features that contributed the most to each model’s decision-making. Results: Statistically significant differences (P<.05) were identified between the at-risk (671/866, 77.5%) and no-risk groups (195/866, 22.5%). This was true for both the crowd- and expert-annotated samples. Overall, at-risk users had higher median values for most variables (authenticity, first-person pronouns, and negation), with a notable exception of clout, which indicated that at-risk users were less likely to engage in social posturing. A high positive correlation (ρ>0.84) was present between the part of speech variables, which implied redundancy and demonstrated the utility of aggregate features. All ML models performed similarly in their area under the curve (0.66-0.68); however, the random forest and gradient boost models were noticeably better in their F1-score (0.65 and 0.62) than the support vector machine (0.52). The features that contributed the most to the ML models were authenticity, clout, and negative emotions. Conclusions: In summary, our statistical analyses found linguistic features associated with suicide risk, such as social posturing (eg, authenticity and clout), first-person singular pronouns, and negation. This increased our understanding of the behavioral and thought patterns of social media users and provided insights into the mechanisms behind ML models. We also demonstrated the applicative potential of ML in assisting health care professionals to assess and manage individuals experiencing suicide risk.
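The abstract above uses the Spearman correlation coefficient to flag redundant features (ρ>0.84 between part-of-speech variables). As a rough illustration of what that statistic measures, here is a small Python sketch that computes Spearman's ρ as the Pearson correlation of average ranks. The feature names and counts are hypothetical and are not taken from the study's dataset.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-user counts of two part-of-speech features.
nouns = [12, 15, 9, 20, 17]
verbs = [30, 41, 22, 58, 39]
rho = spearman(nouns, verbs)
```

A value of ρ close to 1 for a pair of features suggests that they carry largely the same ordering information, which is the sense of redundancy the abstract refers to.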
Affiliation(s)
- Cecilia Lao
  - School of Computing, College of Engineering and Computer Science, The Australian National University, Canberra, ACT, Australia
- Jo Lane
  - National Centre for Epidemiology and Population Health, College of Health and Medicine, The Australian National University, Canberra, ACT, Australia
- Hanna Suominen
  - School of Computing, College of Engineering and Computer Science, The Australian National University, Canberra, ACT, Australia
  - Department of Computing, Faculty of Technology, University of Turku, Turku, Finland
|
38
|
Seal S, Carreras-Puigvert J, Trapotsi MA, Yang H, Spjuth O, Bender A. Integrating cell morphology with gene expression and chemical structure to aid mitochondrial toxicity detection. Commun Biol 2022; 5:858. [PMID: 35999457 PMCID: PMC9399120 DOI: 10.1038/s42003-022-03763-5] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2022] [Accepted: 07/25/2022] [Indexed: 12/05/2022] Open
Abstract
Mitochondrial toxicity is an important safety endpoint in drug discovery. Models based solely on chemical structure for predicting mitochondrial toxicity are currently limited in accuracy and applicability domain to the chemical space of the training compounds. In this work, we aimed to utilize both -omics and chemical data to push beyond the state-of-the-art. We combined Cell Painting and Gene Expression data with chemical structural information from Morgan fingerprints for 382 chemical perturbants tested in the Tox21 mitochondrial membrane depolarization assay. We observed that mitochondrial toxicants differ from non-toxic compounds in morphological space and identified compound clusters having similar mechanisms of mitochondrial toxicity, thereby indicating that morphological space provides biological insights related to mechanisms of action of this endpoint. We further showed that models combining Cell Painting, Gene Expression features and Morgan fingerprints improved model performance on an external test set of 244 compounds by 60% (in terms of F1 score) and improved extrapolation to new chemical space. The performance of our combined models was comparable with dedicated in vitro assays for mitochondrial toxicity. Our results suggest that combining chemical descriptors with biological readouts enhances the detection of mitochondrial toxicants, with practical implications in drug discovery.
Affiliation(s)
- Srijit Seal
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge, CB2 1EW, UK
- Jordi Carreras-Puigvert
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden
- Maria-Anna Trapotsi
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge, CB2 1EW, UK
- Hongbin Yang
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge, CB2 1EW, UK
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden
- Andreas Bender
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge, CB2 1EW, UK
|
39
|
Kuo RJ, Chen HJ, Kuo YH. The development of an eye movement-based deep learning system for laparoscopic surgical skills assessment. Sci Rep 2022; 12:11036. [PMID: 35970911 PMCID: PMC9378740 DOI: 10.1038/s41598-022-15053-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Accepted: 06/17/2022] [Indexed: 11/23/2022] Open
Abstract
The development of valid, reliable, and objective methods of skills assessment is central to modern surgical training. Numerous rating scales have been developed and validated for quantifying surgical performance. However, many of these scoring systems are potentially flawed in their design in terms of reliability. Eye-tracking techniques, which provide a more objective investigation of the visual-cognitive aspects of the decision-making process, have recently been utilized in surgery domains for skill assessment and training, and their use has been focused on investigating differences between expert and novice surgeons to understand task performance, identify experienced surgeons, and establish training approaches. Ten graduate students at the National Taiwan University of Science and Technology with no prior laparoscopic surgical skills were recruited to perform the FLS peg transfer task. Then a k-means clustering algorithm was used to split 500 trials into three dissimilar clusters, grouped as novice, intermediate, and expert levels, by an objective performance assessment parameter incorporating task duration with error score. Two types of data sets, namely, time series data extracted from coordinates of eye fixation and image data from videos, were used to implement and test our proposed skill level detection system with ensemble learning and a CNN algorithm. Results indicated that ensemble learning and the CNN were able to correctly classify skill levels with accuracies of 76.0% and 81.2%, respectively. Furthermore, the incorporation of coordinates of eye fixation and image data allowed the discrimination of skill levels with a classification accuracy of 82.5%. We examined more levels of training experience and further integrated an eye tracking technique and deep learning algorithms to develop a tool for objective assessment of laparoscopic surgical skill. With a relatively unbalanced sample, our results have demonstrated that the approach combining the features of visual fixation coordinates and images achieved a very promising level of performance for classifying skill levels of trainees.
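The abstract above describes splitting trials into three clusters with k-means on a single performance parameter. A minimal one-dimensional sketch of Lloyd's algorithm in Python might look as follows. The scores, the deterministic initialisation, and the function name are assumptions for illustration; the study's actual parameter and implementation are not given in the abstract.

```python
def kmeans_1d(values, k=3, n_iter=100):
    """Lloyd's algorithm on scalar scores; returns (centroids, labels).

    Assumes k >= 2 and at least k values.
    """
    ordered = sorted(values)
    # Deterministic initialisation: spread starting centroids over the sorted data.
    centroids = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return centroids, labels


# Hypothetical performance scores (e.g. duration plus an error penalty).
scores = [310.0, 295.0, 305.0, 180.0, 175.0, 190.0, 95.0, 100.0, 90.0]
centroids, labels = kmeans_1d(scores)
```

Which cluster index is then called novice, intermediate, or expert depends on how the performance parameter is oriented; with a lower-is-better score, the cluster with the smallest centroid would correspond to the most skilled trials.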
Affiliation(s)
- R J Kuo
  - Department of Industrial Management, National Taiwan University of Science and Technology, Taipei, Taiwan
- Hung-Jen Chen
  - Department of Data Science, Soochow University, No. 70, Linhsi Road, Shihlin District, Taipei City, 111, Taiwan
  - Department of Marketing and Distribution Management, National Kaohsiung University of Science and Technology, No.1, University Road, Yanchao District, Kaohsiung City, 82445, Taiwan
- Yi-Hung Kuo
  - Department of New Product Introduction, Solid State Storage Technology Corporation, Hsinchu City, Taiwan
|
40
|
Weng P, Wei K, Chen T, Chen M, Liu G. Fuzzy Approximate Entropy of Extrema Based on Multiple Moving Averages as a Novel Approach in Obstructive Sleep Apnea Screening. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2022; 10:4901211. [PMID: 36247084 PMCID: PMC9564195 DOI: 10.1109/jtehm.2022.3197084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/10/2022] [Revised: 06/17/2022] [Accepted: 07/25/2022] [Indexed: 11/29/2022]
Abstract
OBJECTIVE Obstructive sleep apnea (OSA) is a respiratory disease associated with autonomic nervous system dysfunction. As a novel method for analyzing OSA depending on heart rate variability, fuzzy approximate entropy of extrema based on multiple moving averages (Emma-fApEn) can effectively assess the sympathetic tension limits, thereby realizing a good performance in the disease severity screening. METHOD Sixty 6-h electrocardiogram recordings (20 healthy, 16 mild/moderate OSA and 34 severe OSA) from the PhysioNet database were used in this study. The performances of minima of Emma-fApEn (fApEn-minima), maxima of Emma-fApEn (fApEn-maxima) and classic time-frequency domain indices for each recording were assessed by significance analysis, correlation analysis, parameter optimization and OSA screening. RESULTS fApEn-minima and fApEn-maxima had significant differences between the severe OSA group and the other two groups, while the mean value (Mean) and the ratio of low-frequency power and high-frequency power (LH) could significantly differentiate OSA recordings from healthy recordings. The correlation coefficient between fApEn-minima and apnea-hypopnea index was the highest (|R| = 0.705). Machine learning methods were used to evaluate the performances of the above four indices. Random forest (RF) achieved the highest accuracy of 96.67% in OSA screening and 91.67% in severe OSA screening, with a good balance in both. CONCLUSION Emma-fApEn may be used as a simple preliminary detection tool to assess the severity of OSA prior to polysomnography analysis.
Affiliation(s)
- Peiyu Weng
  - Key Laboratory of Sensing Technology and Biomedical Instrument of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Keming Wei
  - Key Laboratory of Sensing Technology and Biomedical Instrument of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Tian Chen
  - Key Laboratory of Sensing Technology and Biomedical Instrument of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Mingjing Chen
  - Key Laboratory of Sensing Technology and Biomedical Instrument of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Guanzheng Liu
  - Key Laboratory of Sensing Technology and Biomedical Instrument of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou 510006, China
|
41
|
Taher J, Hakala T, Jaakkola A, Hyyti H, Kukko A, Manninen P, Maanpää J, Hyyppä J. Feasibility of Hyperspectral Single Photon Lidar for Robust Autonomous Vehicle Perception. SENSORS (BASEL, SWITZERLAND) 2022; 22:5759. [PMID: 35957316 PMCID: PMC9371088 DOI: 10.3390/s22155759] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/15/2022] [Revised: 07/26/2022] [Accepted: 07/26/2022] [Indexed: 06/15/2023]
Abstract
Autonomous vehicle perception systems typically rely on single-wavelength lidar sensors to obtain three-dimensional information about the road environment. In contrast to cameras, lidars are unaffected by challenging illumination conditions, such as low light during night-time and various bidirectional effects changing the return reflectance. However, as many commercial lidars operate on a monochromatic basis, the ability to distinguish objects based on material spectral properties is limited. In this work, we describe the prototype hardware for a hyperspectral single photon lidar and demonstrate the feasibility of its use in an autonomous-driving-related object classification task. We also introduce a simple statistical model for estimating the reflectance measurement accuracy of single photon sensitive lidar devices. The single photon receiver frame was used to receive 30 12.3 nm spectral channels in the spectral band 1200-1570 nm, with a maximum channel-wise intensity of 32 photons. A varying number of frames were used to accumulate the signal photon count. Multiple objects covering 10 different categories of road environment, such as car, dry asphalt, gravel road, snowy asphalt, wet asphalt, wall, granite, grass, moss, and spruce tree, were included in the experiments. We test the influence of the number of spectral channels and the number of frames on the classification accuracy with a random forest classifier and find that the spectral information increases the classification accuracy in the high-photon flux regime from 50% to 94% with 2 channels and 30 channels, respectively. In the low-photon flux regime, the classification accuracy increases from 30% to 38% with 2 channels and 6 channels, respectively. Additionally, we visualize the data with the t-SNE algorithm and show that the photon shot noise in the single photon sensitive hyperspectral data contributes the most to the separability of material specific spectral signatures. The results of this study provide support for the use of hyperspectral single photon lidar data in more advanced object detection and classification methods, and motivate the development of advanced single photon sensitive hyperspectral lidar devices for use in autonomous vehicles and in robotics.
Affiliation(s)
- Josef Taher
  - Department of Remote Sensing and Photogrammetry, Finnish Geospatial Research Institute FGI, National Land Survey of Finland, 02150 Espoo, Finland
  - Department of Computer Science, Aalto University School of Science, 02150 Espoo, Finland
- Teemu Hakala
  - Department of Remote Sensing and Photogrammetry, Finnish Geospatial Research Institute FGI, National Land Survey of Finland, 02150 Espoo, Finland
- Anttoni Jaakkola
  - Department of Remote Sensing and Photogrammetry, Finnish Geospatial Research Institute FGI, National Land Survey of Finland, 02150 Espoo, Finland
- Heikki Hyyti
  - Department of Remote Sensing and Photogrammetry, Finnish Geospatial Research Institute FGI, National Land Survey of Finland, 02150 Espoo, Finland
- Antero Kukko
  - Department of Remote Sensing and Photogrammetry, Finnish Geospatial Research Institute FGI, National Land Survey of Finland, 02150 Espoo, Finland
- Petri Manninen
  - Department of Remote Sensing and Photogrammetry, Finnish Geospatial Research Institute FGI, National Land Survey of Finland, 02150 Espoo, Finland
- Jyri Maanpää
  - Department of Remote Sensing and Photogrammetry, Finnish Geospatial Research Institute FGI, National Land Survey of Finland, 02150 Espoo, Finland
  - Department of Computer Science, Aalto University School of Science, 02150 Espoo, Finland
- Juha Hyyppä
  - Department of Remote Sensing and Photogrammetry, Finnish Geospatial Research Institute FGI, National Land Survey of Finland, 02150 Espoo, Finland
|
42
|
Yu Y, Aitken SN, Rieseberg LH, Wang T. Using landscape genomics to delineate seed and breeding zones for lodgepole pine. THE NEW PHYTOLOGIST 2022; 235:1653-1664. [PMID: 35569109 PMCID: PMC9545436 DOI: 10.1111/nph.18223] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/10/2021] [Accepted: 05/08/2022] [Indexed: 06/15/2023]
Abstract
Seed and breeding zones traditionally are delineated based on local adaptation of phenotypic traits associated with climate variables, an approach requiring long-term field experiments. In this study, we applied a landscape genomics approach to delineate seed and breeding zones for lodgepole pine. We used a gradient forest (GF) model to select environment-associated single nucleotide polymorphisms (SNPs) using three SNP datasets (full, neutral and candidate) and 20 climate variables for 1906 lodgepole pine (Pinus contorta) individuals in British Columbia and Alberta, Canada. The two GF models built with the full (28 954) and candidate (982) SNPs were compared. The GF models identified winter-related climate as major climatic factors driving genomic patterns of lodgepole pine's local adaptation. Based on the genomic gradients predicted by the full and candidate GF models, lodgepole pine distribution range in British Columbia and Alberta was delineated into six seed and breeding zones. Our approach is a novel and effective alternative to traditional common garden approaches for delineating seed and breeding zone, and could be applied to tree species lacking data from provenance trials or common garden experiments.
Affiliation(s)
- Yue Yu
  - Department of Forest Sciences, Centre for Forest Conservation Genetics, University of British Columbia, 3041‐2424 Main Mall, Vancouver, BC V6T 1Z4, Canada
- Sally N. Aitken
  - Department of Forest Sciences, Centre for Forest Conservation Genetics, University of British Columbia, 3041‐2424 Main Mall, Vancouver, BC V6T 1Z4, Canada
- Loren H. Rieseberg
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, 6270 University Boulevard, Vancouver, BC V6T 1Z4, Canada
- Tongli Wang
  - Department of Forest Sciences, Centre for Forest Conservation Genetics, University of British Columbia, 3041‐2424 Main Mall, Vancouver, BC V6T 1Z4, Canada
|
43
|
Johnson MD, Krynkin A, Dolcetti G, Alkmim M, Cuenca J, De Ryck L. Surface shape reconstruction from phaseless scattered acoustic data using a random forest algorithm. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:1045. [PMID: 36050146 DOI: 10.1121/10.0013506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/09/2021] [Accepted: 07/25/2022] [Indexed: 06/15/2023]
Abstract
Recent studies have demonstrated that acoustic waves can be used to reconstruct the roughness profile of a rigid scattering surface. In particular, the use of multiple microphones placed above a rough surface as well as an analytical model based on the linearised Kirchhoff integral equations provides a sufficient base for the inversion algorithm to estimate surface geometrical properties. Prone to fail in the presence of high noise and measurement uncertainties, the analytical approach may not always be suitable in analysing measured scattered acoustic pressure. With the aim to improve the robustness of the surface reconstruction algorithms, here it is proposed to use a data-driven approach through the application of a random forest regression algorithm to reconstruct specific parameters of one-dimensional sinusoidal surfaces from airborne acoustic phase-removed pressure data. The data for the training set are synthetically generated through the application of the Kirchhoff integral in predicting scattered sound, and they are further verified with data produced from laboratory measurements. The surface parameters from the measurement sample were found to be recovered accurately for various receiver combinations and with a wide range of noise levels ranging from 0.1% to 30% of the average scattered acoustical pressure amplitude.
Affiliation(s)
- Michael-David Johnson
  - Department of Mechanical Engineering, University of Sheffield, Sheffield, United Kingdom
- Anton Krynkin
  - Department of Mechanical Engineering, University of Sheffield, Sheffield, United Kingdom
- Giulio Dolcetti
  - Department of Civil and Structural Engineering, University of Sheffield, Sheffield, United Kingdom
- Mansour Alkmim
  - Siemens Digital Industries Software, Interleuvenlaan 68, B-3001 Leuven, Belgium
- Jacques Cuenca
  - Siemens Digital Industries Software, Interleuvenlaan 68, B-3001 Leuven, Belgium
- Laurent De Ryck
  - Siemens Digital Industries Software, Interleuvenlaan 68, B-3001 Leuven, Belgium
|
44
|
Distance- and Momentum-Based Symbolic Aggregate Approximation for Highly Imbalanced Classification. SENSORS 2022; 22:s22145095. [PMID: 35890775 PMCID: PMC9315809 DOI: 10.3390/s22145095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Revised: 06/30/2022] [Accepted: 07/04/2022] [Indexed: 11/17/2022]
Abstract
Time-series representation is the most important task in time-series analysis. One of the most widely employed time-series representation methods is symbolic aggregate approximation (SAX), which converts the results from piecewise aggregate approximation to a symbol sequence. SAX is a simple and effective method; however, it only focuses on the mean value of each segment in the time-series. Here, we propose a novel time-series representation method, distance- and momentum-based symbolic aggregate approximation (DM-SAX), that can secure time-series distributions by calculating the perpendicular distance from the time-axis to each data point and consider the time-series trend by adding a momentum factor reflecting the direction of previous data points. Experimental results for 29 highly imbalanced classification problems on the UCR datasets revealed that DM-SAX affords the optimal area under the curve (AUC) among competing time-series representation methods (SAX, extreme-SAX, overlap-SAX, and distance-based SAX). We statistically verified that performance improvements resulted in significant differences in the rankings. In addition, DM-SAX yielded the optimal AUC for a real-world wire cutting and crimping process dataset. Meaningful data points such as outliers could be identified in a time-series outlier detection framework via the proposed method.
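For readers unfamiliar with the baseline that the abstract above builds on, the following Python sketch shows classic SAX only: z-normalise, take segment means (piecewise aggregate approximation), then map each mean to a letter using breakpoints that split a standard normal distribution into equiprobable regions. It does not implement the distance or momentum terms of DM-SAX, which the abstract does not specify in enough detail. The alphabet size of four and the toy series are assumptions.

```python
from statistics import mean, pstdev

# Breakpoints that split a standard normal distribution into four
# equiprobable regions (its quartiles), giving an alphabet of size 4.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def znormalize(series):
    """Rescale to zero mean and unit variance (constant series map to zeros)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each equal-length segment."""
    n = len(series)
    if n % n_segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by n_segments")
    width = n // n_segments
    return [mean(series[i:i + width]) for i in range(0, n, width)]


def sax(series, n_segments):
    """Return the SAX word for a series using the fixed 4-letter alphabet."""
    symbols = []
    for value in paa(znormalize(series), n_segments):
        # The symbol index is the number of breakpoints the segment mean exceeds.
        index = sum(value > b for b in BREAKPOINTS)
        symbols.append(ALPHABET[index])
    return "".join(symbols)


word = sax([1, 1, 2, 2, 8, 8, 9, 9], n_segments=4)
```

Because each symbol depends only on the segment mean, two segments with the same mean but very different spread or trend receive the same letter, which is the limitation the abstract says DM-SAX is designed to address.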
|
45
|
Islam MS, Islam MT, Sarker S, Jame HA, Nishat SS, Jani MR, Rauf A, Ahsan S, Shorowordi KM, Efstathiadis H, Carbonara J, Ahmed S. Machine Learning Approach to Delineate the Impact of Material Properties on Solar Cell Device Physics. ACS OMEGA 2022; 7:22263-22278. [PMID: 35811908 PMCID: PMC9260917 DOI: 10.1021/acsomega.2c01076] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/22/2022] [Accepted: 06/03/2022] [Indexed: 06/15/2023]
Abstract
In this research, solar cell capacitance simulator-one-dimensional (SCAPS-1D) software was used to build and probe nontoxic Cs-based perovskite solar devices and investigate modulations of key material parameters on ultimate power conversion efficiency (PCE). The input material parameters of the absorber Cs-perovskite layer were incrementally changed, and with the various resulting combinations, 63,500 unique devices were formed and probed to produce device PCE. Versatile and well-established machine learning algorithms were thereafter utilized to train, test, and evaluate the output dataset with a focused goal to delineate and rank the input material parameters for their impact on ultimate device performance and PCE. The most impactful parameters were then tuned to showcase unique ranges that would ultimately lead to higher device PCE values. As a validation step, the predicted results were confirmed against SCAPS simulated results as well, highlighting high accuracy and low error metrics. Further optimization of intrinsic material parameters was conducted through modulation of absorber layer thickness, back contact metal, and bulk defect concentration, resulting in an improvement in the PCE of the device from 13.29 to 16.68%. Overall, the results from this investigation provide much-needed insight and guidance for researchers at large, and experimentalists in particular, toward fabricating commercially viable nontoxic inorganic perovskite alternatives for the burgeoning solar industry.
Collapse
Affiliation(s)
- Md. Shafiqul Islam: Department of Materials and Metallurgical Engineering (MME), Bangladesh University of Engineering and Technology (BUET), East Campus, Dhaka 1000, Bangladesh
- Md. Tohidul Islam: Department of Materials Design and Innovation, University at Buffalo, Buffalo, New York 14260, United States
- Saugata Sarker: Department of Materials and Metallurgical Engineering (MME), Bangladesh University of Engineering and Technology (BUET), East Campus, Dhaka 1000, Bangladesh
- Hasan Al Jame: Department of Materials and Metallurgical Engineering (MME), Bangladesh University of Engineering and Technology (BUET), East Campus, Dhaka 1000, Bangladesh
- Sadiq Shahriyar Nishat: Department of Materials Science and Engineering (MSE), Rensselaer Polytechnic Institute, 110 8th Street, Troy, New York 12180, United States
- Md. Rafsun Jani: Department of Materials and Metallurgical Engineering (MME), Bangladesh University of Engineering and Technology (BUET), East Campus, Dhaka 1000, Bangladesh
- Abrar Rauf: Department of Materials and Metallurgical Engineering (MME), Bangladesh University of Engineering and Technology (BUET), East Campus, Dhaka 1000, Bangladesh
- Sumaiyatul Ahsan: Department of Materials and Metallurgical Engineering (MME), Bangladesh University of Engineering and Technology (BUET), East Campus, Dhaka 1000, Bangladesh
- Kazi Md. Shorowordi: Department of Materials and Metallurgical Engineering (MME), Bangladesh University of Engineering and Technology (BUET), East Campus, Dhaka 1000, Bangladesh
- Harry Efstathiadis: College of Nanoscale Science and Nanoengineering, SUNY Polytechnic Institute, 257 Fuller Road, Albany, New York 12203, United States
- Joaquin Carbonara: Department of Mathematics, SUNY − Buffalo State, 1300 Elmwood Avenue, Buffalo, New York 14222, United States
- Saquib Ahmed: Department of Mechanical Engineering Technology, SUNY − Buffalo State, 1300 Elmwood Avenue, Buffalo, New York 14222, United States
|
46
|
Clinical and Laboratory Approach to Diagnose COVID-19 Using Machine Learning. Interdiscip Sci 2022; 14:452-470. [PMID: 35133633 PMCID: PMC8846962 DOI: 10.1007/s12539-021-00499-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 10/27/2021] [Revised: 12/17/2021] [Accepted: 12/23/2021] [Indexed: 12/18/2022]
Abstract
Coronavirus 2 (SARS-CoV-2), often known by the name COVID-19, is a type of acute respiratory syndrome that has had a significant influence on both the economy and health infrastructure worldwide. This novel virus is diagnosed using a conventional method known as the RT-PCR (Reverse Transcription Polymerase Chain Reaction) test. This approach, however, produces many false-negative and erroneous outcomes. According to recent studies, COVID-19 can also be diagnosed using X-rays, CT scans, blood tests and cough sounds. In this article, we use blood tests and machine learning to predict the diagnosis of this deadly virus. We also present an extensive review of various existing machine-learning applications that diagnose COVID-19 from clinical and laboratory markers. Four different classifiers, together with the Synthetic Minority Oversampling Technique (SMOTE), were used for classification. The Shapley Additive Explanations (SHAP) method was used to calculate the importance of each feature, and it was found that eosinophils, monocytes, leukocytes and platelets were the most critical blood parameters that distinguished COVID-19 infection for our dataset. These classifiers can be used in conjunction with RT-PCR tests to improve sensitivity, and in emergency situations such as a pandemic outbreak that might happen due to new strains of the virus. The positive results indicate the prospective use of an automated framework that could help clinicians and medical personnel diagnose and screen patients.
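The oversampling step named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `smote_sketch`, the choice of `k`, and the sample values are all assumptions made for illustration. The core idea of SMOTE is to create a synthetic minority sample at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    For each synthetic point: pick a minority sample, pick one of its k nearest
    minority neighbours, and take a random point on the segment joining them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority class (values are illustrative only).
minority_samples = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority_samples, n_new=6)
print(len(new_points))
```

Because every synthetic point is an interpolation between two real minority samples, it always falls inside the range the minority class already spans; the method adds density, not new extremes.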
|
47
|
Manzali Y, Akhiat Y, Chahhou M, Elmohajir M, Zinedine A. Reducing the number of trees in a forest using noisy features. EVOLVING SYSTEMS 2022. [DOI: 10.1007/s12530-022-09441-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
48
|
Clustering-based adaptive data augmentation for class-imbalance in machine learning (CADA): additive manufacturing use case. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07347-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Large amounts of data are generated from in-situ monitoring of additive manufacturing (AM) processes and are later used in prediction modelling for defect classification to speed up quality inspection of products. A high volume of this process data is defect-free (majority class) and a lower volume has defects (minority class), which results in the class-imbalance issue. With imbalanced datasets, classifiers often provide sub-optimal classification results, i.e. better performance on the majority class than on the minority class. However, it is important for process engineers that models classify defects more accurately than the class with no defects, since this is crucial for quality inspection. Hence, we address the class-imbalance issue in manufacturing process data to support in-situ quality control of additive manufactured components. For this, we propose cluster-based adaptive data augmentation (CADA) for oversampling to address the class-imbalance problem. Quantitative experiments are conducted to evaluate the performance of the proposed method and to compare it with other selected oversampling methods using AM datasets from an aerospace industry and a publicly available casting manufacturing dataset. The results show that CADA outperformed random oversampling and the SMOTE method and is similar to random data augmentation and cluster-based oversampling. Furthermore, the results of the statistical significance test show that there is a significant difference between the studied methods. As such, the CADA method can be considered an alternative oversampling method for improving the performance of models on the minority class.
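For orientation, the simplest baseline named in this abstract, random oversampling, can be sketched as follows. This is not CADA itself, whose details are not given here; the function name `random_oversample` and the data values are hypothetical. Samples of each smaller class are duplicated at random until every class is as large as the largest one.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        # Pool of original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical data: 6 defect-free readings and 2 defective ones.
X = [0.1, 0.2, 0.15, 0.12, 0.18, 0.11, 0.9, 0.95]
y = ["ok"] * 6 + ["defect"] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Random oversampling only repeats existing minority samples, which is why augmentation-based methods that generate new samples are studied as alternatives.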
|
49
|
Seabed Sediment Classification Using Spatial Statistical Characteristics. JOURNAL OF MARINE SCIENCE AND ENGINEERING 2022. [DOI: 10.3390/jmse10050691] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023]
Abstract
Conventional sediment classification methods based on Multibeam Echo System (MBES) data have low accuracy since the correlation between features and sediment has not been fully considered. Moreover, their poor resistance to the residual error of MBES backscatter strength (BS) processing also degrades their performance. To address these problems, we propose a seabed sediment classification method using spatial statistical features extracted from the angular response curve (ARC), topography, and geomorphology. First, to reduce interference from noise and from the residual error of beam pattern correction, we propose a robust method combining the Generic Seafloor Acoustic Backscatter (GSAB) model and the Huber loss function to estimate the parameters of the ARC, which is strongly correlated with seabed sediments. Second, a feature set is constructed from AR features composed of GSAB parameters, the BS mosaic and its derivatives, and seabed topography and its derivatives to characterize seabed sediments. After that, feature selection and probability map acquisition are performed based on the random forest (RF) algorithm. Finally, a denoising and final sediment map generation method is proposed and applied to the probability maps to obtain a sediment map with reasonable sediment distribution and clear boundaries between classes. We conduct experiments and achieve a classification accuracy of 93.3%, which verifies the validity of our method.
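The robustness attributed to the Huber loss in this abstract can be seen in a minimal sketch. It shows only the loss function, not the GSAB model or the authors' fitting procedure; the threshold `delta=1.0` and the residual values are assumptions chosen for illustration. The loss is quadratic for small residuals and linear for large ones, so an outlier contributes far less than it would under squared error.

```python
def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def squared_loss(residual):
    """Ordinary half squared error, for comparison."""
    return 0.5 * residual * residual


# Illustrative residuals: the last one plays the role of an outlier.
residuals = [0.5, 1.0, 3.0, 10.0]
for r in residuals:
    print(r, huber_loss(r), squared_loss(r))
```

For the residual of 10.0 the squared loss is 50.0 while the Huber loss is 9.5, which is why a fit driven by the Huber loss is pulled less strongly toward noisy points.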
|
50
|
Trends in using IoT with machine learning in smart health assessment. Int J Health Sci (Qassim) 2022. [DOI: 10.53730/ijhs.v6ns3.6404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
The Internet of Things (IoT) provides a rich source of information that can be uncovered using machine learning (ML). The decision-making processes in several industries, such as education, security, business, and healthcare, have been aided by these hybrid technologies. For optimum prediction and recommendation systems, ML enhances the Internet of Things (IoT). Machines are already making medical records, diagnosing diseases, and monitoring patients using IoT and ML in the healthcare industry. Various datasets need different ML algorithms to perform well. It's possible that the total findings will be impacted if the predicted results are not consistent. In clinical decision-making, the variability of prediction outcomes is a major consideration. To effectively utilise IoT data in healthcare, it's critical to have a firm grasp of the various machine learning techniques in use. Algorithms for categorization and prediction that have been employed in the healthcare industry are highlighted in this article. As stated earlier, the purpose of this work is to provide readers with an in-depth look at current machine learning algorithms and how they apply to IoT medical data.
|