1
Bifarin OO, Fernández FM. Automated Machine Learning and Explainable AI (AutoML-XAI) for Metabolomics: Improving Cancer Diagnostics. J Am Soc Mass Spectrom 2024. [PMID: 38690775 DOI: 10.1021/jasms.3c00403] [Indexed: 05/03/2024]
Abstract
Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for nonexperts, remain. Automated machine learning (AutoML) can streamline this process; however, the issue of interpretability could persist. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. We tested our approach on two data sets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using Auto-sklearn, surpassed standalone ML algorithms like SVM and k-Nearest Neighbors in differentiating between RCC and healthy controls, as well as OC patients and those with other gynecological cancers. The effectiveness of Auto-sklearn is highlighted by its AUC scores of 0.97 for RCC and 0.85 for OC, obtained from the unseen test sets. Importantly, on most of the metrics considered, Auto-sklearn demonstrated a better classification performance, leveraging a mix of algorithms and ensemble techniques. Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM3(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples.
In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science.
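To make the SHAP idea in this abstract concrete, the following is a minimal sketch of exact Shapley values computed by enumerating feature coalitions. It is not the authors' pipeline and does not use the SHAP library or Auto-sklearn; the function `shapley_values`, the `toy_model`, and the three feature values are hypothetical and chosen only for illustration. It shows the additivity property that a waterfall plot relies on: the per-feature contributions sum to the difference between the prediction for a sample and the prediction for a baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample (feasible only for few features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Features outside the coalition are set to their baseline value.
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical score with one interaction term (not a real metabolite model)."""
    a, b, c = features
    return 2.0 * a + 1.0 * b + 0.5 * a * c


sample = [1.0, 2.0, 4.0]      # hypothetical feature values
baseline = [0.0, 0.0, 0.0]    # hypothetical reference point
phi = shapley_values(toy_model, sample, baseline)
print(phi, sum(phi))
```

The interaction term `0.5 * a * c` is split evenly between the two features involved, which is one reason dependence plots can hint at interactions between metabolites.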
Affiliation(s)
- Olatomiwa O Bifarin
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Facundo M Fernández
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Petit Institute of Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
2
Meng L, Zhu P, Xia K. Application value of the automated machine learning model based on modified CT index combined with serological indices in the early prediction of lung cancer. Front Public Health 2024; 12:1368217. [PMID: 38645446 PMCID: PMC11027066 DOI: 10.3389/fpubh.2024.1368217] [Received: 01/10/2024] [Accepted: 03/19/2024] [Indexed: 04/23/2024] Open
Abstract
Background and objective: Accurately predicting the extent of lung tumor infiltration is crucial for improving patient survival and cure rates. This study aims to evaluate the application value of an improved CT index combined with serum biomarkers, obtained through an artificial intelligence recognition system analyzing CT features of pulmonary nodules, in early prediction of lung cancer infiltration using machine learning models. Patients and methods: A retrospective analysis was conducted on clinical data of 803 patients hospitalized for lung cancer treatment from January 2020 to December 2023 at two hospitals: Hospital 1 (Affiliated Changshu Hospital of Soochow University) and Hospital 2 (Nantong Eighth People's Hospital). Data from Hospital 1 were used for internal training, while data from Hospital 2 were used for external validation. Five algorithms, including traditional logistic regression (LR) and machine learning techniques (generalized linear models [GLM], random forest [RF], gradient boosting machine [GBM], deep neural network [DL], and naive Bayes [NB]), were employed to construct models predicting early lung cancer infiltration and were analyzed. The models were comprehensively evaluated through area under the receiver operating characteristic curve (AUC) analysis based on LR, calibration curves, decision curve analysis (DCA), as well as global and individual interpretative analyses using variable feature importance and SHapley additive explanations (SHAP) plots. Results: A total of 560 patients were used for model development in the training dataset, while a dataset comprising 243 patients was used for external validation. The GBM model exhibited the best performance among the five algorithms, with AUCs of 0.931 and 0.99 in the validation and test sets, respectively, and accuracies of 0.857 and 0.955 in the validation and test groups, respectively, outperforming other models.
Additionally, the study found that nodule diameter and average CT value were the most significant features for predicting lung cancer infiltration using machine learning models. Conclusion: The GBM model established in this study can effectively predict the risk of infiltration in early-stage lung cancer patients, thereby improving the accuracy of lung cancer screening and facilitating timely intervention for infiltrative lung cancer patients by clinicians, leading to early diagnosis and treatment of lung cancer, and ultimately reducing lung cancer-related mortality.
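The AUC values reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes AUC directly from that pairwise definition; it is an illustration only, the function name `roc_auc` is hypothetical, and the four labels and scores are made-up values rather than study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = positive class, 0 = negative class.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)
print(example_auc)
```

The pairwise loop is quadratic in the number of samples, which is fine for a few hundred patients; rank-based formulas give the same value more efficiently on large sets.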
Affiliation(s)
- Leyuan Meng
- Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Nantong University, Medical School of Nantong University, Jiangsu, Nantong, China
- Ping Zhu
- Department of Scientific Research, The Changshu Affiliated Hospital of Soochow University, Jiangsu, Suzhou, China
- Changshu Key Laboratory of Medical Artificial Intelligence and Big Data, Jiangsu, Suzhou, China
- Kaijian Xia
- Department of Scientific Research, The Changshu Affiliated Hospital of Soochow University, Jiangsu, Suzhou, China
- Changshu Key Laboratory of Medical Artificial Intelligence and Big Data, Jiangsu, Suzhou, China
3
Zhang S, Chen D, Sun H, Kemp GJ, Chen Y, Tan Q, Yang Y, Gong Q, Yue Q. Whole brain morphologic features improve the predictive accuracy of IDH status and VEGF expression levels in gliomas. Cereb Cortex 2024; 34:bhae151. [PMID: 38642107 DOI: 10.1093/cercor/bhae151] [Received: 01/11/2024] [Revised: 03/14/2024] [Accepted: 03/23/2024] [Indexed: 04/22/2024] Open
Abstract
Glioma is a systemic disease that can induce micro and macro alterations of the whole brain. Isocitrate dehydrogenase and vascular endothelial growth factor are proven prognostic markers and antiangiogenic therapy targets in glioma. The aim of this study was to determine the ability of whole brain morphologic features and radiomics to predict isocitrate dehydrogenase status and vascular endothelial growth factor expression levels. This study recruited 80 glioma patients with isocitrate dehydrogenase wildtype and high vascular endothelial growth factor expression levels, and 102 patients with isocitrate dehydrogenase mutation and low vascular endothelial growth factor expression levels. Virtual brain grafting, combined with Freesurfer, was used to compute morphologic features including cortical thickness, LGI, and subcortical volume in glioma patients. Radiomics features were extracted from multiple tumor regions. Pycaret was used to construct the machine learning pipeline. Among the radiomics models, the whole tumor model achieved the best performance (accuracy 0.80, Area Under the Curve 0.86), while, after incorporating whole brain morphologic features, the model had a superior predictive performance (accuracy 0.82, Area Under the Curve 0.88). The features that contributed most to the predictive model included the right caudate volume, left middle temporal cortical thickness, first-order statistics, shape, and gray-level cooccurrence matrix. Pycaret, based on morphologic features combined with radiomics, yielded the highest accuracy in predicting isocitrate dehydrogenase mutation and vascular endothelial growth factor levels, indicating that morphologic abnormalities induced by glioma were associated with tumor biology.
Affiliation(s)
- Simin Zhang
- Huaxi MR Research Center (HMRRC), Department of Radiology, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
- Research Unit of Psychoradiology, Chinese Academy of Medical Sciences, Chengdu, Sichuan 610041, China
- Di Chen
- Functional and Molecular Imaging Key Laboratory of Sichuan Province, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
- Department of Radiology, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
- Huaiqiang Sun
- Huaxi MR Research Center (HMRRC), Department of Radiology, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
- Graham J Kemp
- Liverpool Magnetic Resonance Imaging Centre (LiMRIC) and Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L69 7ZX, United Kingdom
- Yinying Chen
- Department of Radiology, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
- Qiaoyue Tan
- Division of Radiation Physics, State Key Laboratory of Biotherapy and Cancer Center, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
- Yuan Yang
- Department of Neurosurgery, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
- Huaxi Glioma Center, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
- Qiyong Gong
- Department of Radiology, West China Xiamen Hospital of Sichuan University, Xiamen, Sichuan 610041, China
- Qiang Yue
- Department of Radiology, West China Hospital of Sichuan University, Chengdu, Sichuan 610041, China
4
Bibi I, Schaffert D, Blauth M, Lull C, von Ahnen JA, Gross G, Weigandt WA, Knitza J, Kuhn S, Benecke J, Leipe J, Schmieder A, Olsavszky V. Automated Machine Learning Analysis of Patients With Chronic Skin Disease Using a Medical Smartphone App: Retrospective Study. J Med Internet Res 2023; 25:e50886. [PMID: 38015608 PMCID: PMC10716771 DOI: 10.2196/50886] [Received: 07/21/2023] [Revised: 09/18/2023] [Accepted: 09/19/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND: Rapid digitalization in health care has led to the adoption of digital technologies; however, limited trust in internet-based health decisions and the need for technical personnel hinder the use of smartphones and machine learning applications. To address this, automated machine learning (AutoML) is a promising tool that can empower health care professionals to enhance the effectiveness of mobile health apps. OBJECTIVE: We used AutoML to analyze data from clinical studies involving patients with chronic hand and/or foot eczema or psoriasis vulgaris who used a smartphone monitoring app. The analysis focused on itching, pain, Dermatology Life Quality Index (DLQI) development, and app use. METHODS: After extensive data set preparation, which consisted of combining 3 primary data sets by extracting common features and by computing new features, a new pseudonymized secondary data set with a total of 368 patients was created. Next, multiple machine learning classification models were built during AutoML processing, with the most accurate models ultimately selected for further data set analysis. RESULTS: Itching development for 6 months was accurately modeled using the light gradient boosted trees classifier model (log loss: 0.9302 for validation, 1.0193 for cross-validation, and 0.9167 for holdout). Pain development for 6 months was assessed using the random forest classifier model (log loss: 1.1799 for validation, 1.1561 for cross-validation, and 1.0976 for holdout). Then, the random forest classifier model (log loss: 1.3670 for validation, 1.4354 for cross-validation, and 1.3974 for holdout) was used again to estimate the DLQI development for 6 months. Finally, app use was analyzed using an elastic net blender model (area under the curve: 0.6567 for validation, 0.6207 for cross-validation, and 0.7232 for holdout).
Influential feature correlations were identified, including BMI, age, disease activity, DLQI, and Hospital Anxiety and Depression Scale-Anxiety scores at follow-up. App use increased with BMI >35, was less common in patients aged >47 years and those aged 23 to 31 years, and was more common in those with higher disease activity. A Hospital Anxiety and Depression Scale-Anxiety score >8 had a slightly positive effect on app use. CONCLUSIONS: This study provides valuable insights into the relationship between data characteristics and targeted outcomes in patients with chronic eczema or psoriasis, highlighting the potential of smartphone and AutoML techniques in improving chronic disease management and patient care.
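Log loss values such as those quoted in this abstract are easier to interpret next to a reference point. The sketch below computes multiclass log loss and shows that a model predicting equal probability for each of K classes scores ln(K). The function name `log_loss` and the example probabilities are hypothetical, and the number of outcome classes used in the study is not stated in the abstract, so the class counts here are assumptions for illustration only.

```python
import math


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    true_labels: list of class indices.
    predicted_probs: list of per-class probability lists, one per sample.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip to avoid log(0) when a model is confidently wrong.
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(true_labels)


# A uniform guess over K classes always scores ln(K), whatever the labels are.
uniform_three = log_loss([0, 1, 2], [[1 / 3] * 3] * 3)
uniform_four = log_loss([0, 1, 2, 3], [[0.25] * 4] * 4)
print(round(uniform_three, 4), round(uniform_four, 4))
```

Under this reading, a log loss near or above ln(K) means the model is doing little better than an uninformed guess for a K-class outcome, while values well below it indicate useful probability estimates.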
Affiliation(s)
- Igor Bibi
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, Center of Excellence in Dermatology, Heidelberg University, Mannheim, Germany
- Daniel Schaffert
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, Center of Excellence in Dermatology, Heidelberg University, Mannheim, Germany
- Mara Blauth
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, Center of Excellence in Dermatology, Heidelberg University, Mannheim, Germany
- Christian Lull
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, Center of Excellence in Dermatology, Heidelberg University, Mannheim, Germany
- Jan Alwin von Ahnen
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, Center of Excellence in Dermatology, Heidelberg University, Mannheim, Germany
- Georg Gross
- Department of Medicine V, Division of Rheumatology, University Medical Centre and Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Wanja Alexander Weigandt
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, Center of Excellence in Dermatology, Heidelberg University, Mannheim, Germany
- Johannes Knitza
- Institute of Digital Medicine, Philipps-University Marburg and University Hospital of Giessen and Marburg, Marburg, Germany
- Sebastian Kuhn
- Institute of Digital Medicine, Philipps-University Marburg and University Hospital of Giessen and Marburg, Marburg, Germany
- Johannes Benecke
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, Center of Excellence in Dermatology, Heidelberg University, Mannheim, Germany
- Jan Leipe
- Department of Medicine V, Division of Rheumatology, University Medical Centre and Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Astrid Schmieder
- Department of Dermatology, Venereology, and Allergology, University Hospital Würzburg, Würzburg, Germany
- Victor Olsavszky
- Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, Center of Excellence in Dermatology, Heidelberg University, Mannheim, Germany
5
Chen D, Wang SJ, Zhao ZJ, Ji X, Shen Q, Yu Y, Cui SD, Wang JG, Chen ZY, Wang JY, Guo ZY, Wu PX, Tang GQ. Genomic prediction of pig growth traits based on machine learning. Yi Chuan 2023; 45:922-932. [PMID: 37872114 DOI: 10.16288/j.yczz.23-120] [Indexed: 10/25/2023]
Abstract
This study aimed to assess and compare the performance of different machine learning models in predicting selected pig growth traits and genomic estimated breeding values (GEBV) using automated machine learning, with the goal of optimizing whole-genome evaluation methods in pig breeding. The research employed genomic information, pedigree matrices, fixed effects, and phenotype data from 9968 pigs across multiple companies to derive four optimal machine learning models: deep learning (DL), random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGB). Through 10-fold cross-validation, predictions were made for GEBV and phenotypes of pigs reaching weight milestones (100 kg and 115 kg) with adjustments for backfat and days to weight. The findings indicated that machine learning models exhibited higher accuracy in predicting GEBV compared to phenotypic traits. Notably, GBM demonstrated superior GEBV prediction accuracy, with values of 0.683, 0.710, 0.866, and 0.871 for B100, B115, D100, and D115, respectively, slightly outperforming other methods. In phenotype prediction, GBM emerged as the best-performing model for pigs with B100, B115, D100, and D115 traits, achieving prediction accuracies of 0.547, followed by DL at 0.547, and then XGB with accuracies of 0.672 and 0.670. In terms of model training time, RF required the most time, while GBM and DL fell in between, and XGB demonstrated the shortest training time. In summary, machine learning models obtained through automated techniques exhibited higher GEBV prediction accuracy compared to phenotypic traits. GBM emerged as the overall top performer in terms of prediction accuracy and training time efficiency, while XGB demonstrated the ability to train accurate prediction models within a short timeframe. RF, on the other hand, had longer training times and insufficient accuracy, rendering it unsuitable for predicting pig growth traits and GEBV.
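The 10-fold cross-validation mentioned in this abstract can be sketched as follows. This is a generic illustration of how samples are partitioned so that each one is held out exactly once; the helper `k_fold_indices` and the fixed random seed are hypothetical and are not taken from the study, and only the sample count of 9968 comes from the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for shuffled k-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(9968, k=10))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)
```

In each round a model would be fitted on the training indices and scored on the held-out fold, and the ten scores averaged. For related animals, a split that keeps close relatives in the same fold may be preferable; the plain shuffle here is the simplest possible choice and is an assumption.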
Affiliation(s)
- Dong Chen
- Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Shu-Jie Wang
- Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Zhen-Jian Zhao
- Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Xiang Ji
- Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Qi Shen
- Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Yang Yu
- Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Sheng-di Cui
- Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Jun-Ge Wang
- Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Zi-Yang Chen
- Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Jin-Yong Wang
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Zong-Yi Guo
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Ping-Xian Wu
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Guo-Qing Tang
- Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
6
Krupp L, Wiede C, Friedhoff J, Grabmaier A. Explainable Remaining Tool Life Prediction for Individualized Production Using Automated Machine Learning. Sensors (Basel) 2023; 23:8523. [PMID: 37896615 PMCID: PMC10610891 DOI: 10.3390/s23208523] [Received: 09/07/2023] [Revised: 10/07/2023] [Accepted: 10/15/2023] [Indexed: 10/29/2023]
Abstract
The increasing demand for customized products is a core driver of novel automation concepts in Industry 4.0. For the case of machining complex free-form workpieces, e.g., in die making and mold making, individualized manufacturing is already the industrial practice. The varying process conditions and demanding machining processes lead to a high relevance of machining domain experts and a low degree of manufacturing flow automation. In order to increase the degree of automation, online process monitoring and the prediction of the quality-related remaining cutting tool life are indispensable. However, the varying process conditions complicate this as the correlation between the sensor signals and tool condition is not directly apparent. Furthermore, machine learning (ML) knowledge is limited on the shop floor, preventing a manual adaptation of the models to changing conditions. Therefore, this paper introduces a new method for remaining tool life prediction in individualized production using automated machine learning (AutoML). The method enables the incorporation of machining expert knowledge via the model inputs and outputs. It automatically creates end-to-end ML pipelines based on optimized ensembles of regression and forecasting models. An explainability algorithm visualizes the relevance of the model inputs for the decision making. The method is analyzed and compared to a manual state-of-the-art approach for series production in a comprehensive evaluation using a new milling dataset. The dataset represents gradual tool wear under changing workpieces and process parameters. Our AutoML method outperforms the state-of-the-art approach and the evaluation indicates that a transfer of methods designed for series production to variable process conditions is not easily possible. Overall, the new method optimizes individualized production economically and in terms of resources.
Machining experts with limited ML knowledge can leverage their domain knowledge to develop, validate and adapt tool life models.
Affiliation(s)
- Lukas Krupp
- Fraunhofer Institute for Microelectronic Circuits and Systems, 47057 Duisburg, Germany
- Christian Wiede
- Fraunhofer Institute for Microelectronic Circuits and Systems, 47057 Duisburg, Germany
- Joachim Friedhoff
- CAX Technologies, University of Applied Sciences Ruhr West, 45407 Mülheim an der Ruhr, Germany
- Anton Grabmaier
- Fraunhofer Institute for Microelectronic Circuits and Systems, 47057 Duisburg, Germany
- Department of Electronic Components and Circuits, University of Duisburg-Essen, 47057 Duisburg, Germany
7
Thirunavukarasu AJ, Elangovan K, Gutierrez L, Li Y, Tan I, Keane PA, Korot E, Ting DSW. Democratizing Artificial Intelligence Imaging Analysis With Automated Machine Learning: Tutorial. J Med Internet Res 2023; 25:e49949. [PMID: 37824185 PMCID: PMC10603560 DOI: 10.2196/49949] [Received: 06/14/2023] [Revised: 08/21/2023] [Accepted: 09/13/2023] [Indexed: 10/13/2023] Open
Abstract
Deep learning-based clinical imaging analysis underlies diagnostic artificial intelligence (AI) models, which can match or even exceed the performance of clinical experts, having the potential to revolutionize clinical practice. A wide variety of automated machine learning (autoML) platforms lower the technical barrier to entry to deep learning, extending AI capabilities to clinicians with limited technical expertise, and even autonomous foundation models such as multimodal large language models. Here, we provide a technical overview of autoML with descriptions of how autoML may be applied in education, research, and clinical practice. Each stage of the process of conducting an autoML project is outlined, with an emphasis on ethical and technical best practices. Specifically, data acquisition, data partitioning, model training, model validation, analysis, and model deployment are considered. The strengths and limitations of available code-free, code-minimal, and code-intensive autoML platforms are considered. AutoML has great potential to democratize AI in medicine, improving AI literacy by enabling "hands-on" education. AutoML may serve as a useful adjunct in research by facilitating rapid testing and benchmarking before significant computational resources are committed. AutoML may also be applied in clinical contexts, provided regulatory requirements are met. The abstraction by autoML of arduous aspects of AI engineering promotes prioritization of data set curation, supporting the transition from conventional model-driven approaches to data-centric development. To fulfill its potential, clinicians must be educated on how to apply these technologies ethically, rigorously, and effectively; this tutorial represents a comprehensive summary of relevant considerations.
Affiliation(s)
- Arun James Thirunavukarasu
- University of Cambridge School of Clinical Medicine, Cambridge, United Kingdom
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore, Singapore
- Kabilan Elangovan
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore, Singapore
- Laura Gutierrez
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore, Singapore
- Yong Li
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore, Singapore
- Iris Tan
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore, Singapore
- Pearse A Keane
- Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom
- Edward Korot
- Byers Eye Institute, Stanford University, Palo Alto, CA, United States
- Retina Specialists of Michigan, Grand Rapids, MI, United States
- Daniel Shu Wei Ting
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore, Singapore
- Byers Eye Institute, Stanford University, Palo Alto, CA, United States
- Singapore National Eye Centre, Singapore, Singapore
8
Omar I, Khan M, Starr A, Abou Rok Ba K. Automated Prediction of Crack Propagation Using H2O AutoML. Sensors (Basel) 2023; 23:8419. [PMID: 37896512 PMCID: PMC10611134 DOI: 10.3390/s23208419] [Received: 09/27/2023] [Revised: 10/06/2023] [Accepted: 10/09/2023] [Indexed: 10/29/2023]
Abstract
Crack propagation is a critical phenomenon in materials science and engineering, significantly impacting structural integrity, reliability, and safety across various applications. The accurate prediction of crack propagation behavior is paramount for ensuring the performance and durability of engineering components, as extensively explored in prior research. Nevertheless, there is a pressing demand for automated models capable of efficiently and precisely forecasting crack propagation. In this study, we address this need by developing a machine learning-based automated model using the powerful H2O library. This model aims to accurately predict crack propagation behavior in various materials by analyzing intricate crack patterns and delivering reliable predictions. To achieve this, we employed a comprehensive dataset derived from measured instances of crack propagation in Acrylonitrile Butadiene Styrene (ABS) specimens. Rigorous evaluation metrics, including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared (R2) values, were applied to assess the model's predictive accuracy. Cross-validation techniques were utilized to ensure its robustness and generalizability across diverse datasets. Our results underscore the automated model's remarkable accuracy and reliability in predicting crack propagation. This study not only highlights the immense potential of the H2O library as a valuable tool for structural health monitoring but also advocates for the broader adoption of Automated Machine Learning (AutoML) solutions in engineering applications. In addition to presenting these findings, we define H2O as a powerful machine learning library and AutoML as Automated Machine Learning to ensure clarity and understanding for readers unfamiliar with these terms. 
This research not only demonstrates the significance of AutoML in future-proofing our approach to structural integrity and safety but also emphasizes the need for comprehensive reporting and understanding in scientific discourse.
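For readers unfamiliar with the three evaluation metrics named in this abstract, the following minimal Python sketch computes MAE, RMSE, and R² from paired measurements and predictions. It uses the standard textbook definitions only. The function name and the crack-length values are hypothetical and are not taken from the study or from the H2O library.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}

# Hypothetical crack lengths in mm (illustrative only, not from the study).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(measured, predicted)
```

MAE and RMSE are in the units of the target, while R² is unitless, which is one reason the three are often reported together.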
Affiliation(s)
- Muhammad Khan
- School of Aerospace, Transport and Manufacturing, Cranfield University, Bedford MK43 0AL, UK
9
Sakagianni A, Koufopoulou C, Kalles D, Loupelis E, Verykios VS, Feretzakis G. Automated ML Techniques for Predicting COVID-19 Mortality in the ICU. Stud Health Technol Inform 2023; 305:517-520. [PMID: 37387081 DOI: 10.3233/shti230547] [Indexed: 07/01/2023]
Abstract
COVID-19 infection remains a serious threat to public health and healthcare systems. Numerous practical machine learning applications have been investigated in this context to support clinical decision-making, to forecast disease severity and admission to the intensive care unit, and to predict future demand for hospital beds, equipment, and staff. To build a prognostic model, we retrospectively analyzed demographics and routine blood biomarkers, in relation to outcome, from consecutive COVID-19 patients admitted to the intensive care unit (ICU) of a public tertiary hospital during a 17-month period. We used the Google Vertex AI platform both to evaluate its performance in predicting ICU mortality and to show the ease with which even non-experts can build prognostic models. The model's area under the receiver operating characteristic curve (AUC-ROC) was 0.955. The six highest-ranked predictors of mortality in the prognostic model were age, serum urea, platelets, C-reactive protein, hemoglobin, and SGOT.
Affiliation(s)
- Christina Koufopoulou
- Aretaieio Hospital, National and Kapodistrian University of Athens, Anesthesiology Department, Athens, Greece
- Dimitrios Kalles
- School of Science and Technology, Hellenic Open University, Patras, Greece
- Georgios Feretzakis
- School of Science and Technology, Hellenic Open University, Patras, Greece
- Sismanogleio General Hospital, Department of Quality Control, Research and Continuing Education, Marousi, Greece
10
Valeri JA, Soenksen LR, Collins KM, Ramesh P, Cai G, Powers R, Angenent-Mari NM, Camacho DM, Wong F, Lu TK, Collins JJ. BioAutoMATED: An end-to-end automated machine learning tool for explanation and design of biological sequences. Cell Syst 2023; 14:525-542.e9. [PMID: 37348466 PMCID: PMC10700034 DOI: 10.1016/j.cels.2023.05.007] [Received: 04/30/2022] [Revised: 02/17/2023] [Accepted: 05/22/2023] [Indexed: 06/24/2023]
Abstract
The design choices underlying machine-learning (ML) models present important barriers to entry for many biologists who aim to incorporate ML in their research. Automated machine-learning (AutoML) algorithms can address many challenges that come with applying ML to the life sciences. However, these algorithms are rarely used in systems and synthetic biology studies because they typically do not explicitly handle biological sequences (e.g., nucleotide, amino acid, or glycan sequences) and cannot be easily compared with other AutoML algorithms. Here, we present BioAutoMATED, an AutoML platform for biological sequence analysis that integrates multiple AutoML methods into a unified framework. Users are automatically provided with relevant techniques for analyzing, interpreting, and designing biological sequences. BioAutoMATED predicts gene regulation, peptide-drug interactions, and glycan annotation, and designs optimized synthetic biology components, revealing salient sequence characteristics. By automating sequence modeling, BioAutoMATED allows life scientists to incorporate ML more readily into their work.
Affiliation(s)
- Jacqueline A Valeri
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Luis R Soenksen
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA
- Katherine M Collins
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Department of Engineering, University of Cambridge, Trumpington St, Cambridge CB2 1PZ, UK
- Pradeep Ramesh
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- George Cai
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Rani Powers
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Pluto Biosciences, Golden, CO 80402, USA
- Nicolaas M Angenent-Mari
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Diogo M Camacho
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Felix Wong
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Timothy K Lu
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Synthetic Biology Group, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- James J Collins
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA 02139, USA; Abdul Latif Jameel Clinic for Machine Learning in Health, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
11
Jeong D, Jeong W, Lee JH, Park SY. Use of Automated Machine Learning for Classifying Hemoperitoneum on Ultrasonographic Images of Morrison's Pouch: A Multicenter Retrospective Study. J Clin Med 2023; 12:4043. [PMID: 37373736 DOI: 10.3390/jcm12124043] [Received: 03/08/2023] [Revised: 06/09/2023] [Accepted: 06/11/2023] [Indexed: 06/29/2023]
Abstract
This study evaluated automated machine learning (AutoML) in classifying the presence or absence of hemoperitoneum in ultrasonography (USG) images of Morrison's pouch. In this multicenter, retrospective study, 864 trauma patients from trauma and emergency medical centers in South Korea were included. In all, 2200 USG images (1100 hemoperitoneum and 1100 normal) were collected. Of these, 1800 images were used for training and 200 were used for the internal validation of AutoML. External validation was performed using 100 hemoperitoneum images and 100 normal images collected separately from a trauma center that were not included in the training and internal validation sets. Google's open-source AutoML was used to train the algorithm in classifying hemoperitoneum in USG images, followed by internal and external validation. In the internal validation, the sensitivity, specificity, and area under the receiver operating characteristic (AUROC) curve were 95%, 99%, and 0.97, respectively. In the external validation, the sensitivity, specificity, and AUROC were 94%, 99%, and 0.97, respectively. The performances of AutoML in the internal and external validation were not statistically different (p = 0.78). A publicly available, general-purpose AutoML can accurately classify the presence or absence of hemoperitoneum in USG images of the Morrison's pouch of real-world trauma patients.
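The sensitivity and specificity figures in this abstract follow from a confusion matrix. The minimal Python sketch below shows the arithmetic for the external validation set. The counts are an assumption: they are the whole-number counts implied by the reported 94% sensitivity and 99% specificity on 100 hemoperitoneum and 100 normal images, not counts stated in the abstract. The function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of positive images correctly flagged
    specificity = tn / (tn + fp)  # share of normal images correctly cleared
    return sensitivity, specificity

# Counts implied by the reported percentages (assumed, not stated in the abstract):
# 100 hemoperitoneum images -> 94 detected, 6 missed
# 100 normal images         -> 99 correctly cleared, 1 false alarm
sens, spec = sensitivity_specificity(tp=94, fn=6, tn=99, fp=1)
```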
Affiliation(s)
- Dongkil Jeong
- Department of Emergency Medicine, College of Medicine, Soonchunhyang University, Cheonan 31151, Republic of Korea
- Wonjoon Jeong
- Department of Emergency Medicine, School of Medicine, Chungnam National University, Daejeon 35015, Republic of Korea
- Ji Han Lee
- Division of Emergency Medicine, Department of Medicine, The Catholic University of Korea, Seoul 11765, Republic of Korea
- Sin-Youl Park
- Department of Emergency Medicine, College of Medicine, Yeungnam University, Daegu 42415, Republic of Korea
12
Liu L, Zhang R, Shi D, Li R, Wang Q, Feng Y, Lu F, Zong Y, Xu X. Automated machine learning to predict the difficulty for endoscopic resection of gastric gastrointestinal stromal tumor. Front Oncol 2023; 13:1190987. [PMID: 37234977 PMCID: PMC10206233 DOI: 10.3389/fonc.2023.1190987] [Received: 03/21/2023] [Accepted: 04/26/2023] [Indexed: 05/28/2023]
Abstract
Background: Accurate preoperative assessment of surgical difficulty is crucial to the success of the surgery and to patient safety. This study aimed to evaluate the difficulty of endoscopic resection (ER) of gastric gastrointestinal stromal tumors (gGISTs) using multiple machine learning (ML) algorithms. Methods: From December 2010 to December 2022, 555 patients with gGISTs from multiple centers were retrospectively studied and assigned to training, validation, and test cohorts. A difficult case was defined as meeting one of the following criteria: an operative time ≥ 90 min, severe intraoperative bleeding, or conversion to laparoscopic resection. Five types of algorithms were employed in building models, including traditional logistic regression (LR) and automated machine learning (AutoML) analysis (gradient boost machine (GBM), deep neural net (DL), generalized linear model (GLM), and default random forest (DRF)). We assessed the performance of the models using the areas under the receiver operating characteristic curves (AUC), the calibration curve, and the decision curve analysis (DCA) based on LR, as well as feature importance, SHapley Additive exPlanation (SHAP) plots, and Local Interpretable Model Agnostic Explanation (LIME) based on AutoML. Results: The GBM model outperformed the other models, with an AUC of 0.894 in the validation cohort and 0.791 in the test cohort. Furthermore, the GBM model achieved the highest accuracy among these AutoML models, with 0.935 and 0.911 in the validation and test cohorts, respectively. In addition, tumor size and endoscopists' experience were the most prominent features affecting the AutoML model's performance in predicting the difficulty of ER of gGISTs. Conclusion: The AutoML model based on the GBM algorithm can accurately predict the difficulty of ER of gGISTs before surgery.
Affiliation(s)
- Luojie Liu
- Department of Gastroenterology, Changshu Hospital Affiliated to Soochow University, Suzhou, China
- Rufa Zhang
- Department of Gastroenterology, Changshu Hospital Affiliated to Soochow University, Suzhou, China
- Dongtao Shi
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Rui Li
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Qinghua Wang
- Department of Gastroenterology, No.1 People’s Hospital of Kunshan, Suzhou, China
- Yunfu Feng
- Department of Gastroenterology, No.1 People’s Hospital of Kunshan, Suzhou, China
- Fenying Lu
- Department of Gastroenterology, No.2 People’s Hospital of Changshu, Suzhou, China
- Yang Zong
- Department of General Surgery, Changshu Hospital Affiliated to Soochow University, Suzhou, China
- Xiaodan Xu
- Department of Gastroenterology, Changshu Hospital Affiliated to Soochow University, Suzhou, China
13
Chen F, Zhou B, Yang L, Chen X, Zhuang J. Predicting bacterial transport through saturated porous media using an automated machine learning model. Front Microbiol 2023; 14:1152059. [PMID: 37234532 PMCID: PMC10206036 DOI: 10.3389/fmicb.2023.1152059] [Received: 01/27/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023]
Abstract
Escherichia coli, as an indicator of fecal contamination, can move from manure-amended soil to groundwater under rainfall or irrigation events. Predicting its vertical transport in the subsurface is essential for the development of engineering solutions to reduce the risk of microbiological contamination. In this study, we collected 377 datasets from 61 published papers addressing E. coli transport through saturated porous media and trained six types of machine learning algorithms to predict bacterial transport. Eight variables (bacterial concentration, porous medium type, median grain size, ionic strength, pore water velocity, column length, saturated hydraulic conductivity, and organic matter content) were used as input variables, while the first-order attachment coefficient and spatial removal rate were set as target variables. The eight input variables have low correlations with the target variables; that is, none of them can predict the target variables on its own. Combined in the predictive models, however, the input variables predict the target variables effectively. For scenarios with higher bacterial retention, such as smaller median grain size, the predictive models showed better performance. Among the six types of machine learning algorithms, Gradient Boosting Machine and Extreme Gradient Boosting outperformed the others. In most predictive models, pore water velocity, ionic strength, median grain size, and column length showed higher importance than other input variables. This study provides a valuable tool to evaluate the transport risk of E. coli in the subsurface under saturated water flow conditions. It also demonstrates the feasibility of data-driven methods that could be used for predicting the transport of other contaminants in the environment.
Affiliation(s)
- Fengxian Chen
- Key Laboratory of Pollution Ecology and Environmental Engineering, Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang, Liaoning, China
- Bin Zhou
- Faculty of Medicine, University of Augsburg, Augsburg, Germany
- Liqiong Yang
- Key Laboratory of Pollution Ecology and Environmental Engineering, Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang, Liaoning, China
- Xijuan Chen
- Key Laboratory of Pollution Ecology and Environmental Engineering, Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang, Liaoning, China
- Jie Zhuang
- Department of Biosystems Engineering and Soil Science, Center for Environmental Biotechnology, The University of Tennessee, Knoxville, TN, United States
14
Chung J, Oh DJ, Park J, Kim SH, Lim YJ. Automatic Classification of GI Organs in Wireless Capsule Endoscopy Using a No-Code Platform-Based Deep Learning Model. Diagnostics (Basel) 2023; 13:1389. [PMID: 37189489 DOI: 10.3390/diagnostics13081389] [Received: 02/22/2023] [Revised: 04/03/2023] [Accepted: 04/10/2023] [Indexed: 05/17/2023]
Abstract
The first step in reading a capsule endoscopy (CE) is determining the gastrointestinal (GI) organ. Because CE produces too many inappropriate and repetitive images, automatic organ classification cannot be directly applied to CE videos. In this study, we developed a deep learning algorithm to classify GI organs (the esophagus, stomach, small bowel, and colon) using a no-code platform, applied it to CE videos, and proposed a novel method to visualize the transitional area of each GI organ. We used training data (37,307 images from 24 CE videos) and test data (39,781 images from 30 CE videos) for model development. This model was validated using 100 CE videos that included "normal", "blood", "inflamed", "vascular", and "polypoid" lesions. Our model achieved an overall accuracy of 0.98, precision of 0.89, recall of 0.97, and F1 score of 0.92. When we validated this model relative to the 100 CE videos, it produced average accuracies for the esophagus, stomach, small bowel, and colon of 0.98, 0.96, 0.87, and 0.87, respectively. Increasing the AI score's cut-off improved most performance metrics in each organ (p < 0.05). To locate a transitional area, we visualized the predicted results over time, and setting the cut-off of the AI score to 99.9% resulted in a better intuitive presentation than the baseline. In conclusion, the GI organ classification AI model demonstrated high accuracy on CE videos. The transitional area could be more easily located by adjusting the cut-off of the AI score and visualization of its result over time.
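The idea of raising the AI score cut-off, as described in this abstract, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' no-code pipeline: it assumes each frame comes with per-organ scores between 0 and 1 and that frames whose top score falls below the cut-off are left unlabeled. The function name and the frame scores are hypothetical.

```python
def apply_cutoff(frame_scores, cutoff):
    """Label each frame with its top-scoring organ, or None below the cut-off."""
    labels = []
    for scores in frame_scores:
        organ = max(scores, key=scores.get)  # organ with the highest score
        labels.append(organ if scores[organ] >= cutoff else None)
    return labels

# Hypothetical per-frame scores (illustrative only).
frames = [
    {"esophagus": 0.9995, "stomach": 0.0005},
    {"esophagus": 0.60, "stomach": 0.40},      # ambiguous frame near a transition
    {"stomach": 0.9999, "small bowel": 0.0001},
]
labels = apply_cutoff(frames, cutoff=0.999)  # 99.9% cut-off, as in the abstract
```

Plotting such labels against frame index would leave gaps at ambiguous frames, which is one plausible way a strict cut-off could make transitional areas easier to see.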
Affiliation(s)
- Joowon Chung
- Department of Internal Medicine, Nowon Eulji Medical Center, Eulji University School of Medicine, Seoul 01830, Republic of Korea
- Dong Jun Oh
- Department of Internal Medicine, Dongguk University Ilsan Hospital, Dongguk University College of Medicine, Goyang 10326, Republic of Korea
- Junseok Park
- Department of Internal Medicine, Digestive Disease Center, Institute for Digestive Research, Soonchunhyang University College of Medicine, Seoul 04401, Republic of Korea
- Su Hwan Kim
- Department of Internal Medicine, Seoul Metropolitan Government Seoul National University Boramae Medical Center, Seoul 07061, Republic of Korea
- Yun Jeong Lim
- Department of Internal Medicine, Dongguk University Ilsan Hospital, Dongguk University College of Medicine, Goyang 10326, Republic of Korea
15
Lai FL, Gao F. Auto-Kla: a novel web server to discriminate lysine lactylation sites using automated machine learning. Brief Bioinform 2023; 24:7068952. [PMID: 36869843 DOI: 10.1093/bib/bbad070] [Citation(s) in RCA: 7] [Received: 12/05/2022] [Revised: 01/24/2023] [Accepted: 02/07/2023] [Indexed: 03/05/2023]
Abstract
Recently, lysine lactylation (Kla), a novel post-translational modification (PTM), which can be stimulated by lactate, has been found to regulate gene expression and life activities. Therefore, it is imperative to accurately identify Kla sites. Currently, mass spectrometry is the fundamental method for identifying PTM sites. However, it is expensive and time-consuming to achieve this through experiments alone. Herein, we proposed a novel computational model, Auto-Kla, to quickly and accurately predict Kla sites in gastric cancer cells based on automated machine learning (AutoML). With stable and reliable performance, our model outperforms the recently published model in the 10-fold cross-validation. To investigate the generalizability and transferability of our approach, we evaluated the performance of our models trained on two other widely studied types of PTM, including phosphorylation sites in host cells infected with SARS-CoV-2 and lysine crotonylation sites in HeLa cells. The results show that our models achieve comparable or better performance than current outstanding models. We believe that this method will become a useful analytical tool for PTM prediction and provide a reference for the future development of related models. The web server and source code are available at http://tubic.org/Kla and https://github.com/tubic/Auto-Kla, respectively.
Affiliation(s)
- Fei-Liao Lai
- Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
- Feng Gao
- Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, China
16
González-Nóvoa JA, Campanioni S, Busto L, Fariña J, Rodríguez-Andina JJ, Vila D, Íñiguez A, Veiga C. Improving Intensive Care Unit Early Readmission Prediction Using Optimized and Explainable Machine Learning. Int J Environ Res Public Health 2023; 20:3455. [PMID: 36834150 PMCID: PMC9960143 DOI: 10.3390/ijerph20043455] [Received: 12/20/2022] [Revised: 02/10/2023] [Accepted: 02/14/2023] [Indexed: 06/18/2023]
Abstract
It is of great interest to develop and introduce new techniques to automatically and efficiently analyze the enormous amount of data generated in today's hospitals, using state-of-the-art artificial intelligence methods. Patients readmitted to the ICU during the same hospital stay have a higher risk of mortality and morbidity, a longer length of stay, and increased cost. The methodology proposed to predict ICU readmission could improve patient care. The objective of this work is to explore and evaluate the potential improvement of existing models for predicting early ICU patient readmission by using optimized artificial intelligence algorithms and explainability techniques. In this work, XGBoost is used as the predictor model, combined with Bayesian techniques to optimize it. The results obtained for predicting early ICU readmission (AUROC of 0.92 ± 0.03) improve on the state-of-the-art works consulted (whose AUROCs range between 0.66 and 0.78). Moreover, we explain the internal functioning of the model using Shapley Additive Explanation-based techniques, which allow us to understand the model's internal behavior and to obtain useful information such as patient-specific information, the thresholds from which a feature begins to be critical for a certain group of patients, and the feature importance ranking.
Affiliation(s)
- José A. González-Nóvoa
- Galicia Sur Health Research Institute (IIS Galicia Sur), Álvaro Cunqueiro Hospital, 36310 Vigo, Spain
- Silvia Campanioni
- Galicia Sur Health Research Institute (IIS Galicia Sur), Álvaro Cunqueiro Hospital, 36310 Vigo, Spain
- Laura Busto
- Galicia Sur Health Research Institute (IIS Galicia Sur), Álvaro Cunqueiro Hospital, 36310 Vigo, Spain
- José Fariña
- Department of Electronic Technology, University of Vigo, 36310 Vigo, Spain
- Dolores Vila
- Intensive Care Unit Department, Complexo Hospitalario Universitario de Vigo (SERGAS), Álvaro Cunqueiro Hospital, 36213 Vigo, Spain
- Andrés Íñiguez
- Cardiology Department, Complexo Hospitalario Universitario de Vigo (SERGAS), Álvaro Cunqueiro Hospital, 36213 Vigo, Spain
- César Veiga
- Galicia Sur Health Research Institute (IIS Galicia Sur), Álvaro Cunqueiro Hospital, 36310 Vigo, Spain
17
Yuan P, Xu S, Zhai Z, Xu H. Research of intelligent reasoning system of Arabidopsis thaliana phenotype based on automated multi-task machine learning. Front Plant Sci 2023; 14:1048016. [PMID: 36866380 PMCID: PMC9974140 DOI: 10.3389/fpls.2023.1048016] [Received: 09/19/2022] [Accepted: 01/13/2023] [Indexed: 06/18/2023]
Abstract
Traditional machine learning in plant phenotyping research requires professional data scientists and domain experts to adjust the structure of neural network models and tune their hyperparameters, with much human intervention, making model training and deployment ineffective. In this paper, automated machine learning is investigated to construct a multi-task learning model for Arabidopsis thaliana genotype classification, leaf number regression, and leaf area regression. The experimental results show that, for the genotype classification task, accuracy and recall reached 98.78%, precision reached 98.83%, and the F1 score reached 98.79%, while the R² of the leaf number and leaf area regression tasks reached 0.9925 and 0.9997, respectively. The results demonstrate that the multi-task automated machine learning model can combine the benefits of multi-task learning and automated machine learning, obtaining more bias information from related tasks and improving overall classification and prediction performance. Additionally, the model can be created automatically and has a high degree of generalization for better phenotype reasoning. The trained model and system can also be deployed on cloud platforms for convenient application.
Affiliation(s)
- Peisen Yuan (corresponding author)
- Zhaoyu Zhai (corresponding author)
18
González-Nóvoa JA, Busto L, Campanioni S, Fariña J, Rodríguez-Andina JJ, Vila D, Veiga C. Two-Step Approach for Occupancy Estimation in Intensive Care Units Based on Bayesian Optimization Techniques. Sensors (Basel) 2023; 23:1162. [PMID: 36772202 PMCID: PMC9919941 DOI: 10.3390/s23031162] [Received: 11/30/2022] [Revised: 01/14/2023] [Accepted: 01/17/2023] [Indexed: 06/18/2023]
Abstract
Due to the high occupational pressure suffered by intensive care units (ICUs), a correct estimation of patients' length of stay (LoS) in the ICU is of great interest for predicting possible situations of collapse, helping healthcare personnel select appropriate treatment options, and predicting patients' conditions. A large amount of data is collected by biomedical sensors during the continuous monitoring of patients in the ICU, so the use of artificial intelligence techniques for automatic LoS estimation would improve patient care and facilitate the work of healthcare personnel. In this work, a novel methodology to estimate the LoS using data from the first 24 h in the ICU is presented. To achieve this, XGBoost, one of the most popular and efficient state-of-the-art algorithms, is used as the estimator model, and its performance is optimized from both computational and precision viewpoints using Bayesian techniques. For this optimization, a novel two-step approach is presented. The methodology was carefully designed to execute code on a high-performance computing system based on graphics processing units, which considerably reduces the execution time. The scalability of the algorithm is analyzed. With the proposed methodology, the best set of XGBoost hyperparameters is identified, estimating LoS with an MAE of 2.529 days, improving on the results reported in the current state of the art and proving the validity and utility of the proposed approach.
Affiliation(s)
- José A. González-Nóvoa
- Galicia Sur Health Research Institute (IIS Galicia Sur), Álvaro Cunqueiro Hospital, 36310 Vigo, Spain
| | - Laura Busto
- Galicia Sur Health Research Institute (IIS Galicia Sur), Álvaro Cunqueiro Hospital, 36310 Vigo, Spain
| | - Silvia Campanioni
- Galicia Sur Health Research Institute (IIS Galicia Sur), Álvaro Cunqueiro Hospital, 36310 Vigo, Spain
| | - José Fariña
- Department of Electronic Technology, University of Vigo, 36310 Vigo, Spain
| | | | - Dolores Vila
- Intensive Care Unit Department, Complexo Hospitalario Universitario de Vigo (SERGAS), Álvaro Cunqueiro Hospital, 36213 Vigo, Spain
| | - César Veiga
- Galicia Sur Health Research Institute (IIS Galicia Sur), Álvaro Cunqueiro Hospital, 36310 Vigo, Spain
| |
Collapse
|
19
|
Chen T, Or CK. Automated machine learning-based prediction of the progression of knee pain, functional decline, and incidence of knee osteoarthritis in individuals at high risk of knee osteoarthritis: Data from the osteoarthritis initiative study. Digit Health 2023; 9:20552076231216419. [PMID: 38033512 PMCID: PMC10685797 DOI: 10.1177/20552076231216419] [Received: 06/20/2023] [Accepted: 11/07/2023] [Indexed: 12/02/2023]
Abstract
Objective This study aimed to examine the performance of machine learning models in predicting the progression of knee pain, functional decline, and incidence of knee osteoarthritis (OA) in high-risk individuals, with automated machine learning (AutoML) being used to automate the prediction process. Design There were four stages in the process of our AutoML-integrated prediction. Stage 1-Data preparation: The data of 3200 eligible individuals in the Osteoarthritis Initiative (OAI) study who were considered at high risk of knee OA at the baseline visit were extracted and used. Specifically, 1094 variables from the OAI study were used to predict the changes in knee pain, physical function, and incidence of knee OA (i.e. the first occurrence of frequent knee symptoms and definite tibial osteophytes (Kellgren and Lawrence grade ≥2)) over a 9-year period. Stage 2-Model training: The AutoML approach was used to automatically train nine widely used machine learning (ML) models. Stage 3-Model testing: The AutoML approach was used to automatically test the performance of the ML models. Stage 4-Selection of important input variables: The AutoML approach automated the process of computing the importance scores of all input variables and identifying the most important ones, using the technique of permutation feature importance. Results Using the AutoML approach, the weighted ensemble model and the CatBoost model showed the best performance among all nine ML models. For the prediction of each outcome in each year, the five most important input variables were identified, most of which were obtained from self-reported questionnaire surveys and radiographic imaging reports. Conclusion The AutoML approach has shown potential in automating the process of using ML models to predict long-term changes in knee OA-related outcomes. 
Its use could support the deployment of ML solutions, facilitating the provision of personalized interventions to prevent the deterioration of knee health and incident knee OA.
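Stage 4 of the abstract above relies on permutation feature importance. The following is a minimal sketch of that idea in plain Python, not the authors' AutoML pipeline: one column is shuffled at a time and the resulting increase in prediction error is recorded. The toy model `toy_predict`, the random data, and the use of mean absolute error as the score are all assumptions made for illustration.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_repeats=20, seed=0):
    """Mean increase in error when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y_true, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild every row with the shuffled value in position `col`.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            error = mean_absolute_error(y_true, [predict(r) for r in permuted])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    # Hypothetical model: strong use of feature 0, weak use of feature 1,
    # and no use of feature 2.
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
targets = [toy_predict(r) for r in rows]

scores = permutation_importance(toy_predict, rows, targets)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

A feature the model never reads gets an importance of exactly zero here, because shuffling it cannot change any prediction.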
Affiliation(s)
- Tianrong Chen
  - Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong, Hong Kong, China
- Calvin Kalun Or
  - Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong, Hong Kong, China
|
20
|
Yu C, Li Y, Yin M, Gao J, Xi L, Lin J, Liu L, Zhang H, Wu A, Xu C, Liu X, Wang Y, Zhu J. Automated Machine Learning in Predicting 30-Day Mortality in Patients with Non-Cholestatic Cirrhosis. J Pers Med 2022; 12:1930. [PMID: 36422105 PMCID: PMC9693570 DOI: 10.3390/jpm12111930] [Received: 10/15/2022] [Revised: 11/09/2022] [Accepted: 11/18/2022] [Indexed: 07/30/2023]
Abstract
OBJECTIVE To evaluate the feasibility of automated machine learning (AutoML) in predicting 30-day mortality in non-cholestatic cirrhosis. METHODS A total of 932 cirrhotic patients were included from the First Affiliated Hospital of Soochow University between 2014 and 2020. Participants were divided into training and validation datasets at a ratio of 8.5:1.5. Models were developed on the H2O AutoML platform in the training dataset, and then were evaluated in the validation dataset by area under receiver operating characteristic curves (AUC). The best AutoML model was interpreted by SHapley Additive exPlanation (SHAP) Plot, Partial Dependence Plots (PDP), and Local Interpretable Model Agnostic Explanation (LIME). RESULTS The model, based on the extreme gradient boosting (XGBoost) algorithm, performed better (AUC 0.888) than the other AutoML models (logistic regression 0.673, gradient boost machine 0.886, random forest 0.866, deep learning 0.830, stacking 0.850), as well as the existing scoring systems (the model of end-stage liver disease [MELD] score 0.778, MELD-Na score 0.782, and albumin-bilirubin [ALBI] score 0.662). The most important variable in the XGBoost model was high-density lipoprotein cholesterol, followed by creatinine, white blood cell count, international normalized ratio, etc. CONCLUSION The AutoML model based on the XGBoost algorithm presented better performance than the existing scoring systems for predicting 30-day mortality in patients with non-cholestatic cirrhosis. It shows the promise of AutoML in its future medical application.
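The models above are compared by AUC. As a rough illustration of what that number measures, the sketch below computes it directly from its pairwise definition: the share of (positive, negative) pairs in which the positive case gets the higher score, with ties counted as one half. The labels and scores are invented for the example and have no connection to the study data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical outcome labels (1 = event) and predicted risks.
labels = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, risks))
```

The double loop is quadratic in the number of cases, which is fine for a sketch; library implementations typically use a rank-based formulation instead.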
Affiliation(s)
- Chenyan Yu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China
  - Department of Gastroenterology, Dushu Lake Hospital Affiliated to Soochow University, Suzhou 215000, China
- Yao Li
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China
- Minyue Yin
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China
- Jingwen Gao
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China
- Liting Xi
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China
- Jiaxi Lin
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China
- Lu Liu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China
- Huixian Zhang
  - Department of Gastroenterology, Dushu Lake Hospital Affiliated to Soochow University, Suzhou 215000, China
- Airong Wu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China
- Chunfang Xu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China
- Xiaolin Liu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China
- Yue Wang
  - Department of Hepatology, The Fifth People’s Hospital of Suzhou, Suzhou 215000, China
- Jinzhou Zhu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou 215006, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou 215000, China
|
21
|
Zhou K, Huang X, Song Q, Chen R, Hu X. Auto-GNN: Neural architecture search of graph neural networks. Front Big Data 2022; 5:1029307. [PMID: 36466713 PMCID: PMC9714572 DOI: 10.3389/fdata.2022.1029307] [Received: 08/27/2022] [Accepted: 10/26/2022] [Indexed: 09/19/2023]
Abstract
Graph neural networks (GNNs) have been widely used in various graph analysis tasks. As the graph characteristics vary significantly in real-world systems, given a specific scenario, the architecture parameters need to be tuned carefully to identify a suitable GNN. Neural architecture search (NAS) has shown its potential in discovering effective architectures for learning tasks in image and language modeling. However, the existing NAS algorithms cannot be applied efficiently to the GNN search problem for two reasons. First, the large-step exploration in the traditional controller fails to learn the sensitive performance variations with slight architecture modifications in GNNs. Second, the search space is composed of heterogeneous GNNs, which prevents the direct adoption of parameter sharing among them to accelerate the search progress. To tackle these challenges, we propose an automated graph neural networks (AGNN) framework, which aims to find the optimal GNN architecture efficiently. Specifically, a reinforced conservative controller is designed to explore the architecture space with small steps. To accelerate the validation, a novel constrained parameter sharing strategy is presented to regularize the weight transferring among GNNs. It avoids training from scratch and saves computation time. Experimental results on the benchmark datasets demonstrate that the architecture identified by AGNN achieves the best performance and search efficiency, compared with existing human-invented models and the traditional search methods.
Affiliation(s)
- Kaixiong Zhou
  - DATA Lab, Department of Computer Science, Rice University, Houston, TX, United States
- Xiao Huang
  - Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
- Rui Chen
  - Samsung Research America, Silicon Valley, CA, United States
- Xia Hu
  - DATA Lab, Department of Computer Science, Rice University, Houston, TX, United States
|
22
|
Chen J, Zhang J, Zhao H. Quantifying Alignment Deviations for the In-Plane Biaxial Test System via a Shape-Optimised Cruciform Specimen. Materials (Basel) 2022; 15:4949. [PMID: 35888416 DOI: 10.3390/ma15144949] [Received: 06/07/2022] [Revised: 07/07/2022] [Accepted: 07/10/2022] [Indexed: 11/17/2022]
Abstract
The loading coaxiality of an in-plane biaxial test system and the structure of a cruciform specimen markedly affect the test results. However, due to the lack of methods for correcting the loading coaxiality and designing the cruciform specimen, the data scatter in the results of in-plane biaxial test systems varies between laboratories and between tests. To quantify the loading coaxiality of the in-plane biaxial test system, we first developed a model that calculates alignment deviations from the strain distribution of the shape-optimised cruciform specimen using Automated Machine Learning (AutoML). Our results demonstrated that 99.2% (54,536 of 54,976) of the quantified errors are less than 5%. The problem of quantifying alignment deviations for an in-plane biaxial test system has thus been solved. The method for quantifying alignment deviations could enhance the reliability of test data, improve assembly efficiency, and aid in constructing failure criteria of materials under biaxial stress.
|
23
|
Topcu Dİ, Bayraktar N. Searching for the urine osmolality surrogate: an automated machine learning approach. Clin Chem Lab Med 2022; 60:1911-1920. [PMID: 35778953 DOI: 10.1515/cclm-2022-0415] [Received: 04/26/2022] [Accepted: 06/22/2022] [Indexed: 12/30/2022]
Abstract
OBJECTIVES Automated machine learning (AutoML) tools can help clinical laboratory professionals to develop machine learning models. The objective of this study was to develop a novel formula for the estimation of urine osmolality using an AutoML tool and to determine the efficiency of AutoML tools in a clinical laboratory setting. METHODS Three hundred routine urinalysis samples were used for reference osmolality and urine clinical chemistry analysis. The H2O AutoML engine completed the machine learning development steps with minimum human intervention. Four feature groups were created, which include different urinalysis measurements according to the Boruta feature selection algorithm. Method comparison statistics including Spearman correlation and Passing-Bablok regression analysis were performed, and Bland-Altman plots were created to compare model predictions with the reference method. The minimum allowable bias (24.17%) from biological variation data was used as the limit of agreement. RESULTS The AutoML engine developed a total of 183 ML models. Conductivity and specific gravity had the highest variable importance. Models that include conductivity, specific gravity, and other urinalysis parameters had the highest R² (0.70-0.83), and 70-84% of results were within the limit of agreement. CONCLUSIONS Combining urinary conductivity with other urinalysis parameters using validated machine learning models can yield a promising surrogate. Additionally, AutoML tools facilitate the machine learning development cycle and should be considered for developing ML models in clinical laboratories.
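The "share of results within the limit of agreement" reported above can be illustrated with a short calculation. This sketch assumes the bias of each prediction is expressed as a percentage of the reference value, which the abstract does not spell out; the osmolality values are invented and the function names are hypothetical.

```python
ALLOWABLE_BIAS_PERCENT = 24.17  # limit of agreement quoted in the abstract


def percent_bias(predicted, reference):
    """Signed difference of each prediction, as a percentage of its reference."""
    return [100.0 * (p - r) / r for p, r in zip(predicted, reference)]


def share_within_limit(predicted, reference, limit=ALLOWABLE_BIAS_PERCENT):
    """Fraction of predictions whose absolute percent bias is within the limit."""
    biases = percent_bias(predicted, reference)
    within = sum(1 for b in biases if abs(b) <= limit)
    return within / len(biases)


# Made-up reference and model-predicted osmolality values (mOsm/kg).
reference = [300.0, 450.0, 600.0, 800.0, 1000.0]
predicted = [330.0, 430.0, 780.0, 790.0, 700.0]

print(share_within_limit(predicted, reference))
```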
Affiliation(s)
- Deniz İlhan Topcu
  - Department of Medical Biochemistry, Başkent University Faculty of Medicine, Ankara, Turkey
- Nilüfer Bayraktar
  - Department of Medical Biochemistry, Başkent University Faculty of Medicine, Ankara, Turkey
|
24
|
Yin M, Zhang R, Zhou Z, Liu L, Gao J, Xu W, Yu C, Lin J, Liu X, Xu C, Zhu J. Automated Machine Learning for the Early Prediction of the Severity of Acute Pancreatitis in Hospitals. Front Cell Infect Microbiol 2022; 12:886935. [PMID: 35755847 PMCID: PMC9226483 DOI: 10.3389/fcimb.2022.886935] [Received: 03/01/2022] [Accepted: 04/29/2022] [Indexed: 11/13/2022]
Abstract
Background Machine learning (ML) algorithms are widely applied in building medical models because of their powerful learning and generalization ability. This study aims to explore different ML models for early identification of severe acute pancreatitis (SAP) among patients hospitalized for acute pancreatitis. Methods This retrospective study enrolled patients with acute pancreatitis (AP) from multiple centers. Data from the First Affiliated Hospital and Changshu No. 1 Hospital of Soochow University were adopted for training and internal validation, and data from the Second Affiliated Hospital of Soochow University were adopted for external validation from January 2017 to December 2021. The diagnosis of AP and SAP was based on the 2012 revised Atlanta classification of acute pancreatitis. Models were built using traditional logistic regression (LR) and automated machine learning (AutoML) analysis with five types of algorithms. The performance of models was evaluated by the receiver operating characteristic (ROC) curve, the calibration curve, and the decision curve analysis (DCA) based on LR and feature importance, SHapley Additive exPlanation (SHAP) Plot, and Local Interpretable Model Agnostic Explanation (LIME) based on AutoML. Results A total of 1,012 patients were included in this study to develop the AutoML models in the training/validation dataset. An independent dataset of 212 patients was used to test the models. The model developed by the gradient boost machine (GBM) outperformed other models with an area under the ROC curve (AUC) of 0.937 in the validation set and an AUC of 0.945 in the test set. Furthermore, the GBM model achieved the highest sensitivity value of 0.583 among these AutoML models. The model developed by eXtreme Gradient Boosting (XGBoost) achieved the highest specificity value of 0.980 and the highest accuracy of 0.958 in the test set. 
Conclusions The AutoML model based on the GBM algorithm for early prediction of SAP showed evident clinical practicability.
Affiliation(s)
- Minyue Yin
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Rufa Zhang
  - Department of Gastroenterology, The Changshu No. 1 Hospital of Soochow University, Suzhou, China
- Zhirun Zhou
  - Department of Obstetrics and Gynecology, The Second Affiliated Hospital of Soochow University, Suzhou, China
- Lu Liu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Jingwen Gao
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Wei Xu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Chenyan Yu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Jiaxi Lin
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Xiaolin Liu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Chunfang Xu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Jinzhou Zhu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, China
|
25
|
Ritter Z, Papp L, Zámbó K, Tóth Z, Dezső D, Veres DS, Máthé D, Budán F, Karádi É, Balikó A, Pajor L, Szomor Á, Schmidt E, Alizadeh H. Two-Year Event-Free Survival Prediction in DLBCL Patients Based on In Vivo Radiomics and Clinical Parameters. Front Oncol 2022; 12:820136. [PMID: 35756658 PMCID: PMC9216187 DOI: 10.3389/fonc.2022.820136] [Received: 11/22/2021] [Accepted: 05/18/2022] [Indexed: 12/11/2022]
Abstract
Purpose For the identification of high-risk patients in diffuse large B-cell lymphoma (DLBCL), we investigated the prognostic significance of in vivo radiomics derived from baseline [18F]FDG PET/CT and clinical parameters. Methods Pre-treatment [18F]FDG PET/CT scans of 85 patients diagnosed with DLBCL were assessed. The scans were carried out in two clinical centers. Two-year event-free survival (EFS) was defined. After delineation of lymphoma lesions, conventional PET parameters and in vivo radiomics were extracted. For 2-year EFS prognosis assessment, the Center 1 dataset was utilized as the training set and underwent automated machine learning analysis. The dataset of Center 2 was utilized as an independent test set to validate the established predictive model built by the dataset of Center 1. Results The automated machine learning analysis of the Center 1 dataset revealed that the most important features for building 2-year EFS are as follows: max diameter, neighbor gray tone difference matrix (NGTDM) busyness, total lesion glycolysis, total metabolic tumor volume, and NGTDM coarseness. The predictive model built on the Center 1 dataset yielded 79% sensitivity, 83% specificity, 69% positive predictive value, 89% negative predictive value, and 0.85 AUC by evaluating the Center 2 dataset. Conclusion Based on our dual-center retrospective analysis, predicting 2-year EFS built on imaging features is feasible by utilizing high-performance automated machine learning.
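The four test-set rates quoted above (sensitivity, specificity, positive and negative predictive value) all derive from one confusion matrix. A minimal sketch of the standard definitions follows; the counts passed in are invented for illustration and are not the Center 2 results, which the abstract reports only as percentages.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard rates derived from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true events that were flagged
        "specificity": tn / (tn + fp),  # share of non-events that were cleared
        "ppv": tp / (tp + fp),          # share of flagged cases that were events
        "npv": tn / (tn + fn),          # share of cleared cases that were non-events
    }


# Hypothetical counts for a 30-patient test set.
metrics = diagnostic_metrics(tp=8, fp=3, tn=17, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Unlike sensitivity and specificity, the two predictive values depend on how common the event is in the test set, so they do not transfer directly to a population with a different event rate.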
Affiliation(s)
- Zsombor Ritter
  - Department of Medical Imaging, Medical School, University of Pécs, Pécs, Hungary
- László Papp
  - Medical University of Vienna, Center for Medical Physics and Biomedical Engineering, Vienna, Austria
- Katalin Zámbó
  - Department of Medical Imaging, Medical School, University of Pécs, Pécs, Hungary
- Zoltán Tóth
  - University of Kaposvár, PET Medicopus Nonprofit Ltd., Kaposvár, Hungary
- Dániel Dezső
  - Department of Medical Imaging, Medical School, University of Pécs, Pécs, Hungary
- Dániel Sándor Veres
  - Department of Biophysics and Radiation Biology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Domokos Máthé
  - Department of Biophysics and Radiation Biology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
  - In Vivo Imaging Advanced Core Facility, Hungarian Centre of Excellence for Molecular Medicine, Budapest, Hungary
- Ferenc Budán
  - Institute of Transdisciplinary Discoveries, Medical School, University of Pécs, Pécs, Hungary
  - Institute of Physiology, Medical School, University of Pécs, Pécs, Hungary
- Éva Karádi
  - Department of Hematology, University of Kaposvár, Kaposvár, Hungary
- Anett Balikó
  - County Hospital Tolna, János Balassa Hospital, Szekszárd, Hungary
- László Pajor
  - Department of Pathology, Medical School, University of Pécs, Pécs, Hungary
- Árpád Szomor
  - 1st Department of Internal Medicine, Medical School, University of Pécs, Pécs, Hungary
- Erzsébet Schmidt
  - Department of Medical Imaging, Medical School, University of Pécs, Pécs, Hungary
- Hussain Alizadeh
  - 1st Department of Internal Medicine, Medical School, University of Pécs, Pécs, Hungary
|
26
|
Manduchi E, Le TT, Fu W, Moore JH. Genetic Analysis of Coronary Artery Disease Using Tree-Based Automated Machine Learning Informed By Biology-Based Feature Selection. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1379-1386. [PMID: 34310318 PMCID: PMC9291719 DOI: 10.1109/tcbb.2021.3099068] [Indexed: 06/13/2023]
Abstract
Machine Learning (ML) approaches are increasingly being used in biomedical applications. Important challenges of ML include choosing the right algorithm and tuning the parameters for optimal performance. Automated ML (AutoML) methods, such as Tree-based Pipeline Optimization Tool (TPOT), have been developed to take some of the guesswork out of ML thus making this technology available to users from more diverse backgrounds. The goals of this study were to assess applicability of TPOT to genomics and to identify combinations of single nucleotide polymorphisms (SNPs) associated with coronary artery disease (CAD), with a focus on genes with high likelihood of being good CAD drug targets. We leveraged public functional genomic resources to group SNPs into biologically meaningful sets to be selected by TPOT. We applied this strategy to data from the U.K. Biobank, detecting a strikingly recurrent signal stemming from a group of 28 SNPs. Importance analysis of these SNPs uncovered functional relevance of the top SNPs to genes whose association with CAD is supported in the literature and other resources. Furthermore, we employed game-theory based metrics to study SNP contributions to individual-level TPOT predictions and discover distinct clusters of well-predicted CAD cases. The latter indicates a promising approach towards precision medicine.
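The abstract mentions game-theory based metrics for per-individual feature contributions without naming them. Shapley values are one common metric of this kind, and the sketch below computes them exactly by enumerating feature subsets, which is only practical for a handful of features. It is not the authors' method: the model `toy_risk`, the all-zero baseline, and the three binary "SNP" inputs are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley value of each feature for one prediction.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        contributions.append(total)
    return contributions


def toy_risk(x):
    # Hypothetical risk score: an additive effect of feature 0 plus an
    # interaction between features 1 and 2.
    return 0.1 + 0.2 * x[0] + 0.4 * x[1] * x[2]


phi = shapley_values(toy_risk, instance=[1, 1, 1], baseline=[0, 0, 0])
print(phi)
```

The contributions always sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features that share it.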
|
27
|
Yuan H, Xie F, Eng Hock Ong M, Ning Y, Lucas Chee M, Ehsan Saffari S, Rizal Abdullah H, Alan Goldstein B, Chakraborty B, Liu N. AutoScore-Imbalance: An Interpretable Machine Learning Tool for Development of Clinical Scores with Rare Events Data. J Biomed Inform 2022; 129:104072. [PMID: 35421602 DOI: 10.1016/j.jbi.2022.104072] [Received: 11/14/2021] [Revised: 03/10/2022] [Accepted: 04/07/2022] [Indexed: 02/06/2023]
Abstract
BACKGROUND Medical decision-making impacts both individual and public health. Clinical scores are commonly used among various decision-making models to determine the degree of disease deterioration at the bedside. AutoScore was proposed as a useful clinical score generator based on machine learning and a generalized linear model. However, its current framework still leaves room for improvement when addressing unbalanced data of rare events. METHODS Using machine intelligence approaches, we developed AutoScore-Imbalance, which comprises three components: training dataset optimization, sample weight optimization, and adjusted AutoScore. Baseline techniques for performance comparison included the original AutoScore, full logistic regression, stepwise logistic regression, least absolute shrinkage and selection operator (LASSO), full random forest, and random forest with a reduced number of variables. These models were evaluated based on their area under the curve (AUC) in the receiver operating characteristic analysis and balanced accuracy (i.e., mean value of sensitivity and specificity). By utilizing a publicly accessible dataset from Beth Israel Deaconess Medical Center, we assessed the proposed model and baseline approaches to predict inpatient mortality. RESULTS AutoScore-Imbalance outperformed baselines in terms of AUC and balanced accuracy. The nine-variable AutoScore-Imbalance sub-model achieved the highest AUC of 0.786 (0.732-0.839), while the eleven-variable original AutoScore obtained an AUC of 0.723 (0.663-0.783), and the logistic regression with 21 variables obtained an AUC of 0.743 (0.685-0.800). The AutoScore-Imbalance sub-model (using a down-sampling algorithm) yielded an AUC of 0.771 (0.718-0.823) with only five variables, demonstrating a good balance between performance and variable sparsity. 
Furthermore, AutoScore-Imbalance obtained the highest balanced accuracy of 0.757 (0.702-0.805), compared to 0.698 (0.643-0.753) by the original AutoScore and the maximum of 0.720 (0.664-0.769) by other baseline models. CONCLUSIONS We have developed an interpretable tool to handle clinical data imbalance, presented its structure, and demonstrated its superiority over baselines. The AutoScore-Imbalance tool can be applied to highly unbalanced datasets to gain further insight into rare medical events and facilitate real-world clinical decision-making.
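Balanced accuracy, defined in the abstract as the mean of sensitivity and specificity, matters for rare events because plain accuracy can look high for a model that never predicts the event. The short sketch below shows this on invented labels (5 events among 100 cases); the data and the always-negative "model" are hypothetical.

```python
def accuracy(labels, predictions):
    """Share of cases classified correctly."""
    return sum(1 for y, p in zip(labels, predictions) if y == p) / len(labels)


def balanced_accuracy(labels, predictions):
    """Mean of sensitivity and specificity for binary 0/1 labels."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the labels")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical rare-event data: 5 events among 100 cases.
labels = [1] * 5 + [0] * 95
always_negative = [0] * 100  # a "model" that never predicts the event

print(accuracy(labels, always_negative))
print(balanced_accuracy(labels, always_negative))
```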
Affiliation(s)
- Han Yuan
  - Duke-NUS Medical School, National University of Singapore, Singapore
- Feng Xie
  - Duke-NUS Medical School, National University of Singapore, Singapore
- Marcus Eng Hock Ong
  - Duke-NUS Medical School, National University of Singapore, Singapore
  - Department of Emergency Medicine, Singapore General Hospital, Singapore
  - Health Services Research Centre, Singapore Health Services, Singapore
- Yilin Ning
  - Duke-NUS Medical School, National University of Singapore, Singapore
- Marcel Lucas Chee
  - Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Australia
- Hairil Rizal Abdullah
  - Duke-NUS Medical School, National University of Singapore, Singapore
  - Department of Anaesthesiology, Singapore General Hospital, Singapore
- Benjamin Alan Goldstein
  - Duke-NUS Medical School, National University of Singapore, Singapore
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Bibhas Chakraborty
  - Duke-NUS Medical School, National University of Singapore, Singapore
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
  - Department of Statistics and Data Science, National University of Singapore, Singapore
- Nan Liu
  - Duke-NUS Medical School, National University of Singapore, Singapore
  - Health Services Research Centre, Singapore Health Services, Singapore
  - Institute of Data Science, National University of Singapore, Singapore
|
28
|
Angarita-Zapata JS, Maestre-Gongora G, Calderín JF. A Bibliometric Analysis and Benchmark of Machine Learning and AutoML in Crash Severity Prediction: The Case Study of Three Colombian Cities. Sensors (Basel) 2021; 21:s21248401. [PMID: 34960494 PMCID: PMC8708527 DOI: 10.3390/s21248401] [Received: 11/07/2021] [Revised: 12/10/2021] [Accepted: 12/14/2021] [Indexed: 11/16/2022]
Abstract
Traffic accidents are of worldwide concern, as they are one of the leading causes of death globally. One policy designed to cope with them is the design and deployment of road safety systems. These aim to predict crashes based on historical records, provided by new Internet of Things (IoT) technologies, to enhance traffic flow management and promote safer roads. Increasing data availability has helped machine learning (ML) to address the prediction of crashes and their severity. The literature reports numerous contributions regarding survey papers, experimental comparisons of various techniques, and the design of new methods at the point where crash severity prediction (CSP) and ML converge. Despite such progress, and as far as we know, there are no comprehensive research articles that theoretically and practically approach the model selection problem (MSP) in CSP. Thus, this paper introduces a bibliometric analysis and experimental benchmark of ML and automated machine learning (AutoML) as a suitable approach to automatically address the MSP in CSP. Firstly, 2318 bibliographic references were consulted to identify relevant authors, trending topics, keywords evolution, and the most common ML methods used in related case studies, which revealed an opportunity for the use of AutoML in the transportation field. Then, we compared AutoML (AutoGluon, Auto-sklearn, TPOT) and ML (CatBoost, Decision Tree, Extra Trees, Gradient Boosting, Gaussian Naive Bayes, Light Gradient Boosting Machine, Random Forest) methods in three case studies using open data portals belonging to the cities of Medellín, Bogotá, and Bucaramanga in Colombia. Our experimentation reveals that AutoGluon and CatBoost are competitive and robust ML approaches to deal with various CSP problems. In addition, we concluded that general-purpose AutoML effectively supports the MSP in CSP without developing domain-focused AutoML methods for this supervised learning problem. 
Finally, based on the results obtained, we introduce challenges and research opportunities that the community should explore to enhance the contributions that ML and AutoML can bring to CSP and other transportation areas.
Affiliation(s)
- Juan S. Angarita-Zapata
- DeustoTech, Faculty of Engineering, University of Deusto, 48007 Bilbao, Spain
- Gina Maestre-Gongora
- Faculty of Engineering, Universidad Cooperativa de Colombia, Medellín 050012, Colombia
|
29
|
Manduchi E, Moore JH. Leveraging Automated Machine Learning for the Analysis of Global Public Health Data: A Case Study in Malaria. Int J Public Health 2021; 66:614296. [PMID: 34744577 PMCID: PMC8565284 DOI: 10.3389/ijph.2021.614296] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2020] [Accepted: 03/17/2021] [Indexed: 11/13/2022]
Affiliation(s)
- Elisabetta Manduchi
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, United States; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, United States
- Jason H Moore
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, United States; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, United States
|
30
|
Wang K, Xue Q, Lu JJ. Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework. Int J Environ Res Public Health 2021; 18:7534. [PMID: 34299986 DOI: 10.3390/ijerph18147534] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/12/2021] [Revised: 06/26/2021] [Accepted: 07/03/2021] [Indexed: 11/26/2022]
Abstract
Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Due to the class-imbalance nature of driving data, high-risk samples as the minority class are usually ill-treated by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling, cost-sensitive loss function, and probability calibration to handle the class-imbalance problem in the recognition of risky drivers. The hyperparameters that control sampling ratio and class weight, along with other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle’s longitudinal speed, lateral speed, and the gap between the target vehicle and its preceding vehicle. Among 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with manual searching but much more computationally efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve the recognition ability and reduce the error of predicted probability.
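The cost-sensitive cross-entropy loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes a binary setting with a single positive-class weight; the function name `weighted_cross_entropy` and the weight value are hypothetical and are not taken from the paper, which tunes the class weight by Bayesian optimization.

```python
import math

def weighted_cross_entropy(labels, probs, pos_weight=1.0):
    """Mean binary cross-entropy; errors on positive samples are scaled by pos_weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        if y == 1:
            # Minority (positive) class: penalty multiplied by the class weight.
            total -= pos_weight * math.log(p)
        else:
            # Majority (negative) class: standard penalty.
            total -= math.log(1.0 - p)
    return total / len(labels)

# Hypothetical toy data: one positive and one negative sample, both predicted at 0.5.
plain = weighted_cross_entropy([1, 0], [0.5, 0.5])
costly = weighted_cross_entropy([1, 0], [0.5, 0.5], pos_weight=3.0)
```

Raising `pos_weight` above 1 penalizes missed minority-class samples more heavily, which is the general idea behind cost-sensitive learning on imbalanced data.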
|
31
|
Abstract
Neural architecture search (NAS), which aims at automatically seeking proper neural architectures given a specific task, has attracted extensive attention recently in supervised learning applications. In most real-world situations, the class labels provided in the training data would be noisy due to many reasons, such as subjective judgments, inadequate information, and random human errors. Existing work has demonstrated the adverse effects of label noise on the learning of weights of neural networks. These effects could become more critical in NAS since the architectures are not only trained with noisy labels but are also compared based on their performances on noisy validation sets. In this paper, we systematically explore the robustness of NAS under label noise. We show that label noise in the training and/or validation data can lead to various degrees of performance variations. Through empirical experiments, we show that using robust loss functions can mitigate the performance degradation under symmetric label noise as well as under a simple model of class conditional label noise. We also provide a theoretical justification for this. Both empirical and theoretical results provide a strong argument in favor of employing the robust loss function in NAS under high-level noise.
Affiliation(s)
- Yi-Wei Chen
- DATALab, Department of Computer Science and Engineering, Texas A&M University, College Station, TX, United States
- Qingquan Song
- DATALab, Department of Computer Science and Engineering, Texas A&M University, College Station, TX, United States
- Xi Liu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, United States
- P S Sastry
- Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
- Xia Hu
- DATALab, Department of Computer Science and Engineering, Texas A&M University, College Station, TX, United States
|
32
|
Ikemura K, Bellin E, Yagi Y, Billett H, Saada M, Simone K, Stahl L, Szymanski J, Goldstein DY, Reyes Gil M. Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study. J Med Internet Res 2021; 23:e23458. [PMID: 33539308 PMCID: PMC7919846 DOI: 10.2196/23458] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 08/17/2020] [Revised: 12/23/2020] [Accepted: 02/03/2021] [Indexed: 12/16/2022]
Abstract
BACKGROUND During a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one machine learning algorithm and limited performance evaluation to area under the curve analysis. To obtain the best results possible, it may be important to test different machine learning algorithms to find the best prediction model. OBJECTIVE In this study, we aimed to use automated machine learning (autoML) to train various machine learning algorithms. We selected the model that best predicted patients' chances of surviving a SARS-CoV-2 infection. In addition, we identified which variables (ie, vital signs, biomarkers, comorbidities, etc) were the most influential in generating an accurate model. METHODS Data were retrospectively collected from all patients who tested positive for COVID-19 at our institution between March 1 and July 3, 2020. We collected 48 variables from each patient within 36 hours before or after the index time (ie, real-time polymerase chain reaction positivity). Patients were followed for 30 days or until death. Patients' data were used to build 20 machine learning models with various algorithms via autoML. The performance of machine learning models was measured by analyzing the area under the precision-recall curve (AUPRC). Subsequently, we established model interpretability via Shapley additive explanation and partial dependence plots to identify and rank variables that drove model predictions. Afterward, we conducted dimensionality reduction to extract the 10 most influential variables. AutoML models were retrained by only using these 10 variables, and the output models were evaluated against the model that used 48 variables. RESULTS Data from 4313 patients were used to develop the models. 
The best model that was generated by using autoML and 48 variables was the stacked ensemble model (AUPRC=0.807). The two best independent models were the gradient boost machine and extreme gradient boost models, which had an AUPRC of 0.803 and 0.793, respectively. The deep learning model (AUPRC=0.73) was substantially inferior to the other models. The 10 most influential variables for generating high-performing models were systolic and diastolic blood pressure, age, pulse oximetry level, blood urea nitrogen level, lactate dehydrogenase level, D-dimer level, troponin level, respiratory rate, and Charlson comorbidity score. After the autoML models were retrained with these 10 variables, the stacked ensemble model still had the best performance (AUPRC=0.791). CONCLUSIONS We used autoML to develop high-performing models that predicted the survival of patients with COVID-19. In addition, we identified important variables that correlated with mortality. This is proof of concept that autoML is an efficient, effective, and informative method for generating machine learning-based clinical decision support tools.
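The abstract reports performance as area under the precision-recall curve (AUPRC). As a minimal sketch of what that metric measures, the function below uses the step-wise average-precision estimator; this choice is an assumption, since the abstract does not say which estimator the study used, and the labels and scores are hypothetical toy values.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from highest to lowest predicted score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at this rank, weighted by the recall step 1 / total_pos.
            ap += (hits / rank) / total_pos
    return ap

# Hypothetical example: 1 marks the positive class, higher score means more confident.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
ap = average_precision(labels, scores)
```

Unlike the ROC curve, this quantity depends on the positive-class prevalence, which is one reason it is often preferred for imbalanced outcomes such as mortality.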
Affiliation(s)
- Kenji Ikemura
- Department of Pathology, Albert Einstein College of Medicine, Montefiore Medical Center, The Bronx, NY, United States; Tsubomi Technology, The Bronx, NY, United States
- Eran Bellin
- Department of Epidemiology and Population Health and Medicine, Albert Einstein College of Medicine, Montefiore Medical Center, The Bronx, NY, United States
- Yukako Yagi
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, United States
- Henny Billett
- Department of Oncology and Medicine, Albert Einstein College of Medicine, Montefiore Medical Center, The Bronx, NY, United States
- Lindsay Stahl
- Department of Epidemiology and Population Health and Medicine, Albert Einstein College of Medicine, Montefiore Medical Center, The Bronx, NY, United States
- James Szymanski
- Department of Pathology, Albert Einstein College of Medicine, Montefiore Medical Center, The Bronx, NY, United States
- D Y Goldstein
- Department of Pathology, Albert Einstein College of Medicine, Montefiore Medical Center, The Bronx, NY, United States
- Morayma Reyes Gil
- Department of Pathology, Albert Einstein College of Medicine, Montefiore Medical Center, The Bronx, NY, United States
|
33
|
Zhang S, Sun H, Su X, Yang X, Wang W, Wan X, Tan Q, Chen N, Yue Q, Gong Q. Automated machine learning to predict the co-occurrence of isocitrate dehydrogenase mutations and O6-methylguanine-DNA methyltransferase promoter methylation in patients with gliomas. J Magn Reson Imaging 2021; 54:197-205. [PMID: 33393131 DOI: 10.1002/jmri.27498] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/02/2020] [Revised: 12/17/2020] [Accepted: 12/18/2020] [Indexed: 02/05/2023]
Abstract
Combining isocitrate dehydrogenase mutation (IDHmut) with O6-methylguanine-DNA methyltransferase promoter methylation (MGMTmet) has been identified as a critical prognostic molecular marker for gliomas. The aim of this study was to determine the ability of glioma radiomics features from magnetic resonance imaging (MRI) to predict the co-occurrence of IDHmut and MGMTmet by applying the tree-based pipeline optimization tool (TPOT), an automated machine learning (autoML) approach. This was a retrospective study, in which 162 patients with gliomas were evaluated, including 58 patients with co-occurrence of IDHmut and MGMTmet and 104 patients with other status comprising: IDH wildtype and MGMT unmethylated (n = 67), IDH wildtype and MGMTmet (n = 36), and IDHmut and MGMT unmethylated (n = 1). Three-dimensional (3D) T1-weighted images, gadolinium-enhanced 3D T1-weighted images (Gd-3DT1WI), T2-weighted images, and fluid-attenuated inversion recovery (FLAIR) images acquired at 3.0 T were used. Radiomics features were extracted from FLAIR and Gd-3DT1WI images. The TPOT was employed to generate the best machine learning pipeline, which contains both feature selector and classifier, based on input feature sets. A 4-fold cross-validation was used to evaluate the performance of automatically generated models. For each iteration, the training set included 121 subjects, while the test set included 41 subjects. Student's t-test or a chi-square test was applied on different clinical characteristics between two groups. Sensitivity, specificity, accuracy, kappa score, and AUC were used to evaluate the performance of TPOT-generated models. Finally, we compared the above metrics of TPOT-generated models to identify the best-performing model. Patients' ages and grades between two groups were significantly different (p = 0.002 and p = 0.000, respectively). 
The 4-fold cross-validation showed that gradient boosting classifier trained on shape and textual features from the Laplacian-of-Gaussian-filtered Gd-3DT1 achieved the best performance (average sensitivity = 81.1%, average specificity = 94%, average accuracy = 89.4%, average kappa score = 0.76, average AUC = 0.951). Using autoML based on radiomics features from MRI, a high discriminatory accuracy was achieved for predicting co-occurrence of IDHmut and MGMTmet in gliomas. LEVEL OF EVIDENCE: 3 TECHNICAL EFFICACY STAGE: 3.
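The fold sizes quoted in the abstract can be checked with a short sketch. It assumes the common convention that the first n mod k folds receive one extra sample; the abstract does not state how the 162 subjects were divided, so the helper `fold_sizes` is illustrative only.

```python
def fold_sizes(n_samples, n_folds):
    """Sizes of k near-equal folds; the first (n_samples % n_folds) folds get one extra."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]

# 162 subjects and 4 folds, as reported in the abstract.
sizes = fold_sizes(162, 4)
# (training size, test size) for each cross-validation iteration.
splits = [(162 - size, size) for size in sizes]
```

Under this assumption, the larger folds reproduce the reported split of 121 training and 41 test subjects, while the smaller folds would give 122 and 40.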
Affiliation(s)
- Simin Zhang
- Huaxi MR Research Center (HMRRC), Functional and Molecular Imaging Key Laboratory of Sichuan Province, Department of Radiology, West China Hospital of Sichuan University, Chengdu, China; Huaxi Glioma Center, West China Hospital of Sichuan University, Chengdu, China
- Huaiqiang Sun
- Huaxi MR Research Center (HMRRC), Functional and Molecular Imaging Key Laboratory of Sichuan Province, Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Xiaorui Su
- Huaxi MR Research Center (HMRRC), Functional and Molecular Imaging Key Laboratory of Sichuan Province, Department of Radiology, West China Hospital of Sichuan University, Chengdu, China; Huaxi Glioma Center, West China Hospital of Sichuan University, Chengdu, China
- Xibiao Yang
- Huaxi Glioma Center, West China Hospital of Sichuan University, Chengdu, China; Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Weina Wang
- Huaxi MR Research Center (HMRRC), Functional and Molecular Imaging Key Laboratory of Sichuan Province, Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Xinyue Wan
- Huaxi MR Research Center (HMRRC), Functional and Molecular Imaging Key Laboratory of Sichuan Province, Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Qiaoyue Tan
- Huaxi MR Research Center (HMRRC), Functional and Molecular Imaging Key Laboratory of Sichuan Province, Department of Radiology, West China Hospital of Sichuan University, Chengdu, China; Division of Radiation Physics, State Key Laboratory of Biotherapy and Cancer Center, West China Hospital of Sichuan University, Chengdu, China
- Ni Chen
- Department of Pathology, West China Hospital of Sichuan University, Chengdu, China
- Qiang Yue
- Huaxi Glioma Center, West China Hospital of Sichuan University, Chengdu, China; Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
- Qiyong Gong
- Huaxi MR Research Center (HMRRC), Functional and Molecular Imaging Key Laboratory of Sichuan Province, Department of Radiology, West China Hospital of Sichuan University, Chengdu, China
|
34
|
Dafflon J, Pinaya WHL, Turkheimer F, Cole JH, Leech R, Harris MA, Cox SR, Whalley HC, McIntosh AM, Hellyer PJ. An automated machine learning approach to predict brain age from cortical anatomical measures. Hum Brain Mapp 2020; 41:3555-3566. [PMID: 32415917 PMCID: PMC7416036 DOI: 10.1002/hbm.25028] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/06/2020] [Revised: 04/10/2020] [Accepted: 04/21/2020] [Indexed: 12/31/2022]
Abstract
The use of machine learning (ML) algorithms has significantly increased in neuroscience. However, from the vast extent of possible ML algorithms, which one is the optimal model to predict the target variable? What are the hyperparameters for such a model? Given the plethora of possible answers to these questions, in recent years, automated ML (autoML) has been gaining attention. Here, we apply an autoML library called Tree-based Pipeline Optimisation Tool (TPOT) which uses a tree-based representation of ML pipelines and conducts a genetic programming-based approach to find the model and its hyperparameters that most closely predict the subject's true age. To explore autoML and evaluate its efficacy within neuroimaging data sets, we chose a problem that has been the focus of previous extensive study: brain age prediction. Without any prior knowledge, TPOT was able to scan through the model space and create pipelines that outperformed the state-of-the-art accuracy for Freesurfer-based models using only thickness and volume information for anatomical structure. In particular, we compared the performance of TPOT (mean absolute error [MAE]: 4.612 ± .124 years) and a relevance vector regression (MAE 5.474 ± .140 years). TPOT also suggested interesting combinations of models that do not match the current most used models for brain prediction but generalise well to unseen data. AutoML showed promising results as a data-driven approach to find optimal models for neuroimaging applications.
Affiliation(s)
- Jessica Dafflon
- Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Walter H. L. Pinaya
- Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Center of Mathematics, Computation and Cognition, Universidade Federal do ABC, Santo André, Brazil
- Federico Turkheimer
- Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- James H. Cole
- Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Robert Leech
- Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Simon R. Cox
- Lothian Birth Cohorts group, Department of Psychology, University of Edinburgh, Edinburgh, UK
- Scottish Imaging Network, A Platform for Scientific Excellence (SINAPSE) Collaboration, Edinburgh, UK
- Peter J. Hellyer
- Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
|
35
|
Olsavszky V, Dosius M, Vladescu C, Benecke J. Time Series Analysis and Forecasting with Automated Machine Learning on a National ICD-10 Database. Int J Environ Res Public Health 2020; 17:E4979. [PMID: 32664331 DOI: 10.3390/ijerph17144979] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/31/2020] [Revised: 06/29/2020] [Accepted: 07/07/2020] [Indexed: 12/22/2022]
Abstract
The application of machine learning (ML) for generating insights and making predictions on new records continues to expand within the medical community. Despite this progress, the application of time series analysis has remained underexplored due to the complexity of the underlying techniques. In this study, we have deployed a novel ML approach, called automated time series (AutoTS) machine learning, to automate data processing and the application of a multitude of models to assess which best forecasts future values. This rapid experimentation enables the selection of the most accurate model for time series predictions. By using the nation-wide ICD-10 (International Classification of Diseases, Tenth Revision) dataset of hospitalized patients of Romania, we have generated time series datasets over the period of 2008–2018 and performed highly accurate AutoTS predictions for the ten deadliest diseases. Forecast results for the years 2019 and 2020 were generated on a NUTS 2 (Nomenclature of Territorial Units for Statistics) regional level. To our knowledge, this is the first study to perform time series forecasting of multiple diseases at a regional level using automated time series machine learning on a national ICD-10 dataset. The deployment of AutoTS technology can help decision makers in implementing targeted national health policies more efficiently.
|
36
|
Sakagianni A, Feretzakis G, Kalles D, Koufopoulou C, Kaldis V. Setting up an Easy-to-Use Machine Learning Pipeline for Medical Decision Support: A Case Study for COVID-19 Diagnosis Based on Deep Learning with CT Scans. Stud Health Technol Inform 2020; 272:13-16. [PMID: 32604588 DOI: 10.3233/shti200481] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 11/15/2022]
Abstract
Coronavirus disease (COVID-19) constitutes an ongoing global health problem with significant morbidity and mortality. It usually presents characteristic findings on a chest CT scan, which may lead to early detection of the disease. A timely and accurate diagnosis of COVID-19 is the cornerstone for the prompt management of the patients. The aim of the present study was to evaluate the performance of an automated machine learning algorithm in the diagnosis of COVID-19 pneumonia using chest CT scans. Diagnostic performance was assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, and positive predictive value. The method's average precision was 0.932. We suggest that auto-ML platforms help users with limited ML expertise train image recognition models by only uploading the examined dataset and performing some basic settings. Such methods could deliver significant potential benefits for patients in the future by allowing for earlier disease detection and care.
Affiliation(s)
- Georgios Feretzakis
- School of Science and Technology, Hellenic Open University, Patras, Greece; Sismanogleio General Hospital, Department of Quality Control, Research and Continuing Education, Marousi, Greece
- Dimitris Kalles
- School of Science and Technology, Hellenic Open University, Patras, Greece
- Vasileios Kaldis
- Sismanogleio General Hospital, Emergency Department, Marousi, Greece
|
37
|
Bhat GS, Shankar N, Panahi IMS. Automated machine learning based speech classification for hearing aid applications and its real-time implementation on smartphone. Annu Int Conf IEEE Eng Med Biol Soc 2020; 2020:956-959. [PMID: 33018143 PMCID: PMC7545263 DOI: 10.1109/embc44109.2020.9175693] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
Abstract
Deep neural networks (DNNs) have been useful in solving benchmark problems in various domains including audio. DNNs have been used to improve several speech processing algorithms that improve speech perception for hearing impaired listeners. To make use of DNNs to their full potential and to configure models easily, automated machine learning (AutoML) systems are developed, focusing on model optimization. As an application of AutoML to audio and hearing aids, this work presents an AutoML based voice activity detector (VAD) that is implemented on a smartphone as a real-time application. The developed VAD can be used to elevate the performance of speech processing applications like speech enhancement that are widely used in hearing aid devices. The classification model generated by AutoML is computationally fast and has minimal processing delay, which enables an efficient, real-time operation on a smartphone. The steps involved in real-time implementation are discussed in detail. The key contributions of this work include the use of an AutoML platform for hearing aid applications and the realization of the AutoML model on a smartphone. The experimental analysis and results demonstrate the importance of using AutoML for the current approach. The evaluations also show improvements over state-of-the-art techniques and reflect the practical usability of the developed smartphone app in different noisy environments.
|
38
|
Montesanto A, D'Aquila P, Lagani V, Paparazzo E, Geracitano S, Formentini L, Giacconi R, Cardelli M, Provinciali M, Bellizzi D, Passarino G. A New Robust Epigenetic Model for Forensic Age Prediction. J Forensic Sci 2020; 65:1424-1431. [PMID: 32453457 DOI: 10.1111/1556-4029.14460] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/04/2020] [Revised: 04/22/2020] [Accepted: 05/04/2020] [Indexed: 12/12/2022]
Abstract
Forensic DNA phenotyping refers to an emerging field of forensic sciences aimed at the prediction of externally visible characteristics of unknown sample donors directly from biological materials. The aging process significantly affects most of the above characteristics, making the development of a reliable method of age prediction very important. Today, the so-called "epigenetic clocks" represent the most accurate models for age prediction. Since they are technically not achievable in a typical forensic laboratory, forensic DNA technology has triggered efforts toward the simplification of these models. The present study aimed to build an epigenetic clock using a set of methylation markers of five different genes in a sample of the Italian population of different ages covering the whole span of adult life. In a sample of 330 subjects, 42 selected markers were analyzed with a machine learning approach to build an age prediction model. A ridge linear regression model including eight of the proposed markers was identified as the best performing model across a plethora of candidates. This model was tested on an independent sample of 83 subjects, providing a median error of 4.5 years. In the present study, an epigenetic model for age prediction was validated in a sample of the Italian population. However, its applicability to advanced ages still represents the main limitation in forensic casework.
Affiliation(s)
- Alberto Montesanto
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
- Patrizia D'Aquila
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
- Vincenzo Lagani
- Gnosis Data Analysis PC, Heraklion, GR700-13, Greece; Institute of Chemical Biology, Ilia State University, Tbilisi, 0162, Georgia
- Ersilia Paparazzo
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
- Silvana Geracitano
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
- Laura Formentini
- Advanced Technology Center for Aging Research, Scientific Technological Area, IRCCS INRCA, Ancona, Italy
- Robertina Giacconi
- Advanced Technology Center for Aging Research, Scientific Technological Area, IRCCS INRCA, Ancona, Italy
- Maurizio Cardelli
- Advanced Technology Center for Aging Research, Scientific Technological Area, IRCCS INRCA, Ancona, Italy
- Mauro Provinciali
- Advanced Technology Center for Aging Research, Scientific Technological Area, IRCCS INRCA, Ancona, Italy
- Dina Bellizzi
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
- Giuseppe Passarino
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende, 87036, Italy
|
39
|
Lesot MJ, Vieira S, Reformat MZ, Carvalho JP, Wilbik A, Bouchon-Meunier B, Yager RR. General-Purpose Automated Machine Learning for Transportation: A Case Study of Auto-sklearn for Traffic Forecasting. Information Processing and Management of Uncertainty in Knowledge-Based Systems 2020. [PMCID: PMC7274664 DOI: 10.1007/978-3-030-50143-3_57] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/02/2023]
Abstract
Currently, there are no guidelines to determine what are the most suitable machine learning pipelines (i.e. the workflow from data preprocessing to model selection and validation) to approach Traffic Forecasting (TF) problems. Although automated machine learning (AutoML) has proved to be successful in dealing with the model selection problem in other application areas, only a few papers have explored the performance of general-purpose AutoML methods, purely based on optimisation, when tackling TF. In this paper, we provide a thorough exploration of the benefits of Auto-sklearn for TF, as a general-purpose AutoML method that follows a hybrid search strategy combining optimisation with meta-learning and ensemble learning. Particularly, we focus on how well Auto-sklearn is able to recommend competitive machine learning pipelines to forecast traffic, modelled as a TF multi-class imbalanced classification problem, along different time horizons at two spatial scales (point and road segment) and two environments (freeway and urban). Concretely, we test the following scenarios: I) a hybrid search strategy with the three components (optimisation, meta-learning, ensemble learning), II) a strategy based on meta-learning and ensemble learning, and III) a strategy based on the estimation of the best performing pipeline from those suggested by the meta-learning. Experimental results show that the meta-learning component of Auto-sklearn does not work properly on TF problems, and on the other hand, that the optimisation does not contribute too much to the final performance of predictions.
Affiliation(s)
- Susana Vieira
- IDMEC, IST, Universidade de Lisboa, Lisbon, Portugal
- Anna Wilbik
- Eindhoven University of Technology, Eindhoven, The Netherlands
|
40
|
Puri M. Automated Machine Learning Diagnostic Support System as a Computational Biomarker for Detecting Drug-Induced Liver Injury Patterns in Whole Slide Liver Pathology Images. Assay Drug Dev Technol 2020; 18:1-10. [PMID: 31149832 PMCID: PMC6998050 DOI: 10.1089/adt.2019.919] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/01/2023]
Abstract
Drug-induced liver injury (DILI) is a challenging disease to diagnose, a leading cause of acute liver failure, and a common reason for drug withdrawal from the market. There is no specific symptom, no biomarker or test for detection, and no therapy other than discontinuation of the drug. Pharmaceutical companies spend large amounts of money, time, and research effort testing for DILI effects and drug efficacy. A preclinical diagnostic support system is designed and proposed for DILI detection and classification on liver biopsy histopathology images. Heterogeneity features and automated machine learning (AutoML) models were tested to classify DILI injury patterns on whole slide images. Fractal and lacunarity values were used to detect hepatocellular necrotic injury patterns caused in rat liver (in vivo) by 10 drugs at four dose levels. Correlations between fractal and lacunarity values were statistically analyzed for the 10 drugs; the Pearson correlation (r = 0.9809), p-value (1.6612E-06), and R² (0.9582) were found to be high in the case of carbon tetrachloride. The AutoML model was tested to understand the injury patterns on a subset of 1,277 histology images. The AutoML algorithm was able to classify necrotic injury patterns accurately, with an average precision of 98.6% at a score threshold of 0.5.
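As an illustration of the correlation analysis mentioned in the abstract, the following is a minimal sketch of how a Pearson correlation coefficient between paired fractal and lacunarity measurements could be computed. The function name `pearson_r` and the sample values are hypothetical and are not taken from the study; the abstract does not state which software the author used.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the centered values.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements, NOT data from the study.
fractal = [1.62, 1.68, 1.74, 1.81]
lacunarity = [0.41, 0.45, 0.52, 0.58]
r = pearson_r(fractal, lacunarity)
print(f"Pearson r = {r:.4f}")
```

A value of r close to 1 indicates a strong positive linear relationship between the two texture measures; a significance test (p-value) would require an additional step that is not shown here.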
Affiliation(s)
- Munish Puri
- Laboratory of Cancer Biology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
|
41
|
Liu T, Nicholas J, Theilig MM, Guntuku SC, Kording K, Mohr DC, Ungar L. Machine Learning for Phone-Based Relationship Estimation: The Need to Consider Population Heterogeneity. Proc ACM Interact Mob Wearable Ubiquitous Technol 2019; 3:145. [PMID: 32490330 PMCID: PMC7265570 DOI: 10.1145/3369820] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/11/2023]
Abstract
Estimating the category and quality of interpersonal relationships from ubiquitous phone sensor data matters for studying mental well-being and social support. Prior work focused on using communication volume to estimate broad relationship categories, often with small samples. Here we contextualize communications by combining phone logs with demographic and location data to predict interpersonal relationship roles in a varied sample population using automated machine learning methods, producing better performance (F1 = 0.68) than using communication features alone (F1 = 0.62). We also explore the effect of age variation in the underlying training sample on interpersonal relationship prediction and find that models trained on younger subgroups, a common practice in the field because of student participation and recruitment, generalize poorly to the wider population. Our results not only illustrate the value of using data across demographics, communication patterns, and semantic locations for relationship prediction, but also underscore the importance of considering population heterogeneity in phone-based personal sensing studies.
|
42
|
Adamou M, Antoniou G, Greasidou E, Lagani V, Charonyktakis P, Tsamardinos I, Doyle M. Toward Automatic Risk Assessment to Support Suicide Prevention. Crisis 2018; 40:249-256. [PMID: 30474411 DOI: 10.1027/0227-5910/a000561] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 01/06/2023]
Abstract
Background: Suicide has been considered an important public health issue for years and is one of the main causes of death worldwide. Despite the prevention strategies applied, the rate of suicide has not changed substantially over the past decades. Suicide risk has proven extremely difficult for medical specialists to assess, and the traditional methodologies deployed have been ineffective. Advances in machine learning make it possible to attempt to predict suicide through the analysis of relevant data, with the aim of informing clinical practice. Aims: We aimed to (a) test our artificial intelligence-based, referral-centric methodology in the context of the National Health Service (NHS), (b) determine whether statistically relevant results can be derived from data related to previous suicides, and (c) develop ideas for various exploitation strategies. Method: The analysis used data of patients who died by suicide in the period 2013-2016, including both structured data and free-text medical notes, necessitating the deployment of state-of-the-art machine learning and text mining methods. Limitations: Sample size is a limiting factor for this study, along with the absence of non-suicide cases. Specific analytical solutions were adopted to address both issues. Results and Conclusion: The results of this pilot study indicate that machine learning shows promise for predicting, within a specified period, which people are most at risk of taking their own life at the time of referral to a mental health service.
Affiliation(s)
- Marios Adamou
- 1 South West Yorkshire Partnership NHS Foundation Trust, Wakefield, UK; 2 Department of Computer Science, University of Huddersfield, UK
- Vincenzo Lagani
- 3 Gnosis Data Analysis PC, Heraklion, Greece; 5 Institute of Chemical Biology, Ilia State University, Tbilisi, Georgia
- Ioannis Tsamardinos
- 2 Department of Computer Science, University of Huddersfield, UK; 3 Gnosis Data Analysis PC, Heraklion, Greece; 4 Computer Science Department, University of Crete, Heraklion, Greece
- Michael Doyle
- 1 South West Yorkshire Partnership NHS Foundation Trust, Wakefield, UK
|
43
|
Orlenko A, Moore JH, Orzechowski P, Olson RS, Cairns J, Caraballo PJ, Weinshilboum RM, Wang L, Breitenstein MK. Considerations for automated machine learning in clinical metabolic profiling: Altered homocysteine plasma concentration associated with metformin exposure. Pac Symp Biocomput 2018; 23:460-471. [PMID: 29218905 PMCID: PMC5882490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
With the maturation of metabolomics science and the proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML in identifying and adjusting for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML using default parameters showed a potential lack of sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT; an association not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with the largest effect, and corresponding priority for further translational clinical research. Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency; evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
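The residual adjustment idea described in the abstract can be sketched as follows. This is only one plausible reading, assuming an ordinary least-squares fit of a metabolite feature on a single confounder (here BMI) and replacing the feature with its residuals before it is passed to an AutoML tool. The helper names `fit_line` and `residualize` and all numeric values are hypothetical; the abstract does not specify the authors' exact procedure.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residualize(feature, confounder):
    """Return the part of `feature` not linearly explained by `confounder`."""
    slope, intercept = fit_line(confounder, feature)
    return [f - (intercept + slope * c) for f, c in zip(feature, confounder)]


# Hypothetical values for illustration only, NOT data from the study.
bmi = [22.0, 25.0, 28.0, 31.0, 34.0]
homocysteine = [9.1, 10.4, 10.9, 12.2, 12.8]

# The adjusted feature is linearly uncorrelated with BMI by construction.
adjusted = residualize(homocysteine, bmi)
print([round(v, 3) for v in adjusted])
```

Training on the adjusted feature instead of the raw one means that any association the model finds cannot be driven by a linear effect of the confounder, which is the motivation the abstract gives for the adjustment step.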
Affiliation(s)
- Alena Orlenko
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
|