1
Sánchez-Lite A, Fuentes-Bargues JL, Iglesias I, González-Gaya C. Proposal of a workplace classification model for heart attack accidents from the field of occupational safety and health engineering. Heliyon 2024; 10:e37647. [PMID: 39347428 PMCID: PMC11437862 DOI: 10.1016/j.heliyon.2024.e37647] [Received: 05/31/2024] [Revised: 08/09/2024] [Accepted: 09/06/2024] [Indexed: 10/01/2024] Open
Abstract
Research on occupational accidents is a key factor in improving working conditions and sustainability. Fatal accidents incur significant human and economic costs. Therefore, it is essential to examine fatal accidents to identify the factors that contribute to their occurrence. This study presents an overview of fatal heart attack accidents at work in Spain over the period 2009-2021. Descriptive analysis was conducted considering 13 variables classified into five groups. These variables were selected as predictors to determine the occurrence of this type of accident using a machine learning technique. Thirteen Naïve Bayes prediction models were developed using an unbalanced dataset of 15,616 valid samples from the Spanish Delta@ database, employing a two-stage algorithm. The final model was retained using a General Performance Score index. The model selected for this study used a 70:30 distribution for the training and test datasets. A sample was classified as a fatal heart attack if its posterior probability exceeded 0.25. This model is assumed to be a compromise between the confusion matrix values of each model. The sectors with the highest number of heart attacks are 'Health and social work', 'Transport and storage', 'Manufacturing', and 'Construction'. The incidence of heart attacks and fatal heart attack accidents is higher in men than in women and higher in private sector employees. The findings and model development may assist in the formulation of surveillance strategies and preventive measures to reduce the incidence of heart attacks in the workplace.
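The decision rule mentioned in the abstract (a sample is labelled a fatal heart attack when its posterior probability exceeds 0.25) can be illustrated with a minimal sketch. The posterior values and labels below are invented for illustration and do not come from the Delta@ data, and the function names `classify` and `confusion_counts` are hypothetical, not the authors' code.

```python
def classify(posteriors, threshold=0.25):
    """Return True for each sample whose posterior strictly exceeds the threshold."""
    return [p > threshold for p in posteriors]


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Hypothetical posterior probabilities of the "fatal" class and true labels.
posteriors = [0.10, 0.30, 0.26, 0.60, 0.05, 0.20]
y_true = [False, True, False, True, False, True]

# Compare the 0.25 cut-off from the abstract with a conventional 0.50 cut-off.
low_cutoff = confusion_counts(y_true, classify(posteriors, 0.25))
default_cutoff = confusion_counts(y_true, classify(posteriors, 0.50))
```

On this toy data the lower cut-off recovers one more true positive at the cost of one false positive, which is the kind of confusion-matrix trade-off the abstract describes for an unbalanced dataset.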
Affiliation(s)
- Alberto Sánchez-Lite
  - Department of Materials Science and Metallurgical Engineering, Graphic Expression in Engineering, Cartographic Engineering, Geodesy and Photogrammetry, Mechanical Engineering and Manufacturing Engineering, School of Industrial Engineering, Universidad de Valladolid, P° del Cauce 59, 47011, Valladolid, Spain
- Jose Luis Fuentes-Bargues
  - Project Management, Innovation and Sustainability Research Center (PRINS), Universitat Politècnica de València, 46022, Valencia, Spain
- Iván Iglesias
  - EEI, School of Industrial Engineering, Universidad de Vigo, Rúa Maxwell, nº9. 36310, Vigo-Pontevedra, Spain
- Cristina González-Gaya
  - Department of Construction and Manufacturing Engineering, Escuela Superior de Ingeniería Industrial Universidad Nacional de Educación a Distancia (UNED), C/ Juan del Rosal 12, 28040, Madrid, Spain
2
Nguyen DHD, Tan AJH, Lee R, Lim WF, Wong JY, Suhaimi F. Monitoring of plant diseases caused by Fusarium commune and Rhizoctonia solani in bok choy using hyperspectral remote sensing and machine learning. PEST MANAGEMENT SCIENCE 2024. [PMID: 39291711 DOI: 10.1002/ps.8414] [Received: 07/02/2024] [Revised: 08/26/2024] [Accepted: 08/29/2024] [Indexed: 09/19/2024]
Abstract
BACKGROUND: Local vegetable production is susceptible to various fungal pathogens, the most common and lethal of which are Fusarium commune and Rhizoctonia solani. Early detection of these pathogens is challenging, and by the time visual symptoms appear, the pathogens may have already spread extensively, causing massive damage to production. In this study, we explored the use of hyperspectral data for early detection of diseases caused by F. commune or R. solani in bok choy (Brassica rapa subsp. chinensis) by collecting hyperspectral data from healthy plants and plants inoculated with either fungal pathogen in a controlled experimental set-up. RESULTS: Based on the collected data, we employed various tree-based, distribution-based, geometric, neural network and ensemble learning algorithms to train detection models. Among the trained models, Multi-Layer Perceptron (MLP) models performed the best, with overall accuracy reaching 95.9 ± 0.26%. MLP models could differentiate between healthy and infected plants with 99% precision after 1 day of infection, and distinguish between different fungal pathogens with 99% precision after 2 days. During this period, no visible symptoms of fungal infection could be observed. Further analysis of the trained MLP models and general reflectance profiles of plants also revealed a high correlation of the spectral regions 445-460, 560-595, 606-620 and 719-728 nm with fungal infection in bok choy plants. CONCLUSION: Our findings highlight the potential of hyperspectral imaging as a highly precise early detection tool for fungal diseases in plants. © 2024 Society of Chemical Industry.
Affiliation(s)
- Derrick Hoang Danh Nguyen
  - Plant Science and Health Branch, Horticulture and Community Division, National Parks Board, Singapore
- Arinah Jing Hui Tan
  - Plant Science and Health Branch, Horticulture and Community Division, National Parks Board, Singapore
- Ronjin Lee
  - Plant Science and Health Branch, Horticulture and Community Division, National Parks Board, Singapore
- Wei Feng Lim
  - Plant Science and Health Branch, Horticulture and Community Division, National Parks Board, Singapore
- Jia Yih Wong
  - Plant Science and Health Branch, Horticulture and Community Division, National Parks Board, Singapore
3
Sivakumar M, Parthasarathy S, Padmapriya T. Trade-off between training and testing ratio in machine learning for medical image processing. PeerJ Comput Sci 2024; 10:e2245. [PMID: 39314694 PMCID: PMC11419616 DOI: 10.7717/peerj-cs.2245] [Received: 05/22/2024] [Accepted: 07/17/2024] [Indexed: 09/25/2024]
Abstract
Artificial intelligence (AI) and machine learning (ML) aim to mimic human intelligence and enhance decision-making processes across various fields. A key performance determinant in an ML model is the ratio between the training and testing datasets. This research investigates the impact of varying train-test split ratios on machine learning model performance and generalization capabilities using the BraTS 2013 dataset. Logistic regression, random forest, k-nearest neighbors, and support vector machines were trained with split ratios ranging from 60:40 to 95:05. Findings reveal significant variations in accuracy across these ratios, emphasizing the critical need to strike a balance to avoid overfitting or underfitting. The study underscores the importance of selecting an optimal train-test split ratio that considers tradeoffs such as model performance metrics, statistical measures, and resource constraints. Ultimately, these insights contribute to a deeper understanding of how ratio selection impacts the effectiveness and reliability of machine learning applications across diverse fields.
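To make the notion of a train-test split ratio concrete, the following minimal sketch splits a shuffled dataset at several ratios between 60:40 and 95:05. It uses only the Python standard library; the helper name `split_by_ratio`, the seed, and the 200 placeholder samples are assumptions for illustration and are unrelated to the BraTS 2013 data or the models used in the study.

```python
import random


def split_by_ratio(samples, train_percent, seed=0):
    """Shuffle a copy of the samples and split it train_percent : remainder."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = len(shuffled) * train_percent // 100
    return shuffled[:n_train], shuffled[n_train:]


samples = list(range(200))  # stand-in for 200 labelled examples

# Record the (train, test) sizes produced by each candidate ratio.
sizes = {}
for train_percent in (60, 70, 80, 90, 95):
    train, test = split_by_ratio(samples, train_percent)
    sizes[train_percent] = (len(train), len(test))
```

At 95:05 only 10 of the 200 placeholder samples remain for testing, which hints at why accuracy estimates can become unstable at extreme ratios.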
Affiliation(s)
- Muthuramalingam Sivakumar
  - Department of Computer Science and Engineering, Thiagarajar College of Engineering, Madurai, TamilNadu, India
- Sudhaman Parthasarathy
  - Department of Applied Mathematics and Computational Science, Thiagarajar College of Engineering, Madurai, TamilNadu, India
- Thiyagarajan Padmapriya
  - Department of Applied Mathematics and Computational Science, Thiagarajar College of Engineering, Madurai, TamilNadu, India
4
Alturki S, Almoaiqel S. Towards an automated classification phase in the software maintenance process using decision tree. PeerJ Comput Sci 2024; 10:e2228. [PMID: 39314738 PMCID: PMC11419633 DOI: 10.7717/peerj-cs.2228] [Received: 02/12/2024] [Accepted: 07/10/2024] [Indexed: 09/25/2024]
Abstract
The software maintenance process is costly, accounting for up to 70% of the total cost in the software development life cycle (SDLC). The difficulty of maintaining software increases with its size and complexity, requiring significant time and effort. One way to alleviate these costs is to automate parts of the maintenance process. This research focuses on the automation of the classification phase using decision trees (DT) to sort, rank, and accept/reject maintenance requests (MRs) for mobile applications. Our dataset consisted of 1,656 MRs. We found that DTs could automate sorting and accepting/rejecting MRs with accuracies of 71.08% and 64.15%, respectively, though ranking accuracy was lower at 50%. While DTs can reduce costs, effort, and time, human verification is still necessary.
Affiliation(s)
- Sahar Alturki
  - Department of Software Engineering, King Saud University, Riyadh, Saudi Arabia
- Sarah Almoaiqel
  - Department of Software Engineering, King Saud University, Riyadh, Saudi Arabia
5
Bhavna K, Akhter A, Banerjee R, Roy D. Explainable deep-learning framework: decoding brain states and prediction of individual performance in false-belief task at early childhood stage. Front Neuroinform 2024; 18:1392661. [PMID: 39006894 PMCID: PMC11239353 DOI: 10.3389/fninf.2024.1392661] [Received: 02/27/2024] [Accepted: 06/14/2024] [Indexed: 07/16/2024] Open
Abstract
Decoding of cognitive states aims to identify individuals' brain states and brain fingerprints to predict behavior. Deep learning provides an important platform for analyzing brain signals at different developmental stages to understand brain dynamics. Due to their internal architecture and feature extraction techniques, existing machine-learning and deep-learning approaches suffer from low classification performance and explainability issues that must be improved. In the current study, we hypothesized that even at the early childhood stage (as early as 3 years), connectivity between brain regions could decode brain states and predict behavioral performance in false-belief tasks. To this end, we proposed an explainable deep learning framework to decode brain states (Theory of Mind and Pain states) and predict individual performance on ToM-related false-belief tasks in a developmental dataset. We proposed an explainable spatiotemporal connectivity-based Graph Convolutional Neural Network (Ex-stGCNN) model for decoding brain states. Here, we consider a developmental dataset, N = 155 (122 children; 3-12 yrs and 33 adults; 18-39 yrs), in which participants watched a short, soundless animated movie, shown to activate Theory-of-Mind (ToM) and pain networks. After scanning, the participants underwent a ToM-related false-belief task, leading to categorization into the pass, fail, and inconsistent groups based on performance. We trained our proposed model using Functional Connectivity (FC) and Inter-Subject Functional Correlations (ISFC) matrices separately. We observed that the stimulus-driven feature set (ISFC) could capture ToM and Pain brain states more accurately with an average accuracy of 94%, whereas it achieved 85% accuracy using FC matrices. We also validated our results using five-fold cross-validation and achieved an average accuracy of 92%.
Besides this study, we applied the SHapley Additive exPlanations (SHAP) approach to identify brain fingerprints that contributed the most to predictions. We hypothesized that ToM network brain connectivity could predict individual performance on false-belief tasks. We proposed an Explainable Convolutional Variational Auto-Encoder (Ex-Convolutional VAE) model to predict individual performance on false-belief tasks and trained the model using FC and ISFC matrices separately. ISFC matrices again outperformed the FC matrices in prediction of individual performance. We achieved 93.5% accuracy with an F1-score of 0.94 using ISFC matrices and achieved 90% accuracy with an F1-score of 0.91 using FC matrices.
Affiliation(s)
- Km Bhavna
  - Department of Computer Science and Engineering, IIT Jodhpur, Karwar, Rajasthan, India
- Azman Akhter
  - Cognitive Brain Dynamics Lab, National Brain Research Centre, Manesar, Gurugram, India
- Romi Banerjee
  - Department of Computer Science and Engineering, IIT Jodhpur, Karwar, Rajasthan, India
- Dipanjan Roy
  - Cognitive Brain Dynamics Lab, National Brain Research Centre, Manesar, Gurugram, India
  - School of AIDE, Center for Brain Science and Applications, Indian Institute of Technology (IIT), Jodhpur, India
6
Şafak E, Barışçı N. Detection of fake face images using lightweight convolutional neural networks with stacking ensemble learning method. PeerJ Comput Sci 2024; 10:e2103. [PMID: 38983199 PMCID: PMC11232570 DOI: 10.7717/peerj-cs.2103] [Received: 01/30/2024] [Accepted: 05/15/2024] [Indexed: 07/11/2024]
Abstract
Images and videos containing fake faces are the most common type of digital manipulation. Such content can lead to negative consequences by spreading false information. The use of machine learning algorithms to produce fake face images has made it challenging to distinguish between genuine and fake content. Face manipulations are categorized into four basic groups: entire face synthesis, face identity manipulation (deepfake), facial attribute manipulation and facial expression manipulation. The study utilized lightweight convolutional neural networks to detect fake face images generated by using entire face synthesis and generative adversarial networks. The dataset used in the training process includes 70,000 real images in the FFHQ dataset and 70,000 fake images produced with StyleGAN2 using the FFHQ dataset. 80% of the dataset was used for training and 20% for testing. Initially, the MobileNet, MobileNetV2, EfficientNetB0, and NASNetMobile convolutional neural networks were trained separately. The models were pre-trained on ImageNet and reused with transfer learning. In this first round of training, EfficientNetB0 reached the highest accuracy, 93.64%. The EfficientNetB0 model was then revised to increase its accuracy by adding two dense layers (256 neurons) with ReLU activation, two dropout layers, one flattening layer, one dense layer (128 neurons) with ReLU activation, and a two-node dense classification layer with softmax activation. With these changes, EfficientNetB0 achieved an accuracy of 95.48%. Finally, the model that achieved 95.48% accuracy was used together with the MobileNet and MobileNetV2 models in the stacking ensemble learning method, resulting in the highest accuracy of 96.44%.
Affiliation(s)
- Emre Şafak
  - R&D Technology and Innovation Department, HAVELSAN, Ankara, Türkiye
  - Department of Computer Engineering, Gazi University Ankara, Ankara, Türkiye
- Necaattin Barışçı
  - Department of Computer Engineering, Gazi University Ankara, Ankara, Türkiye
7
Silva Santana L, Borges Camargo Diniz J, Mothé Glioche Gasparri L, Buccaran Canto A, Batista Dos Reis S, Santana Neville Ribeiro I, Gadelha Figueiredo E, Paulo Mota Telles J. Application of Machine Learning for Classification of Brain Tumors: A Systematic Review and Meta-Analysis. World Neurosurg 2024; 186:204-218.e2. [PMID: 38580093 DOI: 10.1016/j.wneu.2024.03.152] [Received: 01/21/2024] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/07/2024]
Abstract
BACKGROUND: Classifying brain tumors accurately is crucial for treatment and prognosis. Machine learning (ML) shows great promise in improving tumor classification accuracy. This study evaluates ML algorithms for differentiating various brain tumor types. METHODS: A systematic review and meta-analysis were conducted, searching PubMed, Embase, and Web of Science up to March 14, 2023. Studies that only investigated image segmentation accuracy or brain tumor detection instead of classification were excluded. We extracted binary diagnostic accuracy data, constructing contingency tables to derive sensitivity and specificity. RESULTS: Fifty-one studies were included. The pooled areas under the curve for glioblastoma versus lymphoma and low-grade versus high-grade gliomas were 0.99 (95% confidence interval [CI]: 0.98-1.00) and 0.89, respectively. The pooled sensitivity and specificity for benign versus malignant tumors were 0.90 (95% CI: 0.85-0.93) and 0.93 (95% CI: 0.90-0.95), respectively. The pooled sensitivity and specificity for low-grade versus high-grade gliomas were 0.99 (95% CI: 0.97-1.00) and 0.94 (95% CI: 0.79-0.99), respectively. Primary versus metastatic tumor identification yielded sensitivity and specificity of 0.89 (95% CI: 0.83-0.93) and 0.87 (95% CI: 0.82-0.91), respectively. The differentiation of gliomas from pituitary tumors yielded the highest results among primary brain tumor classifications: sensitivity of 0.99 (95% CI: 0.99-1.00) and specificity of 0.99 (95% CI: 0.98-1.00). CONCLUSIONS: ML demonstrated excellent performance in classifying brain tumor images, with near-maximum areas under the curve, sensitivity, and specificity.
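The step of deriving sensitivity and specificity from a binary contingency table can be sketched as follows. The counts are hypothetical, chosen only so the arithmetic is easy to follow, and are not taken from any study in the review; the pooling across studies that a meta-analysis performs is not shown.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Derive sensitivity and specificity from a 2x2 contingency table.

    sensitivity = TP / (TP + FN): share of true cases that were detected.
    specificity = TN / (TN + FP): share of non-cases correctly ruled out.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical single-study table: 50 malignant and 100 benign tumors.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=93, fp=7)
```

With these invented counts, 45 of 50 malignant tumors are detected and 93 of 100 benign tumors are correctly ruled out.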
Affiliation(s)
- Iuri Santana Neville Ribeiro
  - Department of Neurology, Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo, São Paulo, Brazil
- Eberval Gadelha Figueiredo
  - Department of Neurology, Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo, São Paulo, Brazil
- João Paulo Mota Telles
  - Department of Neurology, Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo, São Paulo, Brazil
8
Zhang X, Lian J, Yu Z, Tang H, Liang D, Liu J, Liu JK. Revealing the mechanisms of semantic satiation with deep learning models. Commun Biol 2024; 7:487. [PMID: 38649503 PMCID: PMC11035687 DOI: 10.1038/s42003-024-06162-0] [Received: 11/03/2023] [Accepted: 04/08/2024] [Indexed: 04/25/2024] Open
Abstract
The phenomenon of semantic satiation, which refers to the loss of meaning of a word or phrase after being repeated many times, is a well-known psychological phenomenon. However, the microscopic neural computational principles responsible for these mechanisms remain unknown. In this study, we use a deep learning model of continuous coupled neural networks to investigate the mechanism underlying semantic satiation and precisely describe this process with neuronal components. Our results suggest that, from a mesoscopic perspective, semantic satiation may be a bottom-up process. Unlike existing macroscopic psychological studies that suggest that semantic satiation is a top-down process, our simulations use a similar experimental paradigm as classical psychology experiments and observe similar results. Satiation of semantic objectives, similar to the learning process of our network model used for object recognition, relies on continuous learning and switching between objects. The underlying neural coupling strengthens or weakens satiation. Taken together, both neural and network mechanisms play a role in controlling semantic satiation.
Affiliation(s)
- Xinyu Zhang
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, Gansu, China
- Jing Lian
  - School of Electronics and Information Engineering, Lanzhou Jiaotong University, Lanzhou, 730070, Gansu, China
- Zhaofei Yu
  - School of Computer Science, Peking University, Beijing, 100871, Beijing, China
  - Institute for Artificial Intelligence, Peking University, Beijing, 100871, Beijing, China
- Huajin Tang
  - The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, 310027, Zhejiang, China
  - The MOE Frontier Science Center for Brain Science and Brain-Machine Integration, Zhejiang University, Hangzhou, 310027, Zhejiang, China
- Dong Liang
  - Department of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, Jiangsu, China
- Jizhao Liu
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, Gansu, China
- Jian K Liu
  - School of Computer Science, Centre for Human Brain Health, University of Birmingham, Birmingham, B15 2TT, UK
9
Blair JD, Gaynor KM, Palmer MS, Marshall KE. A gentle introduction to computer vision-based specimen classification in ecological datasets. J Anim Ecol 2024; 93:147-158. [PMID: 38230868 DOI: 10.1111/1365-2656.14042] [Received: 08/10/2023] [Accepted: 11/21/2023] [Indexed: 01/18/2024]
Abstract
Classifying specimens is a critical component of ecological research, biodiversity monitoring and conservation. However, manual classification can be prohibitively time-consuming and expensive, limiting how much data a project can afford to process. Computer vision, a form of machine learning, can help overcome these problems by rapidly, automatically and accurately classifying images of specimens. Given the diversity of animal species and contexts in which images are captured, there is no universal classifier for all species and use cases. As such, ecologists often need to train their own models. While numerous software programs exist to support this process, ecologists need a fundamental understanding of how computer vision works to select appropriate model workflows based on their specific use case, data types, computing resources and desired performance capabilities. Ecologists may also face characteristic quirks of ecological datasets, such as long-tail distributions, 'unknown' species, similarity between species and polymorphism within species, which impact the efficacy of computer vision. Despite growing interest in computer vision for ecology, there are few resources available to help ecologists face the challenges they are likely to encounter. Here, we present a gentle introduction for species classification using computer vision. In this manuscript and associated GitHub repository, we demonstrate how to prepare training data, basic model training procedures, and methods for model evaluation and selection. Throughout, we explore specific considerations ecologists should make when training classification models, such as data domains, feature extractors and class imbalances. With these basics, ecologists can adjust their workflows to achieve research goals and/or account for uncertainty in downstream analysis. Our goal is to provide guidance for ecologists for getting started in or improving their use of machine learning for visual classification tasks.
Affiliation(s)
- Jarrett D Blair
  - Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada
- Kaitlyn M Gaynor
  - Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
- Meredith S Palmer
  - Department of Ecology & Evolutionary Biology, Princeton University, Princeton, New Jersey, USA
- Katie E Marshall
  - Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada
10
Rodríguez Núñez M, Tavera Busso I, Carreras HA. Quantifying the contribution of environmental variables to cyclists' exposure to PM 2.5 using machine learning techniques. Heliyon 2024; 10:e24724. [PMID: 38298733 PMCID: PMC10828810 DOI: 10.1016/j.heliyon.2024.e24724] [Received: 11/27/2023] [Revised: 12/17/2023] [Accepted: 01/12/2024] [Indexed: 02/02/2024] Open
Abstract
Cyclists are particularly vulnerable to travel-related exposure to air pollution. Understanding the factors that increase exposure is crucial for promoting healthier urban environments. Machine learning models have successfully predicted air pollutant concentrations, but they tend to be less interpretable than classical statistical ones, such as linear models. This study aimed to develop a predictive model to assess cyclists' exposure to fine particulate matter (PM2.5) in urban environments. The model was generated using geo-temporally referenced data and machine learning techniques. We explored several models and found that the gradient boosting machine learning model best fitted the PM2.5 predictions, with a minimum root mean square error value of 5.62 μg m⁻³. The variables with greatest influence on cyclist exposure were the temporal ones (month, day of the week, and time of the day), followed by meteorological variables, such as temperature, relative humidity, wind speed, wind direction, and atmospheric pressure. Additionally, we considered relevant attributes, which are partially linked to spatial characteristics. These attributes encompass street typology, vegetation density, and the flow of vehicles on a particular street, which quantifies the number of vehicles passing a given point per minute. Mean PM2.5 concentration was lower in bicycle paths away from vehicular traffic than in bike lanes along streets. These outcomes underscore the need to thoughtfully design public transportation routes, including bus routes, concerning the network of bicycle pathways. Such strategic planning attempts to improve the air quality in urban landscapes.
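The root mean square error used above to compare models can be computed as in this minimal sketch. The observed and predicted concentrations are made-up values in μg m⁻³, not measurements from the study, and `rmse` is a hypothetical helper name.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical PM2.5 concentrations (μg m⁻³) at three measurement points.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
error = rmse(observed, predicted)  # sqrt((4 + 4 + 9) / 3)
```

Because the errors are squared before averaging, RMSE is expressed in the same units as the pollutant concentration and penalizes large misses more heavily than small ones.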
Affiliation(s)
- Martín Rodríguez Núñez
  - Instituto Multidisciplinario de Biología Vegetal (IMBIV), Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
  - Departamento de Química, Físicas y Naturales, Universidad Nacional de Córdoba, Córdoba, Argentina
- Iván Tavera Busso
  - Instituto Multidisciplinario de Biología Vegetal (IMBIV), Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
  - Departamento de Química, Físicas y Naturales, Universidad Nacional de Córdoba, Córdoba, Argentina
- Hebe Alejandra Carreras
  - Instituto Multidisciplinario de Biología Vegetal (IMBIV), Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
  - Departamento de Química, Físicas y Naturales, Universidad Nacional de Córdoba, Córdoba, Argentina
11
Ayyildiz N, Iskenderoglu O. How effective is machine learning in stock market predictions? Heliyon 2024; 10:e24123. [PMID: 38293519 PMCID: PMC10826674 DOI: 10.1016/j.heliyon.2024.e24123] [Received: 10/28/2023] [Revised: 12/16/2023] [Accepted: 01/03/2024] [Indexed: 02/01/2024] Open
Abstract
This study compares the performance of machine learning algorithms (MLMs) in predicting the movement directions of stock market indexes in developed countries, in order to determine the best estimation algorithm. For this purpose, the movement directions of indexes such as the NYSE 100 (the USA), NIKKEI 225 (Japan), FTSE 100 (the UK), CAC 40 (France), DAX 30 (Germany), FTSE MIB (Italy), and TSX (Canada) were estimated by employing the decision tree, random forest, k-nearest neighbor, naive Bayes, logistic regression, support vector machine and artificial neural network algorithms. According to the results obtained, artificial neural networks were found to be the best algorithm for the NYSE 100, FTSE 100, DAX 30 and FTSE MIB indices, while logistic regression was determined to be the best algorithm for the NIKKEI 225, CAC 40, and TSX indices. Artificial neural networks, which exhibited the highest average prediction performance, were determined to be the best prediction algorithm for the stock market indices of developed countries. It was also noted that the artificial neural network, logistic regression, and support vector machine algorithms were capable of predicting the directional movements of all indices with an accuracy rate of over 70%.
Affiliation(s)
- Nazif Ayyildiz
  - Harran University, Suruc Vocational School, 63800, Sanliurfa, Turkey
- Omer Iskenderoglu
  - Omer Halisdemir University, Faculty of Economics and Administrative Sciences, Department of Business, 51240, Nigde, Turkey
12
Neal WM, Pandey P, Khan SI, Khan IA, Chittiboyina AG. Machine learning and traditional QSAR modeling methods: a case study of known PXR activators. J Biomol Struct Dyn 2024; 42:903-917. [PMID: 37059719 DOI: 10.1080/07391102.2023.2196701] [Received: 10/25/2022] [Accepted: 03/22/2023] [Indexed: 04/16/2023]
Abstract
Pregnane X receptor (PXR), extensively expressed in human tissues related to digestion and metabolism, is responsible for recognizing and detoxifying diverse xenobiotics encountered by humans. To comprehend the promiscuous nature of PXR and its ability to bind a variety of ligands, computational approaches, viz., quantitative structure-activity relationship (QSAR) models, aid in the rapid dereplication of potential toxicological agents and mitigate the number of animals used to establish a meaningful regulatory decision. Recent advancements in machine learning techniques accommodating larger datasets are expected to aid in developing effective predictive models for complex mixtures (viz., dietary supplements) before undertaking in-depth experiments. Five hundred structurally diverse PXR ligands were used to develop traditional two-dimensional (2D) QSAR, machine-learning-based 2D-QSAR, field-based three-dimensional (3D) QSAR, and machine-learning-based 3D-QSAR models to establish the utility of predictive machine learning methods. Additionally, the applicability domain of the agonists was established to ensure the generation of robust QSAR models. A prediction set of dietary PXR agonists was used to externally validate the generated QSAR models. QSAR data analysis revealed that machine-learning 3D-QSAR techniques were more accurate in predicting the activity of external terpenes, with an external validation squared correlation coefficient (R²) of 0.70 versus an R² of 0.52 in machine-learning 2D-QSAR. Additionally, a visual summary of the binding pocket of PXR was assembled from the field 3D-QSAR models. By developing multiple QSAR models in this study, a robust groundwork for assessing PXR agonism from various chemical backbones has been established in anticipation of the identification of potential causative agents in complex mixtures.
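As an illustration of the external validation metric, the sketch below computes a squared Pearson correlation between observed and predicted activities. This is one common reading of "squared correlation coefficient"; QSAR work also uses other R² definitions, and the abstract does not say which formula the authors applied. The activity values and the function name `squared_correlation` are hypothetical.

```python
def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov * cov / (var_o * var_p)


# Hypothetical activities for a three-compound external prediction set.
observed_activity = [1.0, 2.0, 3.0]
predicted_activity = [1.0, 3.0, 2.0]
r2 = squared_correlation(observed_activity, predicted_activity)
```

A value near 1 means predictions rank and scale with the observations; note that this form of R² is insensitive to a constant offset or rescaling of the predictions, which is one reason some QSAR guidelines prefer other external validation statistics.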
Affiliation(s)
- William M Neal
  - Division of Pharmacognosy, Department of BioMolecular Sciences, School of Pharmacy, The University of Mississippi, University, MS, USA
- Pankaj Pandey
  - National Center for Natural Products Research, Research Institute of Pharmaceutical Sciences, School of Pharmacy, The University of Mississippi, University, MS, USA
- Shabana I Khan
  - Division of Pharmacognosy, Department of BioMolecular Sciences, School of Pharmacy, The University of Mississippi, University, MS, USA
  - National Center for Natural Products Research, Research Institute of Pharmaceutical Sciences, School of Pharmacy, The University of Mississippi, University, MS, USA
- Ikhlas A Khan
  - Division of Pharmacognosy, Department of BioMolecular Sciences, School of Pharmacy, The University of Mississippi, University, MS, USA
  - National Center for Natural Products Research, Research Institute of Pharmaceutical Sciences, School of Pharmacy, The University of Mississippi, University, MS, USA
- Amar G Chittiboyina
  - National Center for Natural Products Research, Research Institute of Pharmaceutical Sciences, School of Pharmacy, The University of Mississippi, University, MS, USA
|
13
|
Till T, Tschauner S, Singer G, Lichtenegger K, Till H. Development and optimization of AI algorithms for wrist fracture detection in children using a freely available dataset. Front Pediatr 2023; 11:1291804. [PMID: 38188914 PMCID: PMC10768054 DOI: 10.3389/fped.2023.1291804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Accepted: 12/05/2023] [Indexed: 01/09/2024] Open
Abstract
Introduction: In the field of pediatric trauma, computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems have emerged, offering a promising avenue for improved patient care. Children with wrist fractures in particular may benefit from machine learning (ML) solutions, since some of these lesions may be overlooked on conventional X-ray due to minimal compression without dislocation, or mistaken for cartilaginous growth plates. In this article, we describe the development and optimization of AI algorithms for wrist fracture detection in children.
Methods: A team of IT specialists, pediatric radiologists and pediatric surgeons used the freely available GRAZPEDWRI-DX dataset containing annotated pediatric trauma wrist radiographs of 6,091 patients, a total of 10,643 studies (20,327 images). First, a basic object detection model, a You Only Look Once object detector of the seventh generation (YOLOv7), was trained and tested on these data. Then, team decisions were taken to adjust data preparation, the image sizes used for training and testing, and the configuration of the detection model. Furthermore, we investigated each of these models using an Explainable Artificial Intelligence (XAI) method called Gradient-weighted Class Activation Mapping (Grad-CAM). This method uses saliency maps to visualize where a model directs its attention before classifying and regressing a certain class.
Results: Mean average precision (mAP) improved when applying optimizations to the pre-processing of the dataset images (maximum increases of +25.51% mAP@0.5 and +39.78% mAP@[0.5:0.95]), as well as to the object detection model itself (maximum increases of +13.36% mAP@0.5 and +27.01% mAP@[0.5:0.95]). Generally, when analyzing the resulting models using XAI methods, higher scoring model variations in terms of mAP paid more attention to broader regions of the image, prioritizing detection accuracy over precision compared to the less accurate models.
Discussion: This paper supports the implementation of ML solutions for pediatric trauma care. Optimization of a large X-ray dataset and the YOLOv7 model improves the model's ability to detect objects and provide valid diagnostic support to health care specialists. Such optimization protocols must be understood and advocated before comparing ML performance against that of health care specialists.
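The mAP@0.5 and mAP@[0.5:0.95] figures above rest on intersection over union (IoU) between predicted and annotated boxes: a detection counts as a match when its IoU with a ground-truth box reaches the threshold, and the bracketed form conventionally averages over thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below shows only the IoU test, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function names and box values are hypothetical and this is not the evaluation code used in the study.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(predicted_box, truth_box, threshold=0.5):
    """True when the predicted box overlaps the annotation enough to count."""
    return iou(predicted_box, truth_box) >= threshold


# Hypothetical fracture annotation and prediction, in pixels.
truth = (0, 0, 10, 10)
prediction = (0, 0, 10, 8)
print(iou(prediction, truth), is_match(prediction, truth))  # 0.8 True
```

A full mAP computation additionally ranks detections by confidence and integrates the precision-recall curve per class, which is omitted here.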
Affiliation(s)
- Tristan Till
  - Department of Applied Computer Sciences, FH JOANNEUM - University of Applied Sciences, Graz, Austria
  - Division of Pediatric Radiology, Department of Radiology, Medical University of Graz, Graz, Austria
- Sebastian Tschauner
  - Division of Pediatric Radiology, Department of Radiology, Medical University of Graz, Graz, Austria
- Georg Singer
  - Department of Pediatric and Adolescent Surgery, Medical University of Graz, Graz, Austria
- Klaus Lichtenegger
  - Department of Applied Computer Sciences, FH JOANNEUM - University of Applied Sciences, Graz, Austria
- Holger Till
  - Department of Pediatric and Adolescent Surgery, Medical University of Graz, Graz, Austria
|
14
|
Guo M, Lin Y, Shyu RJ, Huang J. Characterizing environmental pollution with civil complaints and social media data: A case of the Greater Taipei Area. Journal of Environmental Management 2023; 348:119310. [PMID: 37925979 DOI: 10.1016/j.jenvman.2023.119310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 10/09/2023] [Accepted: 10/09/2023] [Indexed: 11/07/2023]
Abstract
Environmental pollution is a major cause of nuisance and ill health among urban residents. Complaints are traditionally self-reported through phone-based systems. Social media provide novel channels to detect pollution-related incidents; however, their reliability has not been sufficiently evaluated. This study aimed to compare pollution incidents expressed on Twitter with those extracted from phone-based systems and to identify the built environment and socioeconomic attributes that can predict the likelihood of pollution incidents. A total of 639,746 tweets were retrieved from the Greater Taipei Area in 2017, and 110,716 self-reported pollution incidents were extracted from the Public Nuisance Petition system during the same period. The results suggest that complaints collected from phone-based systems and from Twitter correlated with each other spatially, although they differed in temporal profile and in the proportions of pollution categories. Catering businesses and the entertainment activities they attract appear to be the main sources of pollution complaints and can be precisely captured by geotagged tweets. This can serve as a strong predictor for pollution incidents, more than traditional indicators such as population density or industrial activities, as suggested by earlier studies. Social media analytics, with their ability to monitor and analyze online discussions in a timely manner, can be a valuable supplement to existing phone-based pollution monitoring procedures. The methodologies developed in this study have the potential to support the proactive management of urban environmental pollution, in which resources can be prioritized in key areas to further enhance the quality of urban services.
Affiliation(s)
- Mengdi Guo
  - School of Architecture, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin, China
- Yu Lin
  - School of Data Science, City University of Hong Kong, 16-201, 16/F, Lau Ming Wai Academic Building, 83 Tat Chee Avenue, Hong Kong SAR, China
- Rong-Juin Shyu
  - Department of System Engineering and Naval Architecture, National Taiwan Ocean University, Keelung, Taiwan
- Jianxiang Huang
  - Department of Urban Planning and Design, Faculty of Architecture, the University of Hong Kong, Pok Fu Lam Rd., Hong Kong SAR China; The University of Hong Kong Shenzhen Institute of Research and Innovation, 5/F, Key Laboratory Platform Building, Shenzhen Virtual University Park, No.6, Yuexing 2nd Rd, Nanshan, Shenzhen, China; Urban Systems Institute, The University of Hong Kong, Pokfulam Road, Hong Kong Special Administrative Region of China, China.
|
15
|
Baran K, Kloskowski A. Graph Neural Networks and Structural Information on Ionic Liquids: A Cheminformatics Study on Molecular Physicochemical Property Prediction. J Phys Chem B 2023; 127:10542-10555. [PMID: 38015981 PMCID: PMC10726349 DOI: 10.1021/acs.jpcb.3c05521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 11/01/2023] [Accepted: 11/16/2023] [Indexed: 11/30/2023]
Abstract
Ionic liquids (ILs) provide a promising solution in many industrial applications, such as solvents, absorbents, electrolytes, catalysts, lubricants, and many others. However, due to the enormous variety of their structures, uncovering or designing those with optimal attributes requires expensive and exhaustive simulations and experiments. For these reasons, searching for an efficient theoretical tool for finding the relationship between the IL structure and properties has been the subject of many research studies. Recently, special attention has been paid to machine learning tools, especially multilayer perceptron and convolutional neural networks, among many other algorithms in the field of artificial neural networks. For the latter, graph neural networks (GNNs) seem to be a powerful cheminformatic tool yet not well enough studied for dual molecular systems such as ILs. In this work, the usage of GNNs in structure-property studies is critically evaluated for predicting the density, viscosity, and surface tension of ILs. The problem of data availability and integrity is discussed to show how well GNNs deal with mislabeled chemical data. Providing more training data is proven to be more important than ensuring that they are immaculate. Great attention is paid to how GNNs process different ions to give graph transformations and electrostatic information. Clues on how GNNs should be applied to predict the properties of ILs are provided. Differences, especially regarding handling mislabeled data, favoring the use of GNNs over classical quantitative structure-property models are discussed.
Affiliation(s)
- Karol Baran
  - Department of Physical Chemistry, Faculty of Chemistry, Gdansk University of Technology, Narutowicza Street 11/12, 80-233 Gdansk, Poland
- Adam Kloskowski
  - Department of Physical Chemistry, Faculty of Chemistry, Gdansk University of Technology, Narutowicza Street 11/12, 80-233 Gdansk, Poland
|
16
|
Karaduman G, Kelleci Çelik F. 2D-Quantitative structure-activity relationship modeling for risk assessment of pharmacotherapy applied during pregnancy. J Appl Toxicol 2023; 43:1436-1446. [PMID: 37082782 DOI: 10.1002/jat.4475] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/12/2023] [Revised: 04/03/2023] [Accepted: 04/17/2023] [Indexed: 04/22/2023]
Abstract
The risk evaluation for pharmacological therapy during pregnancy is critical for maternal and fetal health. The initial risk assessment stage, the risk measurement, begins with pregnancy-labeling categories (A, B, C, D, and X) for pharmaceuticals defined by the US Food and Drug Administration (FDA). Recently, in silico methods have been preferred in toxicology studies to eliminate ethical issues before conducting clinical toxicology studies and animal experiments. Quantitative structure-activity relationship (QSAR) modeling is one of the in silico methodologies. The research focuses on creating a QSAR model that predicts the five FDA pregnancy categories of medications. Our dataset included 868 pharmaceuticals, containing nearly every pharmacological group collected from the FDA. 2D-molecular descriptors were calculated using PaDEL software. Twenty-four QSAR models were developed, and the best four models were discussed. The results of the models were compared according to sensitivity, accuracy, F-score, specificity, receiver operating characteristic (ROC) values, and Matthews correlation coefficient. Considering the statistical results, random forest is the best model for determining the pregnancy risk category of drugs. The accuracy of the model was 76.49% for internal and 93.58% for external validation. According to the kappa statistics, there is an average agreement of 0.583 for internal validation and a perfect agreement of 0.893 for external validation. Because the error rates of the model are very close to 0, the model is highly accurate. Consequently, our novel QSAR model gives guidance on the safe use of pharmaceuticals during pregnancy without requiring animal tests or clinical trials on pregnant women.
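The kappa values reported above measure agreement between predicted and true categories after correcting for chance. A minimal sketch of Cohen's kappa computed from a confusion matrix follows; the function name and the 2x2 matrix are hypothetical and do not reproduce the study's five-category results. For orientation, the commonly cited Landis and Koch bands label 0.41 to 0.60 as moderate and 0.81 to 1.00 as almost perfect agreement.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: true, columns: predicted)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion matrix (illustrative only).
matrix = [[20, 5],
          [10, 15]]
print(round(cohens_kappa(matrix), 3))  # 0.4
```

Here observed agreement is 0.7 and chance agreement is 0.5, so kappa is (0.7 - 0.5) / (1 - 0.5) = 0.4.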
Affiliation(s)
- Gul Karaduman
  - Vocational School of Health Services, Karamanoğlu Mehmetbey University, Karaman, 70200, Turkey
  - Department of Mathematics, University of Texas at Arlington, Arlington, Texas, 76019-0408, USA
- Feyza Kelleci Çelik
  - Vocational School of Health Services, Karamanoğlu Mehmetbey University, Karaman, 70200, Turkey
|
17
|
Li T, Liu Z, Thakkar S, Roberts R, Tong W. DeepAmes: A deep learning-powered Ames test predictive model with potential for regulatory application. Regul Toxicol Pharmacol 2023; 144:105486. [PMID: 37633327 DOI: 10.1016/j.yrtph.2023.105486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 07/14/2023] [Accepted: 08/23/2023] [Indexed: 08/28/2023]
Abstract
The Ames assay is required by regulatory agencies worldwide to assess the mutagenic potential risk of consumer products. In addition to this in vitro assay, in silico approaches have been widely used to predict Ames test results, as outlined in the International Council for Harmonization (ICH) guidelines. Building on this in silico approach, here we describe DeepAmes, a high-performance and robust model developed with a novel deep learning (DL) approach for potential utility in regulatory science. DeepAmes was developed with a large and consistent Ames dataset (>10,000 compounds) and was compared with five other standard machine learning (ML) methods. Using a test set of 1,543 compounds, DeepAmes was the best performer in predicting the outcome of the Ames assay. In addition, DeepAmes yielded the best and most stable performance up to the point where compounds were >30% outside of the applicability domain (AD). Regarding the potential for regulatory application, a revised version of DeepAmes achieved a much-improved sensitivity of 0.87, up from 0.47. In conclusion, DeepAmes provides a DL-powered model for predicting the results of Ames tests; with its defined AD and clear context of use, DeepAmes has potential for utility in regulatory application.
Affiliation(s)
- Ting Li
  - National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
- Zhichao Liu
  - National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
- Shraddha Thakkar
  - Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA
- Ruth Roberts
  - ApconiX Ltd, Alderley Park, Alderley Edge, SK10 4TG, UK; University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Weida Tong
  - National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA.
|
18
|
Castelli P, De Ruvo A, Bucciacchio A, D'Alterio N, Cammà C, Di Pasquale A, Radomski N. Harmonization of supervised machine learning practices for efficient source attribution of Listeria monocytogenes based on genomic data. BMC Genomics 2023; 24:560. [PMID: 37736708 PMCID: PMC10515079 DOI: 10.1186/s12864-023-09667-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 09/10/2023] [Indexed: 09/23/2023] Open
Abstract
BACKGROUND: Genomic data-based machine learning tools are promising for real-time surveillance activities performing source attribution of foodborne bacteria such as Listeria monocytogenes. Given the heterogeneity of machine learning practices, our aim was to identify those influencing the source prediction performance of the usual holdout method combined with the repeated k-fold cross-validation method.
METHODS: A large collection of 1,100 L. monocytogenes genomes with known sources was built according to several genomic metrics to ensure authenticity and completeness of genomic profiles. Based on these genomic profiles (i.e. 7-locus alleles, core alleles, accessory genes, core SNPs and pan kmers), we developed a versatile workflow assessing the prediction performance of different combinations of training dataset splitting (i.e. 50, 60, 70, 80 and 90%), data preprocessing (i.e. with or without near-zero variance removal), and learning models (i.e. BLR, ERT, RF, SGB, SVM and XGB). The performance metrics included accuracy, Cohen's kappa, F1-score, area under the curve from the receiver operating characteristic curve, precision-recall curve or precision-recall gain curve, and execution time.
RESULTS: The average testing accuracies from accessory genes and pan kmers were significantly higher than accuracies from core alleles or SNPs. While the accuracies from 70 and 80% of training dataset splitting were not significantly different, those from 80% were significantly higher than the other tested proportions. Near-zero variance removal prevented results from being produced for 7-locus alleles, did not significantly affect accuracy for core alleles, accessory genes and pan kmers, and significantly decreased accuracy for core SNPs. The SVM and XGB models did not differ significantly from each other in accuracy and reached significantly higher accuracies than BLR, SGB, ERT and RF, in this order. However, the SVM model required more computing power than the XGB model, especially for large numbers of descriptors such as core SNPs and pan kmers.
CONCLUSIONS: In addition to recommendations about machine learning practices for L. monocytogenes source attribution based on genomic data, the present study also provides a freely available workflow to solve other balanced or unbalanced multiclass phenotypes from binary and categorical genomic profiles of other microorganisms without source code modifications.
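The holdout method combined with repeated k-fold cross-validation can be sketched as an index-splitting routine: a fraction of samples is held out once for testing, and the remaining training samples are reshuffled and split into k folds several times. The sample count (1,100) and the 80% training proportion come from the abstract; the function names, k = 5, three repeats and the fixed seed are assumptions for illustration, and this is not the study's workflow.

```python
import random


def holdout_split(n_samples, train_fraction, rng):
    """Shuffle sample indices once and split them into training and testing sets."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


def repeated_kfold(train_indices, k, repeats, rng):
    """Yield (fitting, validation) index lists for each fold of each repeat."""
    for _ in range(repeats):
        shuffled = train_indices[:]
        rng.shuffle(shuffled)
        # Deal indices round-robin into k folds of near-equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            validation = folds[i]
            fitting = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield fitting, validation


rng = random.Random(0)  # fixed seed so the splits are reproducible
train, test = holdout_split(1100, 0.8, rng)
splits = list(repeated_kfold(train, k=5, repeats=3, rng=rng))
print(len(train), len(test), len(splits))  # 880 220 15
```

The testing indices never enter the cross-validation loop, so model selection on the folds cannot leak information from the held-out set.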
Affiliation(s)
- Pierluigi Castelli
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea De Ruvo
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea Bucciacchio
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicola D'Alterio
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Cesare Cammà
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Adriano Di Pasquale
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicolas Radomski
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy.
|
19
|
Yu H, Tang S, Li SFY, Cheng F. Averaging Strategy for Interpretable Machine Learning on Small Datasets to Understand Element Uptake after Seed Nanotreatment. Environmental Science & Technology 2023; 57:12760-12770. [PMID: 37594125 DOI: 10.1021/acs.est.3c01878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/19/2023]
Abstract
Understanding plant uptake and translocation of nanomaterials is crucial for ensuring the successful and sustainable applications of seed nanotreatment. Here, we collect a dataset with 280 instances from experiments for predicting the relative metal/metalloid concentration (RMC) in maize seedlings after seed priming by various metal and metalloid oxide nanoparticles. To obtain unbiased predictions and explanations on small datasets, we present an averaging strategy and add a dimension for interpretable machine learning. The findings in post-hoc interpretations of sophisticated LightGBM models demonstrate that solubility is highly correlated with model performance. Surface area, concentration, zeta potential, and hydrodynamic diameter of nanoparticles and seedling part and relative weight of plants are dominant factors affecting RMC, and their effects and interactions are explained. Furthermore, self-interpretable models using the RuleFit algorithm are established to successfully predict RMC only based on six important features identified by post-hoc explanations. We then develop a visualization tool called RuleGrid to depict feature effects and interactions in numerous generated rules. Consistent parameter-RMC relationships are obtained by different methods. This study offers a promising interpretable data-driven approach to expand the knowledge of nanoparticle fate in plants and may profoundly contribute to the safety-by-design of nanomaterials in agricultural and environmental applications.
Affiliation(s)
- Hengjie Yu
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
  - Key Laboratory of Intelligent Equipment and Robotics for Agriculture of Zhejiang Province, Hangzhou 310058, China
- Shiyu Tang
  - Department of Chemistry, National University of Singapore, 3 Science Drive 3, Singapore 117543, Singapore
- Sam Fong Yau Li
  - Department of Chemistry, National University of Singapore, 3 Science Drive 3, Singapore 117543, Singapore
- Fang Cheng
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
  - Key Laboratory of Intelligent Equipment and Robotics for Agriculture of Zhejiang Province, Hangzhou 310058, China
|
20
|
Liao WC, Mukundan A, Sadiaza C, Tsao YM, Huang CW, Wang HC. Systematic meta-analysis of computer-aided detection to detect early esophageal cancer using hyperspectral imaging. Biomedical Optics Express 2023; 14:4383-4405. [PMID: 37799695 PMCID: PMC10549751 DOI: 10.1364/boe.492635] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/07/2023] [Revised: 07/05/2023] [Accepted: 07/06/2023] [Indexed: 10/07/2023]
Abstract
Esophageal cancer (EC) is one of the leading causes of cancer deaths, in part because identifying it at an early stage is challenging. Computer-aided diagnosis (CAD) systems that can detect the early stages of EC have been developed in recent years. In this study, a meta-analysis is performed of selected studies that use only hyperspectral imaging to detect EC, evaluating them in terms of their diagnostic test accuracy (DTA). Eight studies are chosen based on the QUADAS-2 tool results for systematic DTA analysis, and each of the methods developed in these studies is classified based on the nationality of the data, artificial intelligence, the type of image, the type of cancer detected, and the year of publishing. Deeks' funnel plot, forest plot, and accuracy charts were made. The methods studied in these articles show that the automatic diagnosis of EC has high accuracy, but external validation, which is a prerequisite for real-time clinical applications, is lacking.
Affiliation(s)
- Wei-Chih Liao
  - Department of Internal Medicine, National Taiwan University Hospital, National Taiwan University College of Medicine, Taipei, Taiwan
  - Graduate Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Arvind Mukundan
  - Department of Mechanical Engineering, National Chung Cheng University, 168, University Rd., Min Hsiung, Chia Yi 62102, Taiwan
- Cleorita Sadiaza
  - Department of Mechanical Engineering, Far Eastern University, P. Paredes St., Sampaloc, Manila, 1015, Philippines
- Yu-Ming Tsao
  - Department of Mechanical Engineering, National Chung Cheng University, 168, University Rd., Min Hsiung, Chia Yi 62102, Taiwan
- Chien-Wei Huang
  - Department of Gastroenterology, Kaohsiung Armed Forces General Hospital, 2, Zhongzheng 1st.Rd., Lingya District, Kaohsiung City 80284, Taiwan
  - Department of Nursing, Tajen University, 20, Weixin Rd., Yanpu Township, Pingtung County 90741, Taiwan
- Hsiang-Chen Wang
  - Department of Mechanical Engineering, National Chung Cheng University, 168, University Rd., Min Hsiung, Chia Yi 62102, Taiwan
  - Department of Medical Research, Dalin Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, No. 2, Minsheng Road, Dalin, Chiayi, 62247, Taiwan
  - Director of Technology Development, Hitspectra Intelligent Technology Co., Ltd., 4F., No. 2, Fuxing 4th Rd., Qianzhen Dist., Kaohsiung City 80661, Taiwan
|
21
|
North N, Enders AA, Cable ML, Allen HC. Array-Based Machine Learning for Functional Group Detection in Electron Ionization Mass Spectrometry. ACS Omega 2023; 8:24341-24350. [PMID: 37457446 PMCID: PMC10339417 DOI: 10.1021/acsomega.3c01684] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/13/2023] [Accepted: 05/22/2023] [Indexed: 07/18/2023]
Abstract
Mass spectrometry is a ubiquitous technique capable of complex chemical analysis. The fragmentation patterns that appear in mass spectrometry are an excellent target for artificial intelligence methods to automate and expedite the analysis of data to identify targets such as functional groups. To develop this approach, we trained models on electron ionization (a reproducible hard fragmentation technique) mass spectra so that not only the final model accuracies but also the reasoning behind model assignments could be evaluated. The convolutional neural network (CNN) models were trained on 2D images of the spectra using transfer learning of Inception V3, and the logistic regression models were trained using array-based data and Scikit Learn implementation in Python. Our training dataset consisted of 21,166 mass spectra from the United States' National Institute of Standards and Technology (NIST) Webbook. The data was used to train models to identify functional groups, both specific (e.g., amines, esters) and generalized classifications (aromatics, oxygen-containing functional groups, and nitrogen-containing functional groups). We found that the highest final accuracies on identifying new data were observed using logistic regression rather than transfer learning on CNN models. It was also determined that the mass range most beneficial for functional group analysis is 0-100 m/z. We also found success in correctly identifying functional groups of example molecules selected from both the NIST database and experimental data. Beyond functional group analysis, we also have developed a methodology to identify impactful fragments for the accurate detection of the models' targets. The results demonstrate a potential pathway for analyzing and screening substantial amounts of mass spectral data.
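Array-based input of the kind described above can be pictured as a fixed-length intensity vector indexed by m/z. The sketch below converts a peak list into such a vector over the 0-100 m/z range highlighted in the abstract; unit-mass binning, base-peak normalization, the function name and the example peaks are all assumptions for illustration and may differ from the preprocessing used in the study.

```python
def spectrum_to_array(peaks, max_mz=100):
    """Convert (m/z, intensity) pairs into a vector with one slot per unit m/z."""
    array = [0.0] * (max_mz + 1)
    for mz, intensity in peaks:
        bin_index = int(round(mz))
        # Peaks outside the chosen mass range are dropped.
        if 0 <= bin_index <= max_mz:
            array[bin_index] += intensity
    base_peak = max(array)
    # Scale so the most intense retained peak equals 1.0.
    return [value / base_peak for value in array] if base_peak > 0 else array


# Hypothetical peak list; the peak at m/z 120 falls outside the range.
peaks = [(15, 50.0), (43, 100.0), (58, 25.0), (120, 80.0)]
features = spectrum_to_array(peaks)
print(len(features), features[15], features[43], features[58])  # 101 0.5 1.0 0.25
```

Vectors of this shape can be stacked row by row into a feature matrix for a classifier such as logistic regression.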
Affiliation(s)
- Nicole M. North
  - Department of Chemistry & Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Abigail A. Enders
  - Department of Chemistry & Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Morgan L. Cable
  - NASA Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California 91109, United States
- Heather C. Allen
  - Department of Chemistry & Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
|
22
|
Park GJ, Kang NS. ADis-QSAR: a machine learning model based on biological activity differences of compounds. J Comput Aided Mol Des 2023:10.1007/s10822-023-00517-1. [PMID: 37382799 DOI: 10.1007/s10822-023-00517-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 06/26/2023] [Indexed: 06/30/2023]
Abstract
Drug candidates identified by the pharmaceutical industry typically have unique structural characteristics to ensure they interact strongly and specifically with their biological targets. Identifying these characteristics is a key challenge for developing new drugs, and quantitative structure-activity relationship (QSAR) analysis has generally been used to perform this task. QSAR models with good predictive power improve the cost and time efficiencies invested in compound development. Generating these good models depends on how well differences between "active" and "inactive" compound groups can be conveyed to the model to be learned. Efforts to solve this difference issue have been made, including generating a "molecular descriptor" that compressively expresses the structural characteristics of compounds. From the same perspective, we succeeded in developing the Activity Differences-Quantitative Structure-Activity Relationship (ADis-QSAR) model by generating molecular descriptors that more explicitly convey features of the group through a pair system that performs direct connections between active and inactive groups. We used popular machine learning algorithms, such as Support Vector Machine, Random Forest, XGBoost and Multi-Layer Perceptron for model learning and evaluated the model using scores such as accuracy, area under curve, precision and specificity. The results showed that the Support Vector Machine performed better than the others. Notably, the ADis-QSAR model showed significant improvements in meaningful scores such as precision and specificity compared to the baseline model, even in datasets with dissimilar chemical spaces. This model reduces the risk of selecting false positive compounds, improving the efficiency of drug development.
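Precision and specificity, the two scores emphasized above, both penalize false positives, which is why they relate to the risk of selecting inactive compounds. A minimal sketch from confusion-matrix counts follows; the function names and counts are hypothetical and are not results from the study.

```python
def precision(true_pos, false_pos):
    """Fraction of predicted actives that are truly active."""
    predicted_pos = true_pos + false_pos
    return true_pos / predicted_pos if predicted_pos else 0.0


def specificity(true_neg, false_pos):
    """Fraction of truly inactive compounds that are predicted inactive."""
    actual_neg = true_neg + false_pos
    return true_neg / actual_neg if actual_neg else 0.0


# Hypothetical screening outcome: 40 true positives, 10 false positives,
# 90 true negatives (false negatives do not enter either score).
print(precision(40, 10), specificity(90, 10))  # 0.8 0.9
```

Lowering the false-positive count raises both scores at once, whereas recall depends on false negatives and is unaffected.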
Affiliation(s)
- Gyoung Jin Park
  - Graduate School of New Drug Discovery and Development, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon, 34134, Korea
- Nam Sook Kang
  - Graduate School of New Drug Discovery and Development, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon, 34134, Korea.
|
23
|
Pacheco VL, Bragagnolo L, Dalla Rosa F, Thomé A. Optimization of biocementation responses by artificial neural network and random forest in comparison to response surface methodology. Environmental Science and Pollution Research International 2023; 30:61863-61887. [PMID: 36934187 DOI: 10.1007/s11356-023-26362-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Accepted: 03/05/2023] [Indexed: 05/10/2023]
Abstract
In this article, the optimization of the specific urease activity (SUA) and the calcium carbonate (CaCO3) using microbially induced calcite precipitation (MICP) was compared to optimization using three algorithms based on machine learning: random forest regressor, artificial neural networks (ANNs), and multivariate linear regression. This study applied the techniques to two existing response surface method (RSM) experiments involving the MICP technique. Random forest-based and artificial neural network-based models underwent hyperparameter optimization via cross-validation and grid search to select the best-optimized model. In this study, the random forest-based algorithm achieved the best performance, with r2 values of 0.9381 and 0.9463 compared with the original r2 values of 0.9021 and 0.8530, respectively. This study aims to explore the capability of machine learning-based models in small datasets for optimizing experimental variables in the MICP technique, and the meaningfulness of the models given their specificities in the small experimental datasets applied to experimental designs. The use of these techniques can create opportunities to scale and mitigate costs in future experiments in the field.
Affiliation(s)
- Vinicius Luiz Pacheco
- Graduate Program in Civil and Environmental Engineering, University of Passo Fundo (UPF), Campus I, Km 171, BR 285, Passo Fundo, Rio Grande Do Sul, CEP: 99001-970, Brazil
- Lucimara Bragagnolo
- Graduate Program in Civil and Environmental Engineering, University of Passo Fundo (UPF), Campus I, Km 171, BR 285, Passo Fundo, Rio Grande Do Sul, CEP: 99001-970, Brazil
- Francisco Dalla Rosa
- Graduate Program in Civil and Environmental Engineering, University of Passo Fundo (UPF), Campus I, Km 171, BR 285, Passo Fundo, Rio Grande Do Sul, CEP: 99001-970, Brazil
- Antonio Thomé
- Graduate Program in Civil and Environmental Engineering, University of Passo Fundo (UPF), Campus I, Km 171, BR 285, Passo Fundo, Rio Grande Do Sul, CEP: 99001-970, Brazil
|
24
|
De P, Roy K. Computational modeling of PET imaging agents for vesicular acetylcholine transporter (VAChT) protein binding affinity: application of 2D-QSAR modeling and molecular docking techniques. In Silico Pharmacol 2023; 11:9. [PMID: 37035236 PMCID: PMC10073372 DOI: 10.1007/s40203-023-00146-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 03/31/2023] [Indexed: 04/07/2023] Open
Abstract
The neurotransmitter acetylcholine (ACh) plays a ubiquitous role in cognitive functions including learning and memory with widespread innervation in the cortex, subcortical structures, and the cerebellum. Cholinergic receptors, transporters, or enzymes associated with many neurodegenerative diseases, including Alzheimer's disease (AD) and Parkinson's disease (PD), are potential imaging targets. In the present study, we have developed 2D-quantitative structure-activity relationship (2D-QSAR) models for 19 positron emission tomography (PET) imaging agents targeted against presynaptic vesicular acetylcholine transporter (VAChT). VAChT assists in the transport of ACh into the presynaptic storage vesicles, and it becomes one of the main targets for the diagnosis of various neurodegenerative diseases. In our work, we aimed to understand the important structural features of the PET imaging agents required for their binding with VAChT. This was done by feature selection using a Genetic Algorithm followed by the Best Subset Selection method and developing a Partial Least Squares-based 2D-QSAR model using the best feature combination. The developed QSAR model showed significant statistical performance and reliability. Using the features selected in the 2D-QSAR analysis, we have also performed similarity-based chemical read-across predictions and obtained encouraging external validation statistics. Further, we have also performed molecular docking analysis to understand the molecular interactions occurring between the PET imaging agents and the VAChT receptor. The molecular docking results were correlated with the QSAR features for a better understanding of the molecular interactions. This research serves to fulfill the experimental data gap, highlighting the applicability of computational methods in the PET imaging agents' binding affinity prediction.
Supplementary Information: The online version contains supplementary material available at 10.1007/s40203-023-00146-4.
Affiliation(s)
- Priyanka De
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032 India
- Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032 India
|
25
|
Kim J, Jung W, An J, Oh HJ, Park J. Self-optimization of training dataset improves forecasting of cyanobacterial bloom by machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 866:161398. [PMID: 36621510 DOI: 10.1016/j.scitotenv.2023.161398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 11/30/2022] [Accepted: 01/01/2023] [Indexed: 06/17/2023]
Abstract
Data-driven model (DDM) prediction of aquatic ecological responses, such as cyanobacterial harmful algal blooms (CyanoHABs), is critically influenced by the choice of training dataset. However, a systematic method to choose the optimal training dataset considering data history has not yet been developed. Providing a comprehensive procedure with self-based optimal training dataset-selecting algorithm would self-improve the DDM performance. In this study, a novel algorithm was developed to self-generate possible training dataset candidates from the available input and output variable data and self-choose the optimal training dataset that maximizes CyanoHAB forecasting performance. Nine years of meteorological and water quality data (input) and CyanoHAB data (output) from a site on the Nakdong River, South Korea, were acquired and pretreated via an automated process. An artificial neural network (ANN) was chosen from among the DDM candidates by first-cut training and validation using the entire collected dataset. Optimal training datasets for the ANN were self-selected from among the possible self-generated training datasets by systematically simulating the performance in response to 46 periods and 40 sizes (number of data elements) of the generated training datasets. The best-performing models were screened to identify the candidate models. The best performance corresponded to 6-7 years of training data (∼18 % lower error) for forecasting 1-28 d ahead (1-28 d of forecasting lead time (FLT)). After the hyperparameters of the screened model candidates were fine-tuned, the best-performing model (7 years of data with 14 d FLT) was self-determined by comparing the forecasts with unseen CyanoHAB events. The self-determined model could reasonably predict CyanoHABs occurring in Korean waters (cyanobacteria cells/mL ≥ 1000). 
Thus, our proposed method of self-optimizing the training dataset effectively improved the predictive accuracy and operational efficiency of the DDM prediction of CyanoHAB.
Affiliation(s)
- Jayun Kim
- Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
- Woosik Jung
- Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
- Jusuk An
- Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea; Department of Environmental Research, Korea Institute of Civil Engineering and Building Technology, Goyang, Republic of Korea
- Hyun Je Oh
- Department of Environmental Research, Korea Institute of Civil Engineering and Building Technology, Goyang, Republic of Korea
- Joonhong Park
- Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
|
26
|
Artificial intelligence-based diagnosis of asbestosis: analysis of a database with applicants for asbestosis state aid. Eur Radiol 2022; 33:3557-3565. [PMID: 36567379 PMCID: PMC10121486 DOI: 10.1007/s00330-022-09304-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Revised: 09/27/2022] [Accepted: 11/18/2022] [Indexed: 12/27/2022]
Abstract
OBJECTIVES: In many countries, workers who developed asbestosis due to their occupation are eligible for government support. Based on the results of clinical examination, a team of pulmonologists determines the eligibility of patients for these programs. In this Dutch cohort study, we aim to demonstrate the potential role of an artificial intelligence (AI)-based system for automated, standardized, and cost-effective evaluation of applications for asbestosis patients.
METHODS: A dataset of n = 523 suspected asbestosis cases/applications from across the Netherlands was retrospectively collected. Each case/application was reviewed, and based on the criteria, a panel of three pulmonologists determined eligibility for government support. An AI system is proposed that uses thoracic CT images as input and predicts the assessment of the clinical panel. Alongside imaging, we evaluated the added value of lung function parameters.
RESULTS: The proposed AI algorithm reached an AUC of 0.87 (p < 0.001) in the prediction of accepted versus rejected applications. Diffusion capacity (DLCO) also showed comparable predictive value (AUC = 0.85, p < 0.001), with little correlation between the two parameters (r-squared = 0.22, p < 0.001). The combination of the imaging AI score and DLCO achieved superior performance (AUC = 0.95, p < 0.001). Interobserver variability between pulmonologists on the panel was estimated at alpha = 0.65 (Krippendorff's alpha).
CONCLUSION: We developed an AI system to support the clinical decision-making process for applications for government support for asbestosis. A multicenter prospective validation study is currently ongoing to examine the added value and reliability of this system alongside the clinical panel.
KEY POINTS:
- Artificial intelligence can detect imaging patterns of asbestosis in CT scans in a cohort of patients applying for state aid.
- Combining the AI prediction with the diffusing lung function parameter reaches the highest diagnostic performance.
- Specific cases with fibrosis but no asbestosis were correctly classified, suggesting robustness of the AI system, which is currently under prospective validation.
|
27
|
Kuzu SY. Evaluation of Gradient Boosting and Deep Learning Algorithms in Dimuon Production. J Mol Struct 2022. [DOI: 10.1016/j.molstruc.2022.134834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
|
28
|
Hamdy O, Abdel-Salam Z, Abdel-Harith M. Utilization of laser-induced breakdown spectroscopy, with principal component analysis and artificial neural networks in revealing adulteration of similarly looking fish fillets. APPLIED OPTICS 2022; 61:10260-10266. [PMID: 36606791 DOI: 10.1364/ao.470835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Accepted: 10/18/2022] [Indexed: 06/17/2023]
Abstract
Fish is an essential source of many nutrients necessary for human health. However, the deliberate mislabeling of similar fish fillet types is common in markets to make use of the relatively high price difference. This is a type of explicit food adulteration. In the present work, spectrochemical analysis and chemometric methods are adopted to disclose this type of fish species cheating. Laser-induced breakdown spectroscopy (LIBS) was utilized to differentiate between the fillets of the low-priced tilapia and the expensive Nile perch. Furthermore, the acquired spectroscopic data were analyzed statistically using principal component analysis (PCA) and artificial neural network (ANN) showing good discrimination in the PCA score plot and a 99% classification accuracy rate of the implemented ANN model. The recorded spectra of the two fish indicated that tilapia has a higher fat content than Nile perch, as evidenced by higher CN and C2 bands and an atomic line at 247.8 nm in its spectrum. The obtained results demonstrated the potential of using LIBS as a simple, fast, and cost-effective analytical technique, combined with statistical analysis for the decisive discrimination between fish fillet species.
|
29
|
A Systematic Review of Applications of Machine Learning and Other Soft Computing Techniques for the Diagnosis of Tropical Diseases. Trop Med Infect Dis 2022; 7:tropicalmed7120398. [PMID: 36548653 PMCID: PMC9787706 DOI: 10.3390/tropicalmed7120398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Revised: 11/17/2022] [Accepted: 11/21/2022] [Indexed: 11/29/2022] Open
Abstract
This systematic literature review aims to identify soft computing techniques currently utilized in diagnosing tropical febrile diseases and to explore the data characteristics and features used for diagnoses, algorithm accuracy, and the limitations of current studies. The goal of this study is therefore centered on determining the extent to which soft computing techniques have positively impacted the quality of physician care and their effectiveness in tropical disease diagnosis. The study used PRISMA guidelines to define paper selection and inclusion/exclusion criteria. It was determined that the highest frequency of articles utilized ensemble techniques for classification, prediction, analysis, diagnosis, etc., over single machine learning techniques, followed by neural networks. The results identified dengue fever as the most studied disease, followed by malaria and tuberculosis. It was also revealed that accuracy was the most common metric utilized to evaluate the predictive capability of a classification model. The information presented within these studies benefits frontline healthcare workers who could depend on soft computing techniques for accurate diagnoses of tropical diseases. Although our research shows an increasing interest in using machine learning techniques for diagnosing tropical diseases, more studies are still needed. Hence, recommendations and directions for future research are proposed.
|
30
|
Maize crop disease detection using NPNet-19 convolutional neural network. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07722-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
31
|
Kim KM, Ahn JH. Machine learning predictions of chlorophyll-a in the Han river basin, Korea. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2022; 318:115636. [PMID: 35777152 DOI: 10.1016/j.jenvman.2022.115636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 06/20/2022] [Accepted: 06/26/2022] [Indexed: 06/15/2023]
Abstract
This study developed a model to predict concentrations of chlorophyll-a ([Chl-a]) as a proxy for algal population with data from multiple monitoring stations in the Han river basin, by using machine-learning predictive models, then analyzed the relationship between [Chl-a] and the input variables of the optimized model. Daily water quality and meteorological data from 2012 to 2020 were collected from the real-time water quality information system and the meteorological administration of Korea. To quantify model accuracy, the coefficient of determination, root mean square error, and mean absolute error were applied. Among random forest (RF), support vector machine, and artificial neural network, the RF with random dataset showed the highest accuracy. The RF was optimized when 78 trees were applied to the model. Input variables for the best RF model were total organic carbon (feature importance: 27%), total nitrogen (19%), pH (13%), water temperature (8%), total phosphorus (8%), electrical conductivity (7%), dissolved oxygen (6%), minimum air temperature (AT) (4%), mean AT (3%), and maximum AT (3%). The feature-importance analysis showed that total organic carbon was the most important variable to predict [Chl-a] in the Han river basin. Total nitrogen was a more important variable than total phosphorus.
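The feature-importance figures quoted in this abstract can be tabulated and ranked directly. The following is a minimal sketch that uses only the percentages stated above; the variable names and the grouping into water-quality and air-temperature features are illustrative assumptions, not part of the original study.

```python
# Feature importances (%) as quoted in the abstract for the best random forest model.
# The dictionary keys and the grouping below are illustrative, not from the paper's code.
feature_importance_pct = {
    "total organic carbon": 27,
    "total nitrogen": 19,
    "pH": 13,
    "water temperature": 8,
    "total phosphorus": 8,
    "electrical conductivity": 7,
    "dissolved oxygen": 6,
    "minimum air temperature": 4,
    "mean air temperature": 3,
    "maximum air temperature": 3,
}

# Rank features from most to least important (ties keep their listed order).
ranked = sorted(feature_importance_pct.items(), key=lambda kv: kv[1], reverse=True)

# The quoted values sum to 98, not 100, presumably because of rounding.
total_pct = sum(feature_importance_pct.values())

# Share attributed to the three air-temperature (meteorological) inputs.
air_temperature_pct = sum(
    pct for name, pct in feature_importance_pct.items() if "air temperature" in name
)

for name, pct in ranked:
    print(f"{name}: {pct}%")
```

Read this way, the water-quality variables account for most of the quoted importance, which matches the abstract's conclusion that total organic carbon and total nitrogen are the leading predictors.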
Affiliation(s)
- Kyung-Min Kim
- Department of Integrated Energy and Infra System, Kangwon National University, Chuncheon, Gangwon-do, 24341, South Korea
- Johng-Hwa Ahn
- Department of Integrated Energy and Infra System, Kangwon National University, Chuncheon, Gangwon-do, 24341, South Korea; Department of Environmental Engineering, College of Engineering, Kangwon National University, Chuncheon, Gangwon-do, 24341, South Korea
|
32
|
Elkholosy H, Ead R, Hammad A, AbouRizk S. Data mining for forecasting labor resource requirements: a case study of project management staffing requirements. INTERNATIONAL JOURNAL OF CONSTRUCTION MANAGEMENT 2022. [DOI: 10.1080/15623599.2022.2112898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
Affiliation(s)
- Hady Elkholosy
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, Canada
- Rana Ead
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, Canada
- Ahmed Hammad
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, Canada
- Simaan AbouRizk
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, Canada
|
33
|
Hoyos W, Aguilar J, Toro M. A clinical decision-support system for dengue based on fuzzy cognitive maps. Health Care Manag Sci 2022; 25:666-681. [PMID: 35971038 DOI: 10.1007/s10729-022-09611-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/19/2022] [Accepted: 07/28/2022] [Indexed: 01/18/2023]
Abstract
Dengue is a viral infection widely distributed in tropical and subtropical regions of the world. Dengue is characterized by high fatality rates when the diagnosis is not made promptly and effectively. To aid in the diagnosis of dengue, we propose a clinical decision-support system that classifies the clinical picture based on its severity, and using causal relationships evaluates the behavior of the clinical and laboratory variables that describe the signs and symptoms related to dengue. The system is based on a fuzzy cognitive map that is defined by the signs, symptoms and laboratory tests used in the conventional diagnosis of dengue. The evaluation of the model was performed on datasets of patients diagnosed with dengue to compare the model with other approaches. The developed model showed a good classification performance with 89.4% accuracy and could evaluate the behaviour of clinical and laboratory variables related to dengue severity (it is an explainable method). This model serves as a diagnostic aid for dengue that can be used by medical professionals in clinical settings.
Affiliation(s)
- William Hoyos
- Grupo de Investigaciones Microbiológicas y Biomédicas de Córdoba, Universidad de Córdoba, Carrera 6 No 77-305, Montería, Colombia
- Grupo de Investigación en I+D+i en TIC, Universidad EAFIT, Carrera 48 No 7Sur-50, Medellín, Colombia
- Jose Aguilar
- Grupo de Investigación en I+D+i en TIC, Universidad EAFIT, Carrera 48 No 7Sur-50, Medellín, Colombia
- Centro de Estudios en Microelectrónica y Sistemas Distribuidos, Universidad de Los Andes, Núcleo La Hechicera, Mérida, Venezuela
- Departamento de Automática, Universidad de Alcalá, Alcalá de Henares, Spain
- Mauricio Toro
- Grupo de Investigación en I+D+i en TIC, Universidad EAFIT, Carrera 48 No 7Sur-50, Medellín, Colombia
|
34
|
Ehrhart M, Resch B, Havas C, Niederseer D. A Conditional GAN for Generating Time Series Data for Stress Detection in Wearable Physiological Sensor Data. SENSORS (BASEL, SWITZERLAND) 2022; 22:s22165969. [PMID: 36015730 PMCID: PMC9412645 DOI: 10.3390/s22165969] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/07/2022] [Revised: 08/05/2022] [Accepted: 08/06/2022] [Indexed: 05/14/2023]
Abstract
Human-centered applications using wearable sensors in combination with machine learning have received a great deal of attention in the last couple of years. At the same time, wearable sensors have also evolved and are now able to accurately measure physiological signals and are, therefore, suitable for detecting body reactions to stress. The field of machine learning, or more precisely, deep learning, has been able to produce outstanding results. However, in order to produce these good results, large amounts of labeled data are needed, which, in the context of physiological data related to stress detection, are a great challenge to collect, as they usually require costly experiments or expert knowledge. This usually results in an imbalanced and small dataset, which makes it difficult to train a deep learning algorithm. In recent studies, this problem is tackled with data augmentation via a Generative Adversarial Network (GAN). Conditional GANs (cGAN) are particularly suitable for this as they provide the opportunity to feed auxiliary information such as a class label into the training process to generate labeled data. However, it has been found that during the training process of GANs, different problems usually occur, such as mode collapse or vanishing gradients. To tackle the problems mentioned above, we propose a Long Short-Term Memory (LSTM) network, combined with a Fully Convolutional Network (FCN) cGAN architecture, with an additional diversity term to generate synthetic physiological data, which are used to augment the training dataset to improve the performance of a binary classifier for stress detection. We evaluated the methodology on our collected physiological measurement dataset, and we were able to show that using the method, the performance of an LSTM and an FCN classifier could be improved. Further, we showed that the generated data could not be distinguished from the real data any longer.
Affiliation(s)
- Maximilian Ehrhart
- Department of Geoinformatics, University of Salzburg, 5020 Salzburg, Austria
- Bernd Resch
- Department of Geoinformatics, University of Salzburg, 5020 Salzburg, Austria
- Center for Geographic Analysis, Harvard University, Cambridge, MA 02138, USA
- Correspondence: Tel.: +43-662-8044-7551
- Clemens Havas
- Department of Geoinformatics, University of Salzburg, 5020 Salzburg, Austria
- David Niederseer
- Department of Cardiology, University Heart Center Zurich, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
|
35
|
Deep learning based semantic segmentation and quantification for MRD biochip images. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
36
|
Guttman Y, Kerem Z. Computer-Aided (In Silico) Modeling of Cytochrome P450-Mediated Food–Drug Interactions (FDI). Int J Mol Sci 2022; 23:ijms23158498. [PMID: 35955630 PMCID: PMC9369352 DOI: 10.3390/ijms23158498] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/10/2022] [Revised: 07/26/2022] [Accepted: 07/28/2022] [Indexed: 02/01/2023] Open
Abstract
Modifications of the activity of Cytochrome P450 (CYP) enzymes by compounds in food might impair medical treatments. These CYP-mediated food–drug interactions (FDI) play a major role in drug clearance in the intestine and liver. Inter-individual variation in both CYP expression and structure is an important determinant of FDI. Traditional targeted approaches have highlighted a limited number of dietary inhibitors and single-nucleotide variations (SNVs), each determining personal CYP activity and inhibition. These approaches are costly in time, money and labor. Here, we review computational tools and databases that are already available and are relevant to predicting CYP-mediated FDIs. Computer-aided approaches such as protein–ligand interaction modeling and the virtual screening of big data narrow down hundreds of thousands of items in databanks to a few putative targets, to which the research resources could be further directed. Structure-based methods are used to explore the structural nature of the interaction between compounds and CYP enzymes. However, while collections of chemical, biochemical and genetic data are available today and call for the implementation of big-data approaches, ligand-based machine-learning approaches for virtual screening are still scarcely used for FDI studies. This review of CYP-mediated FDIs promises to attract scientists and the general public.
|
37
|
Predicting Divorce Prospect Using Ensemble Learning: Support Vector Machine, Linear Model, and Neural Network. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:3687598. [PMID: 35860635 PMCID: PMC9293523 DOI: 10.1155/2022/3687598] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/17/2022] [Revised: 04/20/2022] [Accepted: 05/23/2022] [Indexed: 01/27/2023]
Abstract
A divorce is a legal step taken by married people to end their marriage. It occurs after a couple decides to no longer live together as husband and wife. Globally, the divorce rate more than doubled from 1970 to 2008, with divorces per 1,000 married people rising from 2.6 to 5.5. Divorce occurs at a rate of 16.9 per 1,000 married women. According to experts, over half of all marriages end in divorce or separation in the United States. A novel ensemble learning technique based on advanced machine learning algorithms is proposed in this study. The support vector machine (SVM), passive aggressive classifier, and neural network (MLP) are applied in the context of divorce prediction. A question-based dataset was created by a field specialist. The responses to the questions provide important information about whether a marriage is likely to end in divorce in the future. Cross-validation is applied with 5 folds, and the performance results of the evaluation metrics are examined. The accuracy score is 100%, and the Receiver Operating Characteristic (ROC) curve score, recall score, precision score, and F1 score are close to 97%. Our findings examine the key indicators of divorce and the factors that are most significant when predicting divorce.
|
38
|
López-López E, Fernández-de Gortari E, Medina-Franco JL. Yes SIR! On the structure-inactivity relationships in drug discovery. Drug Discov Today 2022; 27:2353-2362. [PMID: 35561964 DOI: 10.1016/j.drudis.2022.05.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/02/2022] [Revised: 04/09/2022] [Accepted: 05/05/2022] [Indexed: 12/12/2022]
Abstract
In analogy with structure-activity relationships (SARs), which are at the core of medicinal chemistry, studying structure-inactivity relationships (SIRs) is essential to understanding and predicting biological activity. Current computational methods should predict or distinguish 'activity' and 'inactivity' with the same confidence because both concepts are complementary. However, the lack of inactivity data, in particular in the public domain, limits the development of predictive models and their broad application. In this review, we encourage the scientific community to disclose and analyze high-confidence activity data considering both the labeled 'active' and 'inactive' compounds.
Affiliation(s)
- Edgar López-López
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico; Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City 07000, Mexico
- Eli Fernández-de Gortari
- Department of Nanosafety, International Iberian Nanotechnology Laboratory, Braga 4715-330, Portugal
- José L Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
|
39
|
González-Fernández E, Álvarez-López S, Garrido A, Fernández-González M, Rodríguez-Rajo FJ. Data mining assessment of Poaceae pollen influencing factors and its environmental implications. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 815:152874. [PMID: 34999063 DOI: 10.1016/j.scitotenv.2021.152874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Revised: 12/29/2021] [Accepted: 12/29/2021] [Indexed: 06/14/2023]
Abstract
Poaceae pollen is highly allergenic and contributes markedly to the worldwide prevalence of pollen allergy. Pollen counts are defined by the species present in the considered area, although year-to-year oscillations may be triggered by different parameters, among which are weather conditions. Due to the predominant role of Poaceae pollen in the allergenicity of urban green areas, the aim of this study was to analyze pollen trends and the influence of meteorology in order to forecast relevant variations in airborne pollen levels. The study was carried out during the 1993-2020 period in Ourense, in the NW Iberian Peninsula. We used a volumetric Lanzoni VPPS 2000 trap to record Poaceae airborne pollen grains, and daily meteorological data were obtained from the Galician Institute for Meteorology and Oceanography. The main indexes of the pollen season and their trends were calculated. A correlation analysis and the 'C5.0 Decision Trees and Rule-Based Models' data mining algorithm were applied to determine the influence of meteorological conditions on pollen levels. We detected atmospheric Poaceae pollen during 139 days on average, mainly from April to August. The mean amount recorded during the pollen season was 4608 pollen grains, with a maximum peak of 276 pollen/m3 on 27 June. We found no statistically significant trends and only slight slopes for the seasonal indexes, similar to previous Poaceae studies in the same region. The calculated C5.0 model offered defined results, indicating that the combination of mean temperature above 17.46 °C and sunlight exposure higher than 12.7 h is conducive to significantly high pollen levels. The obtained results make it possible to identify risk moments during the pollen season for the activation of protective measures for the population sensitized to grass pollen.
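The C5.0 rule reported in this abstract can be read as a two-condition check. The following is a minimal sketch, assuming only the two thresholds quoted above; the function and parameter names are hypothetical, and the fitted C5.0 model itself may contain further rules that the abstract does not state.

```python
# Thresholds quoted in the abstract; all names here are illustrative.
MEAN_TEMP_THRESHOLD_C = 17.46   # mean daily temperature, in degrees Celsius
SUNLIGHT_THRESHOLD_H = 12.7     # sunlight exposure, in hours


def high_pollen_risk(mean_temp_c: float, sunlight_h: float) -> bool:
    """Return True when both reported conditions for high Poaceae pollen levels hold."""
    return mean_temp_c > MEAN_TEMP_THRESHOLD_C and sunlight_h > SUNLIGHT_THRESHOLD_H


# Example: a warm, long day satisfies the rule; a cool day with the same sunlight does not.
print(high_pollen_risk(19.0, 13.5))
print(high_pollen_risk(15.0, 13.5))
```

Both comparisons are strict, following the abstract's wording ("above" and "higher than"), so a day exactly at either threshold is not flagged.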
Affiliation(s)
- Sabela Álvarez-López
- Department of Plant Biology and Soil Sciences, Faculty of Sciences, University of Vigo, 32004 Ourense, Spain
- Alejandro Garrido
- Department of Plant Biology and Soil Sciences, Faculty of Sciences, University of Vigo, 32004 Ourense, Spain
- María Fernández-González
- Department of Plant Biology and Soil Sciences, Faculty of Sciences, University of Vigo, 32004 Ourense, Spain
- Fco Javier Rodríguez-Rajo
- Department of Plant Biology and Soil Sciences, Faculty of Sciences, University of Vigo, 32004 Ourense, Spain
|
40
|
Qureshi MB, Azad L, Qureshi MS, Aslam S, Aljarbouh A, Fayaz M. Brain Decoding Using fMRI Images for Multiple Subjects through Deep Learning. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:1124927. [PMID: 35273647 PMCID: PMC8904097 DOI: 10.1155/2022/1124927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Revised: 02/06/2022] [Accepted: 02/11/2022] [Indexed: 12/02/2022]
Abstract
Substantial information related to human cerebral conditions can be decoded through various noninvasive evaluation techniques such as fMRI. Exploring the neuronal activity of the human brain can reveal a person's thoughts, such as what the subject is perceiving, thinking, or visualizing. Furthermore, deep learning techniques can be used to decode the multifaceted patterns of the brain in response to external stimuli. Existing techniques are capable of exploring and classifying the thoughts of a human subject from fMRI imaging data. fMRI images are volumetric imaging scans that are high-dimensional and require long training times when fed as input to a deep learning network. However, more efficient learning of high-dimensional, high-level features in less training time, together with accurate interpretation of the brain voxels with lower misclassification error, is still needed. In this research, we propose an improved CNN technique in which features are functionally aligned. The optimal features are selected after dimensionality reduction. The high-dimensional feature vector is transformed into a low-dimensional space through autoadjusted weights and a combination of the best activation functions. Furthermore, we address the problem of increased training time by using the Swish activation function, making it denser and increasing the efficiency of the model in less training time. Finally, the experimental results are evaluated and compared with other classifiers, demonstrating the superiority of the proposed model in terms of accuracy.
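For readers unfamiliar with the Swish activation mentioned in this abstract, it is commonly defined as swish(x) = x · sigmoid(βx). The following is a minimal scalar sketch in plain Python, not the authors' implementation. The `beta` parameter and the function names are assumptions, and the abstract does not say which variant or framework was used.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function for scalar inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation: x * sigmoid(beta * x).

    beta = 1.0 gives the variant often called SiLU; the value used in the
    cited work is not stated in the abstract.
    """
    return x * sigmoid(beta * x)


if __name__ == "__main__":
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(value, swish(value))
```

Unlike ReLU, the function is smooth and allows small negative outputs for negative inputs, and for large positive inputs it approaches the identity.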
Affiliation(s)
- Muhammad Bilal Qureshi
- Department of Computer Science & IT, University of Lakki Marwat, Lakki Marwat 28420, KPK, Pakistan
- Laraib Azad
- Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Islamabad 44000, Pakistan
- Muhammad Shuaib Qureshi
- Department of Computer Science, School of Arts and Sciences, University of Central Asia, Kyrgyzstan
- Sheraz Aslam
- Department of Electrical Engineering, Computer Engineering, and Informatics, Cyprus University of Technology, Cyprus
- Ayman Aljarbouh
- Department of Computer Science, School of Arts and Sciences, University of Central Asia, Kyrgyzstan
- Muhammad Fayaz
- Department of Computer Science, School of Arts and Sciences, University of Central Asia, Kyrgyzstan
|
41
|
Yeo C, Kim BC, Cheon S, Lee J, Mun D. Machining feature recognition based on deep neural networks to support tight integration with 3D CAD systems. Sci Rep 2021; 11:22147. [PMID: 34772966 PMCID: PMC8590007 DOI: 10.1038/s41598-021-01313-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/28/2021] [Accepted: 10/26/2021] [Indexed: 11/23/2022] Open
Abstract
Recently, a growing number of studies have applied deep learning to recognize the machining features of three-dimensional (3D) computer-aided design (CAD) models. Because the data structure of boundary representation (B-rep) models makes them difficult to use directly as input to neural networks, B-rep models are generally converted into voxel, mesh, or point cloud models before being used as neural network inputs. However, the model's resolution decreases during this format conversion, causing the loss of some features or difficulties in identifying which areas of the converted model correspond to a specific face of the B-rep model. To solve these problems, this study proposes a method enabling tight integration of a 3D CAD system with a deep neural network, using feature descriptors as inputs to the neural network for recognizing machining features. A feature descriptor is an explicit representation of the main property items of a face. We constructed 2236 data samples to train and evaluate the deep neural network: 1430 were used for training, 358 for validation, and 448 to evaluate the performance of the trained network. In addition, we conducted an experiment to recognize a total of 17 types (16 types of machining features and a non-feature) from the B-rep model, and the types for all 75 test cases were successfully recognized.
Affiliation(s)
- Changmo Yeo
- School of Mechanical Engineering, Korea University, 145, Anam-ro, Seongbuk-gu, Seoul, 02841, South Korea
- Byung Chul Kim
- School of Mechanical Engineering, Korea University of Technology and Education, 1600 Chungjeol-ro, Byeongcheon-myeon, Dongnam-gu, Cheonan-si, Chungcheongnam-do, 31253, South Korea
- Sanguk Cheon
- Department of Integrative Systems Engineering, Ajou University, 206, Worldcup-ro, Yeongtong-gu, Suwon, 16499, South Korea
- Jinwon Lee
- School of Mechanical Engineering, Korea University, 145, Anam-ro, Seongbuk-gu, Seoul, 02841, South Korea
- Duhwan Mun
- School of Mechanical Engineering, Korea University, 145, Anam-ro, Seongbuk-gu, Seoul, 02841, South Korea
|
42
|
Rácz A, Bajusz D, Miranda-Quintana RA, Héberger K. Machine learning models for classification tasks related to drug safety. Mol Divers 2021; 25:1409-1424. [PMID: 34110577 PMCID: PMC8342376 DOI: 10.1007/s11030-021-10239-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 04/01/2021] [Accepted: 05/27/2021] [Indexed: 12/23/2022]
Abstract
In this review, we outline the current trends in the field of machine learning-driven classification studies related to ADME (absorption, distribution, metabolism and excretion) and toxicity endpoints from the past six years (2015-2021). The study focuses only on classification models with large datasets (i.e. more than a thousand compounds). A comprehensive literature search and meta-analysis were carried out for nine different targets: hERG-mediated cardiotoxicity, blood-brain barrier penetration, permeability glycoprotein (P-gp) substrate/inhibitor, cytochrome P450 enzyme family, acute oral toxicity, mutagenicity, carcinogenicity, respiratory toxicity and irritation/corrosion. The comparison of the best classification models was intended to reveal the differences between machine learning algorithms and modeling types, endpoint-specific performances, dataset sizes and the different validation protocols. Based on the evaluation of the data, we can say that tree-based algorithms are (still) dominating the field, with consensus modeling being an increasing trend in drug safety predictions. Although one can already find classification models with strong performance for hERG-mediated cardiotoxicity and the isoenzymes of the cytochrome P450 enzyme family, these targets are still central to ADMET-related research efforts.
Affiliation(s)
- Anita Rácz
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary
- Dávid Bajusz
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary
- Károly Héberger
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary
|
43
|
Pahar M, Klopper M, Warren R, Niesler T. COVID-19 cough classification using machine learning and global smartphone recordings. Comput Biol Med 2021; 135:104572. [PMID: 34182331 PMCID: PMC8213969 DOI: 10.1016/j.compbiomed.2021.104572] [Citation(s) in RCA: 90] [Impact Index Per Article: 30.0] [Received: 05/06/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/15/2022]
Abstract
We present a machine learning based COVID-19 cough classifier which can discriminate COVID-19 positive coughs from both COVID-19 negative and healthy coughs recorded on a smartphone. This type of screening is non-contact, easy to apply, and can reduce the workload in testing centres as well as limit transmission by recommending early self-isolation to those who have a cough suggestive of COVID-19. The datasets used in this study include subjects from all six continents and contain both forced and natural coughs, indicating that the approach is widely applicable. The publicly available Coswara dataset contains 92 COVID-19 positive and 1079 healthy subjects, while the second smaller dataset was collected mostly in South Africa and contains 18 COVID-19 positive and 26 COVID-19 negative subjects who have undergone a SARS-CoV laboratory test. Both datasets indicate that COVID-19 positive coughs are 15%–20% shorter than non-COVID coughs. Dataset skew was addressed by applying the synthetic minority oversampling technique (SMOTE). A leave-p-out cross-validation scheme was used to train and evaluate seven machine learning classifiers: logistic regression (LR), k-nearest neighbour (KNN), support vector machine (SVM), multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM) and a residual-based neural network architecture (Resnet50). Our results show that although all classifiers were able to identify COVID-19 coughs, the best performance was exhibited by the Resnet50 classifier, which was best able to discriminate between the COVID-19 positive and the healthy coughs with an area under the ROC curve (AUC) of 0.98. An LSTM classifier was best able to discriminate between the COVID-19 positive and COVID-19 negative coughs, with an AUC of 0.94 after selecting the best 13 features from a sequential forward selection (SFS). 
Since this type of cough audio classification is cost-effective and easy to deploy, it is potentially a useful and viable means of non-contact COVID-19 screening.
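The synthetic minority oversampling technique (SMOTE) named in this abstract creates new minority-class samples by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The sketch below illustrates that core idea in plain Python on toy 2-D points. The function name, the parameters, and the data are hypothetical, and it is not the pipeline used in the cited study, whose features and settings the abstract does not give.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric tuples (minority-class feature vectors).
    k: number of nearest minority neighbours considered for each base sample.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


if __name__ == "__main__":
    toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote(toy_minority, n_synthetic=5):
        print(point)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling is normally applied to the training folds only, so that no interpolated data leaks into evaluation.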
Affiliation(s)
- Madhurananda Pahar
- Department of Electrical and Electronic Engineering, Stellenbosch University, South Africa
- Marisa Klopper
- SAMRC Centre for Tuberculosis Research, DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, South Africa
- Robin Warren
- SAMRC Centre for Tuberculosis Research, DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, South Africa
- Thomas Niesler
- Department of Electrical and Electronic Engineering, Stellenbosch University, South Africa
|
44
|
Lovrić M, Malev O, Klobučar G, Kern R, Liu JJ, Lučić B. Predictive Capability of QSAR Models Based on the CompTox Zebrafish Embryo Assays: An Imbalanced Classification Problem. Molecules 2021; 26:1617. [PMID: 33803931 PMCID: PMC7998177 DOI: 10.3390/molecules26061617] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/05/2021] [Revised: 03/03/2021] [Accepted: 03/11/2021] [Indexed: 02/06/2023] Open
Abstract
The CompTox Chemistry Dashboard (ToxCast) contains one of the largest public databases on zebrafish (Danio rerio) developmental toxicity. The data consist of 19 toxicological endpoints on 1018 unique compounds measured in relatively low concentration ranges. The endpoints are related to developmental effects occurring in dechorionated zebrafish embryos for 120 hours post fertilization and monitored via gross malformations and mortality. We report the predictive capability of 209 quantitative structure-activity relationship (QSAR) models developed by machine learning methods using penalization techniques and diverse model quality metrics to cope with the imbalanced endpoints. All these QSAR models were generated to test how the imbalanced classification (toxic or non-toxic) endpoints could be predicted regardless of which of three algorithms is used: logistic regression, multi-layer perceptron, or random forests. Additionally, QSAR toxicity models are developed starting from sets of classical molecular descriptors, structural fingerprints and their combinations. Only 8 out of 209 models passed the 0.20 Matthews correlation coefficient value defined a priori as a threshold for acceptable model quality on the test sets. The best models were obtained for the endpoints mortality (MORT), ActivityScore and JAW (deformation). The low predictability of the QSAR models developed from the zebrafish embryotoxicity data in the database is mainly due to a higher sensitivity of 19 measurements of endpoints carried out on dechorionated embryos at low concentrations.
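The Matthews correlation coefficient (MCC) used as the acceptance criterion here is computed from the four cells of a binary confusion matrix. It is a common choice for imbalanced endpoints because it stays near zero for a classifier that simply predicts the majority class. The sketch below is a minimal plain-Python illustration with made-up counts. The function names, the zero-denominator convention, and reading "passed" as greater than or equal to 0.20 are assumptions.

```python
import math

MCC_THRESHOLD = 0.20  # acceptance threshold quoted in the abstract


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def passes_threshold(tp: int, tn: int, fp: int, fn: int,
                     threshold: float = MCC_THRESHOLD) -> bool:
    """Assumption: a model passes when its MCC is at least the threshold."""
    return matthews_corrcoef(tp, tn, fp, fn) >= threshold


if __name__ == "__main__":
    # Made-up imbalanced test set with 5 toxic and 95 non-toxic compounds.
    always_negative = dict(tp=0, tn=95, fp=0, fn=5)   # 95% accuracy
    some_skill = dict(tp=3, tn=90, fp=5, fn=2)        # 93% accuracy
    for name, cm in [("always negative", always_negative), ("some skill", some_skill)]:
        print(name, round(matthews_corrcoef(**cm), 3), passes_threshold(**cm))
```

In the made-up example, the majority-class predictor has the higher accuracy but an MCC of zero, which is why a threshold on MCC rather than accuracy is a sensible quality gate for imbalanced data.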
Affiliation(s)
- Mario Lovrić
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria; (M.L.); (R.K.)
- Ruđer Bošković Institute, P.O. Box 180, 10002 Zagreb, Croatia
- Olga Malev
- Ruđer Bošković Institute, P.O. Box 180, 10002 Zagreb, Croatia
- Department of Biology, Faculty of Science, University of Zagreb, Rooseveltov Trg 6, 10000 Zagreb, Croatia
- Göran Klobučar
- Department of Biology, Faculty of Science, University of Zagreb, Rooseveltov Trg 6, 10000 Zagreb, Croatia
- Roman Kern
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria; (M.L.); (R.K.)
- Institute of Interactive Systems and Data Science, TU Graz, Inffeldgasse 16c, 8010 Graz, Austria
- Jay J. Liu
- Department of Chemical Engineering, Pukyong National University, Busan 608-739, Korea
- Bono Lučić
- Ruđer Bošković Institute, P.O. Box 180, 10002 Zagreb, Croatia
|