1. Yakovyna V, Shakhovska N, Szpakowska A. A novel hybrid supervised and unsupervised hierarchical ensemble for COVID-19 cases and mortality prediction. Sci Rep 2024; 14:9782. [PMID: 38684770; PMCID: PMC11059164; DOI: 10.1038/s41598-024-60637-y] [Received: 12/15/2023] [Accepted: 04/25/2024] [Indexed: 05/02/2024] [Open Access]
Abstract
Though COVID-19 is no longer a pandemic but rather an endemic disease, the epidemiological situation related to the SARS-CoV-2 virus is developing at an alarming rate, impacting every corner of the world. The rapid escalation of the coronavirus has engaged the scientific community, which continually seeks solutions to ensure the comfort and safety of society. Understanding the joint impact of medical and non-medical interventions on COVID-19 spread is essential for making public health decisions that control the pandemic. This paper introduces two novel hybrid machine-learning ensembles that combine supervised and unsupervised learning for COVID-19 data classification and regression. The study utilizes the publicly available COVID-19 outbreak and potential predictive features in the USA dataset, which provides information related to the outbreak of COVID-19 disease in the US, including data from each of 3142 US counties from the beginning of the epidemic (January 2020) until June 2021. The developed hybrid hierarchical classifiers outperform single classification algorithms. The best-achieved performance metrics for the classification task were Accuracy = 0.912, ROC-AUC = 0.916, and F1-score = 0.916. The proposed hybrid hierarchical ensemble combining both supervised and unsupervised learning increases the accuracy of the regression task by 11% in terms of MSE, 29% in terms of the area under the ROC curve, and 43% in terms of the MPP metric. Thus, using the proposed approach, it is possible to predict the number of COVID-19 cases and deaths based on demographic, geographic, climatic, traffic, public health, social-distancing-policy adherence, and political characteristics with sufficiently high accuracy. The study reveals that virus pressure is the most important feature in COVID-19 spread for both classification and regression analysis. Five other significant features were identified as having the most influence on COVID-19 spread. The combined ensembling approach introduced in this study can help policymakers design prevention and control measures to avoid or minimize public health threats in the future.
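As an illustration of the hybrid idea described in this abstract, the following minimal Python sketch first groups records with an unsupervised step and then fits one supervised predictor per group. It is not the authors' implementation: the 1-D k-means, the constant per-cluster mean used as a stand-in regressor, the function names, and the toy numbers are all assumptions made for illustration.

```python
from statistics import mean


def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D k-means (k >= 2); centroids start at evenly spaced sorted values."""
    s = sorted(values)
    centroids = [s[i * (len(s) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[assign(v, centroids)].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


def assign(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))


def fit_hybrid(xs, ys, k=2):
    """Unsupervised step (clustering), then one supervised model per cluster.

    The per-cluster 'model' here is just the mean target, a hypothetical
    stand-in for whatever regressor a real pipeline would train.
    """
    centroids = kmeans_1d(xs, k)
    per_cluster = {}
    for c in range(k):
        targets = [y for x, y in zip(xs, ys) if assign(x, centroids) == c]
        per_cluster[c] = mean(targets) if targets else mean(ys)
    return centroids, per_cluster


def predict(x, model):
    """Route x to its cluster, then use that cluster's predictor."""
    centroids, per_cluster = model
    return per_cluster[assign(x, centroids)]


# Toy data (invented): one feature per county and a target count.
xs = [1.0, 1.2, 0.9, 8.0, 8.5, 7.9]
ys = [10.0, 12.0, 11.0, 50.0, 52.0, 48.0]
model = fit_hybrid(xs, ys)
print(predict(1.1, model), predict(8.2, model))
```

The design point the sketch tries to convey is the hierarchy: the clustering decides which specialised predictor handles a record, so each predictor only has to fit a more homogeneous subset.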
Affiliation(s)
- Vitaliy Yakovyna
  - Faculty of Mathematics and Computer Science, University of Warmia and Mazury in Olsztyn, Ul. Oczapowskiego 2, 10-719, Olsztyn, Poland
  - Artificial Intelligence Department, Lviv Polytechnic National University, 12 S. Bandery St, Lviv, 79013, Ukraine
- Nataliya Shakhovska
  - Artificial Intelligence Department, Lviv Polytechnic National University, 12 S. Bandery St, Lviv, 79013, Ukraine
  - Universytet Rolniczy, 31120, Kraków, Poland
- Aleksandra Szpakowska
  - Faculty of Mathematics and Computer Science, University of Warmia and Mazury in Olsztyn, Ul. Oczapowskiego 2, 10-719, Olsztyn, Poland
2. Yin Y, Ahmadianfar I, Karim FK, Elmannai H. Advanced forecasting of COVID-19 epidemic: Leveraging ensemble models, advanced optimization, and decomposition techniques. Comput Biol Med 2024; 175:108442. [PMID: 38678939; DOI: 10.1016/j.compbiomed.2024.108442] [Received: 01/06/2024] [Revised: 03/25/2024] [Accepted: 04/07/2024] [Indexed: 05/01/2024]
Abstract
In the global effort to address the new coronavirus pneumonia (COVID-19) pandemic, accurate forecasting of epidemic patterns has become crucial for implementing successful interventions aimed at preventing and controlling the spread of the disease. Correctly predicting the course of COVID-19 outbreaks is a complex and challenging task, mainly because of the significant volatility in the data series related to COVID-19. Previous studies have been limited by the exclusive use of individual forecasting techniques in epidemic modeling, disregarding the integration of diverse prediction procedures. Neglecting such integration can yield suboptimal results. Consequently, this study introduces a novel ensemble framework that integrates three machine learning methods (kernel ridge regression (KRidge), deep random vector functional link (dRVFL), and ridge regression) within a linear relationship (L-KRidge-dRVFL-Ridge). The framework is optimized through a hybrid of adaptive differential evolution and particle swarm optimization (A-DEPSO). Moreover, an effective decomposition method, known as time-varying filter empirical mode decomposition (TVF-EMD), is employed to decompose the input variables. A feature selection technique based on the light gradient boosting machine (LGBM) is also implemented to extract the most influential input variables. Daily COVID-19 datasets collected from two countries, Italy and Poland, were used as the experimental examples. Additionally, all models are implemented to forecast COVID-19 at two time horizons: 10 and 14 days ahead (t+10 and t+14). According to the results, the proposed model yields a higher correlation coefficient (R) than the other models for both case studies: Italy (t+10 = 0.965, t+14 = 0.961) and Poland (t+10 = 0.952, t+14 = 0.940). The experimental results demonstrate that the model suggested in this paper performs well in various kinds of complex epidemic prediction situations. The proposed ensemble model demonstrates exceptional accuracy and resilience, outperforming all similar models in terms of efficacy.
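To make the "linear relationship" between base forecasters and the reported correlation coefficient (R) concrete, here is a minimal Python sketch. It only shows a fixed weighted sum of three forecast series scored by Pearson's R. The weights, the series values, and the function names are invented for illustration; in the paper the combination is tuned by A-DEPSO, which is not reproduced here.

```python
from math import sqrt


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


def linear_ensemble(forecasts, weights, bias=0.0):
    """Combine several forecast series as bias + sum_i w_i * forecast_i."""
    horizon = len(forecasts[0])
    return [
        bias + sum(w * f[t] for w, f in zip(weights, forecasts))
        for t in range(horizon)
    ]


# Invented daily case counts and three hypothetical base-model forecasts.
observed = [100.0, 120.0, 140.0, 160.0]
model_a = [90.0, 125.0, 135.0, 170.0]
model_b = [110.0, 115.0, 150.0, 150.0]
model_c = [105.0, 118.0, 142.0, 158.0]

weights = [0.4, 0.4, 0.2]  # assumed; a real pipeline would optimise these
combined = linear_ensemble([model_a, model_b, model_c], weights)
r = pearson_r(observed, combined)
print(combined, round(r, 4))
```

Because the errors of the individual series partly cancel in the weighted sum, the combined series can track the observations more closely than any single input, which is the usual motivation for this kind of ensemble.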
Affiliation(s)
- Yingyu Yin
  - School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China
- Iman Ahmadianfar
  - Information and Communication Technology Research Group, Scientific Research Center, Al-Ayen University, Thi-Qar, Nasiriyah, 64001, Iraq
- Faten Khalid Karim
  - Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O.BOX 84428, Riyadh 11671, Saudi Arabia
- Hela Elmannai
  - Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O.BOX 84428, Riyadh 11671, Saudi Arabia
3. Liu J, Zhu A, Wang X, Zhou X, Chen L. Predicting the current fishable habitat distribution of Antarctic toothfish (Dissostichus mawsoni) and its shift in the future under climate change in the Southern Ocean. PeerJ 2024; 12:e17131. [PMID: 38563000; PMCID: PMC10984185; DOI: 10.7717/peerj.17131] [Received: 08/11/2023] [Accepted: 02/27/2024] [Indexed: 04/04/2024] [Open Access]
Abstract
Global warming continues to exert unprecedented impacts on marine habitats. Species distribution models (SDMs) have proven powerful in predicting habitat distribution for marine demersal species under climate change impacts. The Antarctic toothfish, Dissostichus mawsoni (Norman 1937), an ecologically and commercially significant species, is endemic to the Southern Ocean. Utilizing occurrence records and environmental data, we developed an ensemble model that integrates various modelling techniques. This model characterizes species-environment relationships and predicts current and future fishable habitats of D. mawsoni under four climate change scenarios. Ice thickness, depth and mean water temperature were the three most important factors affecting the distribution of D. mawsoni. The ensemble prediction suggests an overall expansion of fishable habitats, potentially due to the limited occurrence records from fishery-dependent surveys. Future projections indicate varying degrees of fishable habitat loss in large areas of the Amery Ice Shelf's eastern and western portions. Suitable fishable habitats, including the spawning grounds in the seamounts around the northern Ross Sea and the coastal waters of the Bellingshausen Sea and Amundsen Sea, persisted under present and future environmental conditions, highlighting the importance of protecting these climate refugia from anthropogenic disturbance. Although this study was limited by data deficiency, our predictions can provide valuable information for designing climate-adaptive development and conservation strategies to maintain the sustainability of this species.
Affiliation(s)
- Jie Liu
  - Planning and Sea Island Department, Shandong Marine Forecast and Hazard Mitigation Service, Qingdao, Shandong, China
- Ancheng Zhu
  - Planning and Sea Island Department, Shandong Marine Forecast and Hazard Mitigation Service, Qingdao, Shandong, China
- Xitao Wang
  - Planning and Sea Island Department, Shandong Marine Forecast and Hazard Mitigation Service, Qingdao, Shandong, China
- Xiangjun Zhou
  - Planning and Sea Island Department, Shandong Marine Forecast and Hazard Mitigation Service, Qingdao, Shandong, China
- Lu Chen
  - Planning and Sea Island Department, Shandong Marine Forecast and Hazard Mitigation Service, Qingdao, Shandong, China
  - Ocean University of China, College of Marine Life Sciences, Qingdao, Shandong, China
4. Hussain S, Aslam W, Mehmood A, Choi GS, Ashraf I. A machine learning based framework for IoT devices identification using web traffic. PeerJ Comput Sci 2024; 10:e1834. [PMID: 38660201; PMCID: PMC11041939; DOI: 10.7717/peerj-cs.1834] [Received: 10/09/2023] [Accepted: 01/02/2024] [Indexed: 04/26/2024]
Abstract
Identification of Internet of Things (IoT) devices has become an essential part of network management to secure the privacy of smart homes and offices. With its wide adoption in the current era, IoT has facilitated the modern age in many ways. However, such proliferation also has associated privacy and data security risks. In the case of smart homes and smart offices, unknown IoT devices increase vulnerabilities and chances of data theft. It is essential to identify the connected devices for secure communication. It is very difficult to maintain the list of rules when the number of connected devices increases, and human involvement is necessary to check whether any intruder device has approached the network. Therefore, device identification needs to be automated using machine learning methods. In this article, we propose an accuracy boosting model (ABM) using the machine learning models random forest and extreme gradient boosting. Feature engineering techniques are employed along with cross-validation to accurately identify IoT devices such as lights, smoke detectors, thermostats, motion sensors, baby monitors, sockets, TVs, security cameras, and watches. The proposed ensemble model utilizes random forest (RF) and extreme gradient boosting (XGB) as base learners with adaptive boosting. The proposed ensemble model is tested with extensive experiments involving the IoT Device Identification dataset from a public repository. Experimental results indicate an accuracy of 91%, precision of 93%, recall of 93%, and F1 score of 93%.
Affiliation(s)
- Sajjad Hussain
  - Department of Information Security, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Waqar Aslam
  - Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Arif Mehmood
  - Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Gyu Sang Choi
  - Information and Communication Engineering, Yeungnam University, Gyeongsan, Republic of Korea
- Imran Ashraf
  - Information and Communication Engineering, Yeungnam University, Gyeongsan, Republic of Korea
5. Kim C, Park JH, Lee JY. AI-based betting anomaly detection system to ensure fairness in sports and prevent illegal gambling. Sci Rep 2024; 14:6470. [PMID: 38499635; PMCID: PMC10948790; DOI: 10.1038/s41598-024-57195-8] [Received: 04/11/2023] [Accepted: 03/15/2024] [Indexed: 03/20/2024] [Open Access]
Abstract
This study develops a solution to sports match-fixing using various machine-learning models to detect match-fixing anomalies based on betting odds. We use five models to distinguish between normal and abnormal matches: logistic regression (LR), random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN) classification, and an ensemble model optimized from the previous four. The models classify normal and abnormal matches by learning their patterns from sports betting odds data. The database was developed based on the world football league match betting data of 12 betting companies, which offered a vast collection of data on players, teams, game schedules, and league rankings for football matches. We develop an abnormal match detection model based on the data analysis results of each model, using the match result dividend data. We then use data from real-time matches and apply the five models to construct a system capable of detecting match-fixing in real time. The RF, KNN, and ensemble models recorded a high accuracy of over 92%, whereas the LR and SVM models were approximately 80% accurate. In comparison, previous studies have used a single model to examine football match betting odds data, with an accuracy of 70-80%.
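The abstract does not say how the four base classifiers are combined into the ensemble. One common option, shown here purely as an assumption, is a vote count: a match is flagged as abnormal when at least a chosen number of models flag it. The function name, the threshold of three votes, and the example votes are hypothetical.

```python
def ensemble_flag(votes, min_votes=3):
    """Flag a match as abnormal when enough base models agree.

    votes maps a model name to True if that model labels the match abnormal.
    min_votes is an assumed threshold, not a value taken from the paper.
    """
    return sum(votes.values()) >= min_votes


# Hypothetical per-model outputs for a single match.
match_votes = {"LR": False, "RF": True, "SVM": True, "KNN": True}
is_abnormal = ensemble_flag(match_votes)
print(is_abnormal)
```

Raising `min_votes` trades recall for fewer false alarms, which may matter when a flag triggers a manual investigation.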
Affiliation(s)
- Changgyun Kim
  - Department of Artificial Intelligence & Software, Kangwon National University, Samcheok, 25913, Republic of Korea
- Jae-Hyeon Park
  - Center for Sports and Performance Analysis, Korea National Sport University, Seoul, 05541, Republic of Korea
- Ji-Yong Lee
  - Center for Sports and Performance Analysis, Korea National Sport University, Seoul, 05541, Republic of Korea
6. Brar AS, Singh K. A multi-objective stacked regression method for distance based colour measuring device. Sci Rep 2024; 14:5530. [PMID: 38448462; PMCID: PMC10918078; DOI: 10.1038/s41598-024-54785-4] [Received: 04/30/2023] [Accepted: 02/16/2024] [Indexed: 03/08/2024] [Open Access]
Abstract
Identifying colour from a distance is challenging due to the external noise associated with the measurement process. The present study focuses on developing a colour measuring system and a novel Multi-target Regression (MTR) model for accurate colour measurement from a distance. Herein, a novel MTR method, referred to as Multi-Objective Stacked Regression (MOSR), is proposed. The core idea behind MOSR is stacking as an ensemble approach, combined with multi-objective evolutionary learning using NSGA-II. A multi-objective optimization approach is used to select base learners that maximise prediction accuracy while minimising ensemble complexity. The method is then compared with six state-of-the-art methods over the colour dataset. Classification and regression tree (CART), Random Forest (RF) and Support Vector Machine (SVM) were used as regressor algorithms. MOSR outperformed all compared methods, with the highest coefficient of determination values for all three targets of the colour dataset. Rigorous comparison with state-of-the-art methods over 18 benchmark datasets showed that MOSR performed best on 15 datasets when CART was used as the regressor algorithm and on 11 datasets when RF and SVM were used as regressor algorithms. The MOSR method was statistically superior to the compared methods and can be effectively used to measure accurate colour values in the distance-based colour measuring device.
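The trade-off between prediction error and ensemble complexity can be pictured as a search for non-dominated candidates, which is the notion NSGA-II ranks by. The Python sketch below is only a Pareto filter over a handful of hand-written candidates, not NSGA-II itself and not the MOSR method; the candidate names and their (error, number of base learners) values are invented.

```python
def dominates(a, b):
    """True if objective tuple a is no worse than b everywhere and better somewhere.

    Both objectives are minimised: (prediction error, ensemble size).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of candidates that no other candidate dominates."""
    return [
        name
        for name, objectives in candidates.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical stacked ensembles: (validation error, number of base learners).
candidates = {
    "A": (0.10, 5),
    "B": (0.12, 3),
    "C": (0.15, 4),
    "D": (0.09, 6),
    "E": (0.12, 5),
}
print(pareto_front(candidates))  # ['A', 'B', 'D']
```

Candidates C and E drop out because B and A, respectively, are at least as good on both objectives; the remaining three represent different accuracy versus complexity compromises among which a final ensemble would still have to be chosen.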
Affiliation(s)
- Amrinder Singh Brar
  - Department of Computer Science and Engineering, Punjabi University, Patiala, 147002, India
- Kawaljeet Singh
  - University Computer Centre, Punjabi University, Patiala, 147002, India
7. He X, Yang Z, Wang L, Sun Y, Cao H, Liang Y. NeuTox: A weighted ensemble model for screening potential neuronal cytotoxicity of chemicals based on various types of molecular representations. J Hazard Mater 2024; 465:133443. [PMID: 38198870; DOI: 10.1016/j.jhazmat.2024.133443] [Received: 10/19/2023] [Revised: 01/02/2024] [Accepted: 01/03/2024] [Indexed: 01/12/2024]
Abstract
Chemical-induced neurotoxicity has become a major focus in the risk assessment of chemical safety. However, the traditional in vivo animal models used to evaluate neurotoxicity are time-consuming and expensive, and they cannot completely represent the pathophysiology of neurotoxicity in humans. Cytotoxicity to the human neuroblastoma cell line (SH-SY5Y) is commonly used as an alternative to animal testing for the assessment of neurotoxicity, yet it is still not appropriate for high-throughput screening of the potential neuronal cytotoxicity of chemicals. In this study, we constructed an ensemble prediction model, termed NeuTox, by combining multiple machine learning algorithms with molecular representations based on the weighted score of Particle Swarm Optimization. For the test set, NeuTox shows excellent performance with an accuracy of 0.9064, which is superior to that of the top-performing individual models. The subsequent experimental verifications reveal that 5,5'-isopropylidenedi-2-biphenylol and 4,4'-cyclo-hexylidenebisphenol exhibited stronger SH-SY5Y-based cytotoxicity compared to bisphenol A, suggesting that NeuTox has good generalization ability in the first-tier assessment of neuronal cytotoxicity of BPA analogs. For ease of use, NeuTox is presented as an online web server that can be freely accessed via http://www.iehneutox-predictor.cn/NeuToxPredict/Predict.
Affiliation(s)
- Xuejun He
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Zeguo Yang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ling Wang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yuzhen Sun
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Huiming Cao
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yong Liang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
8. Byeon DH, Lee WH. Ensemble evaluation of potential distribution of Procambarus clarkii using multiple species distribution models. Oecologia 2024; 204:589-601. [PMID: 38386057; DOI: 10.1007/s00442-024-05516-z] [Received: 12/20/2022] [Accepted: 01/20/2024] [Indexed: 02/23/2024]
Abstract
Procambarus clarkii is a notorious invasive species that has led to ecological concerns owing to its high viability and rapid reproduction. South Korea, a country exposed to a high risk of introduction of invasive species due to active international trade, has suffered from recent massive invasions by invasive species, necessitating the evaluation of potential areas requiring intensive monitoring. In this study, we developed two different types of species distribution models, CLIMEX and random forest, for P. clarkii using occurrence records from the United States. The potential distribution in the United States was predicted along coastal lines and inland regions located below 40°N latitude. The models were then applied to evaluate the potential distribution in South Korea, and an ensemble map was constructed to identify the most vulnerable domestic regions. According to both models, the domestic potential distribution was highest in most areas located at low altitudes. In the ensemble model, most of the low-altitude western regions, the eastern coast, and some southern inland regions were predicted to be suitable for the distribution of P. clarkii, and a similar distribution pattern was predicted when the model was projected into the future climate. This study provides basic data that can be used for the early monitoring of the introduction and subsequent distribution of P. clarkii.
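An ensemble map of the kind mentioned in this abstract can be thought of as a cell-by-cell combination of the two models' suitability grids. The sketch below is one possible way to do this and is an assumption throughout: the abstract does not state the combination rule. It rescales a CLIMEX-style index from a 0 to 100 range onto 0 to 1, averages it with a random-forest probability, and flags cells above a threshold. The grids, the threshold, and the function name are invented.

```python
def ensemble_map(climex_index, rf_prob, threshold=0.5):
    """Average two suitability grids cell by cell and flag suitable cells.

    climex_index: rows of index values assumed to lie on a 0-100 scale.
    rf_prob: rows of probabilities on a 0-1 scale, same shape.
    Returns (mean suitability grid, boolean grid of flagged cells).
    """
    suitability = [
        [(index / 100.0 + prob) / 2.0 for index, prob in zip(index_row, prob_row)]
        for index_row, prob_row in zip(climex_index, rf_prob)
    ]
    flagged = [[value >= threshold for value in row] for row in suitability]
    return suitability, flagged


# Invented 2 x 2 grids standing in for raster layers.
climex_index = [[50.0, 0.0], [100.0, 20.0]]
rf_prob = [[0.5, 0.25], [1.0, 0.75]]

suitability, flagged = ensemble_map(climex_index, rf_prob)
print(flagged)
```

Averaging is only one choice; requiring both models to agree on a cell would give a more conservative map, while taking the maximum would give a more precautionary one.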
Affiliation(s)
- Dae-Hyeon Byeon
  - Department of Biosystems Machinery Engineering, Chungnam National University, Daejeon, 34134, Korea
- Wang-Hee Lee
  - Department of Biosystems Machinery Engineering, Chungnam National University, Daejeon, 34134, Korea
  - Department of Smart Agriculture Systems, Chungnam National University, Daejeon, 34134, Korea
9. Jin Z, Zhao H, Xian X, Li M, Qi Y, Guo J, Yang N, Lü Z, Liu W. Early warning and management of invasive crop pests under global warming: estimating the global geographical distribution patterns and ecological niche overlap of three Diabrotica beetles. Environ Sci Pollut Res Int 2024; 31:13575-13590. [PMID: 38253826; DOI: 10.1007/s11356-024-32076-9] [Received: 11/13/2023] [Accepted: 01/15/2024] [Indexed: 01/24/2024]
Abstract
Invasive alien pests (IAPs) pose a major threat to global agriculture and food production. When multiple IAPs coexist in the same habitat and use the same resources, the economic loss to local agricultural production increases. Many species of the Diabrotica genus, such as Diabrotica barberi, Diabrotica undecimpunctata, and Diabrotica virgifera, originating from the USA and Mexico, seriously damaged maize production in North America and Europe. However, the potential geographic distributions (PGDs) and degree of ecological niche overlap among the three Diabrotica beetles remain unclear; thus, the potential coexistence zone is unknown. Based on environmental and species occurrence data, we used an ensemble model (EM) to predict the PGDs and overlapping PGD of the three Diabrotica beetles. The n-dimensional hypervolumes concept was used to explore the degree of niche overlap among the three species. The EM showed better reliability than the individual models. According to the EM results, the PGDs and overlapping PGD of the three Diabrotica beetles were mainly distributed in North America, Europe, and Asia. Under the current scenario, D. virgifera has the largest PGD ranges (1615 × 10⁴ km²). In the future, the PGD of this species will expand further and reach a maximum under the SSP5-8.5 scenario in the 2050s (2499 × 10⁴ km²). Diabrotica virgifera showed the highest potential for invasion under the current and future global warming scenarios. Among the three studied species, the degree of ecological niche overlap was the highest for D. undecimpunctata and D. virgifera, with the highest similarity in the PGD patterns and maximum coexistence range. Under global warming, the PGDs of the three Diabrotica beetles are expected to expand to high latitudes. Identifying the PGDs of the three Diabrotica beetles provides an important reference for quarantine authorities in countries at risk of invasion worldwide to develop specific preventive measures against pests.
Affiliation(s)
- Zhenan Jin
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Haoxiang Zhao
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Xiaoqing Xian
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Ming Li
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Yuhan Qi
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Jianyang Guo
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Nianwan Yang
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
  - Institute of Western Agriculture, Chinese Academy of Agricultural Sciences, Changji, 831100, China
- Zhichuang Lü
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Wanxue Liu
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
10. Tache IA, Hatfaludi CA, Puiu A, Itu LM, Popa-Fotea NM, Calmac L, Scafa-Udriste A. Assessment of the functional severity of coronary lesions from optical coherence tomography based on ensembled learning. Biomed Eng Online 2023; 22:127. [PMID: 38104144; PMCID: PMC10724936; DOI: 10.1186/s12938-023-01192-x] [Received: 08/12/2023] [Accepted: 12/07/2023] [Indexed: 12/19/2023] [Open Access]
Abstract
BACKGROUND: Atherosclerosis is one of the most frequent cardiovascular diseases. The dilemma faced by physicians is whether to treat or postpone the revascularization of lesions that fall within the intermediate range given by an invasive fractional flow reserve (FFR) measurement. The paper presents a monocentric study assessing the significance of lesions that can potentially cause ischemia in the large coronary arteries. METHODS: A new dataset is acquired, comprising optical coherence tomography (OCT) images, clinical parameters, echocardiography and FFR measurements collected from 80 patients with 102 lesions, with stable multivessel coronary artery disease. With the ground truth given by the invasive FFR measurement, the dataset is challenging because almost 40% of the lesions are in the gray zone, having an FFR value between 0.75 and 0.85. Twenty-six features are extracted from OCT images, clinical characteristics, and echocardiography, and the most relevant are identified by examining the models' accuracy. Ensemble learning is performed to solve the binary classification problem of lesion significance using leave-one-out cross-validation. RESULTS: Ensemble models are designed from the multi-features voting from 5 features models by prediction aggregation with a maximum accuracy of 81.37% and a maximum area under the curve score (AUC) of 0.856. CONCLUSIONS: The proposed explainable supervised learning-based lesion classification is a new method that can be improved by training with a larger multicenter dataset, with the aim of designing a tool that guides clinicians' decision making for cases outside the gray zone; for cases within the gray zone, extra clinical information about the lesion is needed.
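Leave-one-out cross-validation, the evaluation scheme named in the methods, holds out each lesion in turn, trains on the rest, and averages the hits. The sketch below shows only that loop on a single invented feature. The nearest-class-mean classifier is a deliberately simple stand-in, not the ensemble from the paper, and the data values are made up.

```python
from statistics import mean


def nearest_mean_predict(train, x):
    """Predict label 0 or 1 by the closer class mean of a single feature.

    train is a list of (feature, label) pairs; each class needs at least one
    sample after the hold-out, which the toy data below guarantees.
    """
    mean_0 = mean(f for f, label in train if label == 0)
    mean_1 = mean(f for f, label in train if label == 1)
    return 0 if abs(x - mean_0) <= abs(x - mean_1) else 1


def loocv_accuracy(data):
    """Hold out each sample once, train on the others, return mean accuracy."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_mean_predict(train, x) == y
    return hits / len(data)


# Invented (feature, label) pairs; label 1 stands for a significant lesion.
data = [(1, 0), (2, 0), (3, 0), (6, 0), (7, 1), (8, 1), (9, 1)]
accuracy = loocv_accuracy(data)
print(round(accuracy, 4))  # 0.8571, i.e. 6 of 7 held-out samples correct
```

With around a hundred samples, as in the study, this scheme uses nearly all the data for training in every fold, at the cost of fitting the model once per sample.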
Affiliation(s)
- Irina-Andra Tache
  - Department of Automatic Control and Systems Engineering, University Politehnica of Bucharest, Bucharest, Romania
  - Siemens Advanta SRL, 15 Noiembrie Bvd, 500097, Brasov, Romania
  - Romanian Academy of Scientists, Bucharest, Romania
- Cosmin-Andrei Hatfaludi
  - Siemens Advanta SRL, 15 Noiembrie Bvd, 500097, Brasov, Romania
  - Department of Automation and Information Technology, Transilvania University of Brasov, Mihai Viteazu Nr. 5, 5000174, Brasov, Romania
- Andrei Puiu
  - Siemens Advanta SRL, 15 Noiembrie Bvd, 500097, Brasov, Romania
  - Department of Automation and Information Technology, Transilvania University of Brasov, Mihai Viteazu Nr. 5, 5000174, Brasov, Romania
- Lucian Mihai Itu
  - Siemens Advanta SRL, 15 Noiembrie Bvd, 500097, Brasov, Romania
  - Department of Automation and Information Technology, Transilvania University of Brasov, Mihai Viteazu Nr. 5, 5000174, Brasov, Romania
  - Romanian Academy of Scientists, Bucharest, Romania
- Nicoleta-Monica Popa-Fotea
  - Department of Cardiology, Emergency Clinical Hospital, 8 Calea Floreasca, 014461, Bucharest, Romania
  - Department Cardio-Thoracic, University of Medicine and Pharmacy "Carol Davila", 8 Eroii Sanitari, 050474, Bucharest, Romania
- Lucian Calmac
  - Department of Cardiology, Emergency Clinical Hospital, 8 Calea Floreasca, 014461, Bucharest, Romania
  - Department Cardio-Thoracic, University of Medicine and Pharmacy "Carol Davila", 8 Eroii Sanitari, 050474, Bucharest, Romania
- Alexandru Scafa-Udriste
  - Department of Cardiology, Emergency Clinical Hospital, 8 Calea Floreasca, 014461, Bucharest, Romania
  - Department Cardio-Thoracic, University of Medicine and Pharmacy "Carol Davila", 8 Eroii Sanitari, 050474, Bucharest, Romania
11. Wang S, Lin M, Meng Y, Jiang T, Fan F, Wang S. Self-expansion full information optimization strategy: Convenient and efficient method for near infrared spectrum auto-analysis. Spectrochim Acta A Mol Biomol Spectrosc 2023; 303:123224. [PMID: 37603976; DOI: 10.1016/j.saa.2023.123224] [Received: 04/19/2023] [Revised: 07/06/2023] [Accepted: 07/31/2023] [Indexed: 08/23/2023]
Abstract
An essential step in applying near-infrared (NIR) spectroscopy is spectrum preprocessing. A reasonable implementation ensures that the effective spectral information is correctly extracted and that the model's accuracy is increased. However, some analysts, particularly less experienced ones, still rely on manual trial and error. Previous papers have provided preprocessing optimization algorithms for NIR, but problems remain, such as the unwieldy determination of the preprocessing sequence, fluctuating optimization outcomes, and a lack of sufficient statistical information. This research proposes a spectrum auto-analysis methodology named the self-expansion full information optimization strategy, a new open-source technique that addresses all of these issues at once. For the first time in the field of chemometrics, this algorithm offers a reliable and effective automatic NIR modelling method based on statistical informatics. With the aid of its built-in modules, such as information generators and spectrum processors, it fully searches the common preprocessing techniques, with the choice determined by Monte Carlo cross-validation. The final ensemble calibration model is then built using the optimized preprocessing schemes together with a wavelength variable screening algorithm. The strategy also gives the user objective statistical information generated throughout the modeling process for further examining the model's effectiveness. Tests on two groups of real near-infrared spectral data demonstrate that the method can easily and successfully extract spectral information and develop calibration models. Additionally, by changing a few of its parameters, the strategy can also be applied to other spectral analysis areas, such as Raman or infrared spectroscopy, and has extraordinary application value.
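The selection step described in this abstract, scoring candidate preprocessing schemes by Monte Carlo cross-validation, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the two candidate preprocessors, the one-feature linear model, the toy spectra with a random baseline offset, and all function names.

```python
import random
from statistics import mean

def first_difference(spectrum):
    """First-order difference; removes a constant baseline offset."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

# Hypothetical candidate preprocessing schemes (illustrative only).
PREPROCESSORS = {"raw": lambda s: list(s), "first_difference": first_difference}

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def mccv_rmse(features, targets, n_splits=50, test_fraction=0.3, seed=0):
    """Monte Carlo CV: average test RMSE over repeated random train/test splits."""
    rng = random.Random(seed)
    n = len(features)
    n_test = max(1, round(n * test_fraction))
    errors = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([features[i] for i in train],
                                    [targets[i] for i in train])
        sq = [(slope * features[i] + intercept - targets[i]) ** 2 for i in test]
        errors.append(mean(sq) ** 0.5)
    return mean(errors)

def select_preprocessing(spectra, targets):
    """Score every candidate scheme and return the one with the lowest RMSE."""
    scores = {}
    for name, fn in PREPROCESSORS.items():
        features = [sum(fn(s)) for s in spectra]  # one crude feature per spectrum
        scores[name] = mccv_rmse(features, targets)
    return min(scores, key=scores.get), scores

# Toy data: the target is the spectral slope; each spectrum has a random baseline.
rng = random.Random(1)
targets = [rng.uniform(0.0, 1.0) for _ in range(40)]
spectra = []
for c in targets:
    baseline = rng.gauss(0.0, 1.0)
    spectra.append([baseline + c * k for k in range(3)])

best, scores = select_preprocessing(spectra, targets)
```

On this toy data the differenced spectra no longer carry the baseline, so that scheme is selected. A real search would cover many more schemes and their orderings, which is the part the abstract says the strategy automates.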
Affiliation(s)
- Shenghao Wang
- School of Electronic and Information Engineering, Zhongyuan University of Technology, Zhengzhou, China.
- Manman Lin
- School of Electronic and Information Engineering, Zhongyuan University of Technology, Zhengzhou, China
- Yanhong Meng
- School of Electronic and Information Engineering, Zhongyuan University of Technology, Zhengzhou, China
- Tao Jiang
- School of Electronic and Information Engineering, Zhongyuan University of Technology, Zhengzhou, China
- Fuling Fan
- School of Electronic and Information Engineering, Zhongyuan University of Technology, Zhengzhou, China
- Shuanghong Wang
- School of Electronic and Information Engineering, Zhongyuan University of Technology, Zhengzhou, China
|
12
|
K V, Al-onazi BB, Simic V, Tirkolaee EB, Jana C. DeepFND: an ensemble-based deep learning approach for the optimization and improvement of fake news detection in digital platform. PeerJ Comput Sci 2023; 9:e1666. [PMID: 38192452 PMCID: PMC10773750 DOI: 10.7717/peerj-cs.1666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 10/05/2023] [Indexed: 01/10/2024]
Abstract
Early identification of false news is now essential to save lives from the dangers posed by its spread. People keep sharing false information even after it has been debunked. Those responsible for spreading misleading information in the first place should face the consequences, not the victims of their actions. Understanding how misinformation travels and how to stop it is an absolute need for society and government. Consequently, the need to distinguish false news from genuine stories has emerged with the rise of social media platforms. Identifying false news is a hard problem for conventional methodologies. In recent years, the performance of neural network models has surpassed that of classic machine learning approaches because of their superior feature extraction. This research presents Deep learning-based Fake News Detection (DeepFND). The technique uses an ensemble of Visual Geometry Group 19 (VGG-19) and Bidirectional Long Short-Term Memory (Bi-LSTM) models to identify misinformation spread through social media. The system uses an ensemble deep learning (DL) strategy to extract features from an article's text and photos. The joint feature extractor and the attention modules are used with an ensemble approach that includes pre-training and fine-tuning phases. In this article, we utilized a unique customized loss function. In this research, we look at methods for detecting bogus news on the internet without human intervention. We used the Weibo, LIAR, PHEME, Fake and Real News, and BuzzFeed datasets to analyze fake and real news. Multiple methods for identifying fake news are compared and contrasted. Precision procedures have been used to calculate the proposed model's output. The model's 99.88% accuracy is better than expected.
Affiliation(s)
- Venkatachalam K
- Department of Applied Cybernetics, University of Hradec Králové, Hradec Kralove, Czech Republic
- Badriyya B. Al-onazi
- Department of Language Preparation, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
- Vladimir Simic
- Faculty of Transport and Traffic Engineering, University of Belgrade, Belgrade, Serbia
- Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan City, Taiwan
- Erfan Babaee Tirkolaee
- Department of Industrial Engineering, Istinye University, Istanbul, Turkey
- MEU Research Unit, Middle East University, Amman, Jordan
- Chiranjibe Jana
- Department of Applied Mathematics with Oceanology and Computer Programming, Vidyasagar University, Midnapore, India
|
13
|
Zamani MG, Nikoo MR, Jahanshahi S, Barzegar R, Meydani A. Forecasting water quality variable using deep learning and weighted averaging ensemble models. Environ Sci Pollut Res Int 2023; 30:124316-124340. [PMID: 37996598 DOI: 10.1007/s11356-023-30774-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 10/27/2023] [Indexed: 11/25/2023]
Abstract
Water quality variables, including chlorophyll-a (Chl-a), play a pivotal role in comprehending and evaluating the condition of aquatic ecosystems. Chl-a, a pigment present in diverse aquatic organisms, notably algae and cyanobacteria, serves as a valuable indicator of water quality. Thus, the objectives of this study encompass: (1) the assessment of the predictive capabilities of four deep learning (DL) models - namely, recurrent neural network (RNN), long short-term memory (LSTM), gated recurrent unit (GRU), and temporal convolutional network (TCN) - in forecasting Chl-a concentrations; (2) the incorporation of these DL models into ensemble models (EMs) employing genetic algorithm (GA) and non-dominated sorting genetic algorithm (NSGA-II) to harness the strengths of each standalone model; and (3) the evaluation of the efficacy of the developed EMs. Utilizing data collected at 15-min intervals from Small Prespa Lake (SPL) in Greece, the models employed hourly Chl-a concentration lag times, extending up to 6 h, as model inputs to forecast Chl-a (t+1). The proposed models underwent training on 70% of the dataset and were subsequently validated on the remaining 30%. Among the standalone DL models, the GRU model exhibited superior performance in Chl-a forecasting, surpassing the RNN, LSTM, and TCN models by 8%, 2%, and 2%, respectively. Furthermore, the integration of DL models through single-objective GA and multi-objective NSGA-II optimization algorithms yielded hybrid models adept at effectively forecasting both low and high Chl-a concentrations. The ensemble model based on NSGA-II outperformed standalone DL models as well as the GA-based model across a range of evaluation indices. For instance, considering the R-squared metric, the study's findings demonstrated that the EM-NSGA-II stands out with exceptional effectiveness compared to DL and EM-GA models, showcasing improvements of 14% (RNN), 8% (LSTM), 6% (GRU), 8% (TCN), and 3% (EM-GA) during the testing phase.
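The input construction mentioned in this abstract (hourly lags up to 6 h used to forecast the next value, with a 70/30 split) can be sketched as follows. The abstract does not say how the 15-min readings were reduced to hourly values or whether the split was chronological, so the hourly averaging, the chronological split, the toy readings, and all function names are assumptions.

```python
from statistics import mean

def hourly_means(values_15min):
    """Average each complete block of four 15-min readings into one hourly value."""
    n_hours = len(values_15min) // 4
    return [mean(values_15min[4 * h: 4 * h + 4]) for h in range(n_hours)]

def make_lagged_samples(hourly, n_lags=6):
    """inputs[i] holds n_lags consecutive hourly values; targets[i] is the next hour."""
    inputs, targets = [], []
    for t in range(n_lags, len(hourly)):
        inputs.append(hourly[t - n_lags:t])
        targets.append(hourly[t])
    return inputs, targets

def chronological_split(inputs, targets, train_percent=70):
    """Earliest samples for training, latest for validation (assumed, not from the paper)."""
    cut = len(inputs) * train_percent // 100
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])

# Toy series: 64 readings at 15-min resolution, i.e. 16 hours.
readings = [float(i) for i in range(64)]
hourly = hourly_means(readings)
X, y = make_lagged_samples(hourly)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

Each row of `X` would then be fed to a forecasting model such as the recurrent networks named in the abstract.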
Affiliation(s)
- Mohammad G Zamani
- Department of Water Resources Engineering, Faculty of Civil, Water and Environmental Engineering, Amirkabir University of Technology, Tehran, Iran
- Mohammad Reza Nikoo
- Department of Civil and Architectural Engineering, Sultan Qaboos University, Muscat, Oman.
- Sina Jahanshahi
- Department of Water Resources Engineering, Faculty of Civil, Water and Environmental Engineering, University of Tehran, Tehran, Iran
- Rahim Barzegar
- Groundwater Research Group (GRES), Research Institute on Mines and Environment (RIME), Université du Québec en Abitibi-Témiscamingue (UQAT), Amos, Québec, Canada
- Amirreza Meydani
- Department of Geography and Spatial Sciences, University of Delaware, Newark, DE, USA
|
14
|
Liu M, Liu H, Wu T, Zhu Y, Zhou Y, Huang Z, Xiang C, Huang J. ACP-Dnnel: anti-coronavirus peptides' prediction based on deep neural network ensemble learning. Amino Acids 2023; 55:1121-1136. [PMID: 37402073 DOI: 10.1007/s00726-023-03300-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/25/2023] [Accepted: 06/25/2023] [Indexed: 07/05/2023]
Abstract
The ongoing COVID-19 pandemic has caused dramatic loss of human life. There is an urgent need for safe and efficient anti-coronavirus infection drugs. Anti-coronavirus peptides (ACovPs) can inhibit coronavirus infection. With high-efficiency, low-toxicity, and broad-spectrum inhibitory effects on coronaviruses, they are promising candidates to be developed into a new type of anti-coronavirus drug. Experiment is the traditional way of identifying ACovPs, but it is less efficient and more expensive. With the accumulation of experimental data on ACovPs, computational prediction provides a cheaper and faster way to find candidate anti-coronavirus peptides. In this study, we combine several state-of-the-art machine learning methodologies to build nine classification models for the prediction of ACovPs. These models were pre-trained using deep neural networks, and the performance of our ensemble model, ACP-Dnnel, was evaluated across three datasets and an independent dataset. We followed Chou's 5-step rules: (1) we constructed the benchmark datasets data1, data2, and data3 for training and testing, and introduced the independent validation dataset ACVP-M; (2) we analyzed the peptide sequence composition features of the benchmark dataset; (3) we constructed the ACP-Dnnel model with a deep convolutional neural network (DCNN) merged with a bi-directional long short-term memory (BiLSTM) network as the base model for pre-training, to extract the features embedded in the benchmark dataset, and then nine classification algorithms were combined into an ensemble that predicts by voting; (4) tenfold cross-validation was used during the training process, and the final model performance was evaluated; (5) finally, we constructed a user-friendly web server accessible to the public at http://150.158.148.228:5000/ . The highest accuracy (ACC) of ACP-Dnnel reaches 97%, and the Matthews correlation coefficient (MCC) value exceeds 0.9. On three different datasets, its average accuracy is 96.0%. On the latest independent validation dataset, ACP-Dnnel improved the MCC, SP, and ACC values by 6.2%, 7.5%, and 6.3%, respectively. It is suggested that ACP-Dnnel can be helpful for the laboratory identification of ACovPs, speeding up anti-coronavirus peptide drug discovery and development.
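The evaluation quantities named in this abstract (ACC, SP, and MCC) and the idea of classifiers voting together can be written out in a few lines. This is a sketch under stated assumptions: unweighted hard voting with ties going to the positive class is assumed (the abstract does not give the voting rule), three toy classifiers stand in for the nine in the paper, and the labels are made up.

```python
from math import sqrt

def majority_vote(predictions_per_model):
    """Hard voting over binary labels; ties go to the positive class (arbitrary choice)."""
    n_models = len(predictions_per_model)
    return [1 if 2 * sum(votes) >= n_models else 0
            for votes in zip(*predictions_per_model)]

def binary_metrics(y_true, y_pred):
    """Return accuracy (ACC), specificity (SP) and Matthews correlation coefficient (MCC)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sp, mcc

# Toy labels and three hypothetical base classifiers.
y_true = [1, 1, 1, 0, 0, 0]
model_votes = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
]
voted = majority_vote(model_votes)
acc, sp, mcc = binary_metrics(y_true, voted)
```

Here each base classifier makes two mistakes, while the voted prediction makes one, which is the effect an ensemble of this kind relies on.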
Affiliation(s)
- Mingyou Liu
- School of Biology and Engineering, Guizhou Medical University, Guiyang, Guizhou, China
- School of Life Science and Technology, University of Electronic Science and Technology, Chengdu, Sichuan, China
- Hongmei Liu
- School of Biology and Engineering, Guizhou Medical University, Guiyang, Guizhou, China
- Tao Wu
- School of Biology and Engineering, Guizhou Medical University, Guiyang, Guizhou, China
- Yingxue Zhu
- School of Biology and Engineering, Guizhou Medical University, Guiyang, Guizhou, China
- Yuwei Zhou
- School of Life Science and Technology, University of Electronic Science and Technology, Chengdu, Sichuan, China
- Ziru Huang
- School of Life Science and Technology, University of Electronic Science and Technology, Chengdu, Sichuan, China
- Changcheng Xiang
- School of Computer Science and Technology, Aba Teachers University, Aba, Sichuan, China.
- Jian Huang
- School of Life Science and Technology, University of Electronic Science and Technology, Chengdu, Sichuan, China.
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, Sichuan, China.
|
15
|
Shahabi MS, Shalbaf A, Rostami R. Prediction of response to repetitive transcranial magnetic stimulation for major depressive disorder using hybrid Convolutional recurrent neural networks and raw Electroencephalogram Signal. Cogn Neurodyn 2023; 17:909-920. [PMID: 37522037 PMCID: PMC10374518 DOI: 10.1007/s11571-022-09881-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 08/03/2022] [Accepted: 08/28/2022] [Indexed: 11/30/2022] Open
Abstract
Major Depressive Disorder (MDD) is a high-prevalence disease that needs effective and timely treatment to prevent its progression and additional costs. Repetitive Transcranial Magnetic Stimulation (rTMS) is an effective treatment option for MDD patients that uses strong magnetic pulses to stimulate specific regions of the brain. However, some patients do not respond to this treatment, which wastes multiple weeks of treatment time and clinical resources. Therefore, developing an effective way to predict response to rTMS treatment of depression is necessary. In this work, we proposed a hybrid model built from pre-trained Convolutional Neural Network (CNN) models and Bidirectional Long Short-Term Memory (BLSTM) cells to predict response to rTMS treatment from the raw EEG signal. Three pre-trained CNN models, VGG16, InceptionResNetV2, and EfficientNetB0, were utilized as Transfer Learning (TL) models to construct hybrid TL-BLSTM models. An ensemble of these models was then created using weighted majority voting, in which the weights were optimized by the Differential Evolution (DE) optimization algorithm. Evaluation of these models shows the superior performance of the ensemble model, with an accuracy of 98.51%, sensitivity of 98.64%, specificity of 98.36%, F1-score of 98.6%, and AUC of 98.5%. Therefore, the ensemble of the proposed hybrid convolutional recurrent networks can efficiently predict the treatment outcome of rTMS using raw EEG data.
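Weighted majority voting with weights tuned by differential evolution, as named in this abstract, can be sketched in plain Python. This is an illustrative toy and not the authors' configuration: the DE variant (rand/1/bin), its population size, mutation factor `f`, crossover rate `cr`, the weight bounds, the accuracy objective, and the three hypothetical vote vectors are all assumptions.

```python
import random

def weighted_vote(votes_per_model, weights):
    """Predict 1 where the weight behind positive votes exceeds half the total weight."""
    total = sum(weights)
    preds = []
    for votes in zip(*votes_per_model):
        positive = sum(w for w, v in zip(weights, votes) if v == 1)
        preds.append(1 if positive > total / 2 else 0)
    return preds

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def optimize_weights(votes_per_model, y_true, pop_size=12, generations=30,
                     f=0.8, cr=0.9, seed=0):
    """Minimal DE/rand/1/bin search over weights in [0.01, 1], maximizing accuracy."""
    rng = random.Random(seed)
    dim = len(votes_per_model)

    def fitness(w):
        return accuracy(y_true, weighted_vote(votes_per_model, w))

    # Seed the population with equal weights so the result is never worse than them.
    pop = [[1.0] * dim] + [[rng.uniform(0.01, 1.0) for _ in range(dim)]
                           for _ in range(pop_size - 1)]
    scores = [fitness(w) for w in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == forced:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(1.0, max(0.01, value)))
                else:
                    trial.append(pop[i][d])
            trial_score = fitness(trial)
            if trial_score >= scores[i]:  # greedy selection keeps the better vector
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Toy problem: one strong and two weak hypothetical models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
votes = [
    [1, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 0],
]
equal_weight_accuracy = accuracy(y_true, weighted_vote(votes, [1.0, 1.0, 1.0]))
best_weights, best_accuracy = optimize_weights(votes, y_true)
```

In practice the weights would be tuned on a validation set separate from the test data; a library optimizer could replace the hand-written loop.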
Affiliation(s)
- Mohsen Sadat Shahabi
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ahmad Shalbaf
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Reza Rostami
- Department of Psychology, University of Tehran, Tehran, Iran
|
16
|
Chen J, Engelhard M, Henao R, Berchuck S, Eichner B, Perrin EM, Sapiro G, Dawson G. Enhancing early autism prediction based on electronic records using clinical narratives. J Biomed Inform 2023; 144:104390. [PMID: 37182592 PMCID: PMC10526711 DOI: 10.1016/j.jbi.2023.104390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Revised: 04/14/2023] [Accepted: 05/09/2023] [Indexed: 05/16/2023]
Abstract
Recent work has shown that predictive models can be applied to structured electronic health record (EHR) data to stratify autism likelihood from an early age (<1 year). Integrating clinical narratives (or notes) with structured data has been shown to improve prediction performance in other clinical applications, but the added predictive value of this information in early autism prediction has not yet been explored. In this study, we aimed to enhance the performance of early autism prediction by using both structured EHR data and clinical narratives. We built models based on structured data and clinical narratives separately, and then an ensemble model that integrated both sources of data. We assessed the predictive value of these models using data from the Duke University Health System over a 14-year span, evaluating ensemble models that predict later autism diagnosis (by age 4 years) from data collected from ages 30 to 360 days. Our sample included 11,750 children above age 3 years (385 meeting autism diagnostic criteria). The ensemble model for autism prediction showed superior performance and at age 30 days achieved 46.8% sensitivity (95% confidence interval, CI: 22.0%, 52.9%), 28.0% positive predictive value (PPV) at high (90%) specificity (CI: 2.0%, 33.1%), and AUC4 (with at least 4-year follow-up for controls) reaching 0.769 (CI: 0.715, 0.811). Prediction by 360 days achieved 44.5% sensitivity (CI: 23.6%, 62.9%), 13.7% PPV at high (90%) specificity (CI: 9.6%, 18.9%), and AUC4 reaching 0.797 (CI: 0.746, 0.840). Results show that incorporating clinical narratives in early autism prediction achieved promising accuracy by age 30 days, outperforming models based on structured data only. Furthermore, findings suggest that additional features learned from clinician narratives might be hypothesis generating for understanding early development in autism.
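Reporting sensitivity and PPV "at 90% specificity", as this abstract does, means fixing the decision threshold from the controls first and then reading off the other two quantities. A minimal sketch of that procedure follows; the thresholding rule (predict positive when the score exceeds the threshold), the function name, and the toy scores are assumptions and are unrelated to the study's data.

```python
def sensitivity_ppv_at_specificity(scores, labels, target_specificity=0.9):
    """Pick the lowest threshold reaching the target specificity, then report
    (threshold, sensitivity, PPV). A case is predicted positive if score > threshold."""
    n_neg = labels.count(0)
    n_pos = labels.count(1)
    threshold, tn = None, 0
    for candidate in sorted(set(scores)):
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= candidate)
        threshold = candidate
        if tn / n_neg >= target_specificity:
            break
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fp = n_neg - tn
    sensitivity = tp / n_pos
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return threshold, sensitivity, ppv

# Toy risk scores: ten controls (label 0) followed by four cases (label 1).
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.80,
          0.30, 0.50, 0.70, 0.90]
labels = [0] * 10 + [1] * 4
threshold, sensitivity, ppv = sensitivity_ppv_at_specificity(scores, labels)
```

PPV depends on prevalence as well as on the classifier, so with a rare outcome it can stay low even when sensitivity at the chosen specificity is reasonable.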
Affiliation(s)
- Junya Chen
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27705, United States.
- Matthew Engelhard
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27705, United States
- Ricardo Henao
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27705, United States
- Samuel Berchuck
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27705, United States
- Brian Eichner
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27705, United States
- Eliana M Perrin
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27705, United States
- Guillermo Sapiro
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27705, United States
- Geraldine Dawson
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27705, United States
|
17
|
Dai TY, Radhakrishnan P, Nweye K, Estrada R, Niyogi D, Nagy Z. Analyzing the impact of COVID-19 on the electricity demand in Austin, TX using an ensemble-model based counterfactual and 400,000 smart meters. Comput Urban Sci 2023; 3:20. [PMID: 37192956 PMCID: PMC10162906 DOI: 10.1007/s43762-023-00095-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Revised: 03/17/2023] [Accepted: 03/29/2023] [Indexed: 05/18/2023]
Abstract
The COVID-19 pandemic caused lifestyle changes and has led to new electricity demand patterns in the presence of non-pharmaceutical interventions such as work-from-home policies and lockdowns. Quantifying the effect on electricity demand is critical for future electricity market planning, yet challenging when few buildings have smart meters, which limits understanding of the temporal and spatial variations in building energy use. This study uses a large-scale private smart meter electricity demand dataset from the City of Austin, combined with publicly available environmental data, and develops an ensemble regression model for long-term daily electricity demand prediction. Using 15-min resolution data from over 400,000 smart meters from 2018 to 2020, aggregated by building type and zip code, our proposed model formalizes the counterfactual universe of a scenario without COVID-19. The model is used to understand building electricity demand changes during the pandemic and to identify relationships between such changes and socioeconomic patterns. Results indicate an increase in residential usage, demonstrating the spatial redistribution of energy consumption during the work-from-home period. Our experiments demonstrate the effectiveness of our proposed framework by assessing multiple socioeconomic impacts through comparison between the counterfactual universe and observations.
Affiliation(s)
- Ting-Yu Dai
- Department of Civil, Environmental and Architectural Engineering, The University of Texas at Austin, Austin, 78712-1700 Texas USA
- Praveen Radhakrishnan
- Department of Civil, Environmental and Architectural Engineering, The University of Texas at Austin, Austin, 78712-1700 Texas USA
- Kingsley Nweye
- Department of Civil, Environmental and Architectural Engineering, The University of Texas at Austin, Austin, 78712-1700 Texas USA
- Robert Estrada
- Department of Civil, Environmental and Architectural Engineering, The University of Texas at Austin, Austin, 78712-1700 Texas USA
- Dev Niyogi
- Department of Civil, Environmental and Architectural Engineering, The University of Texas at Austin, Austin, 78712-1700 Texas USA
- Zoltan Nagy
- Department of Civil, Environmental and Architectural Engineering, The University of Texas at Austin, Austin, 78712-1700 Texas USA
|
18
|
Kim G, Park YM, Yoon HJ, Choi JH. A multi-kernel and multi-scale learning based deep ensemble model for predicting recurrence of non-small cell lung cancer. PeerJ Comput Sci 2023; 9:e1311. [PMID: 37346527 PMCID: PMC10280639 DOI: 10.7717/peerj-cs.1311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 03/06/2023] [Indexed: 06/23/2023]
Abstract
Predicting recurrence in patients with non-small cell lung cancer (NSCLC) before treatment is vital for guiding personalized medicine. Deep learning techniques have revolutionized the application of cancer informatics, including lung cancer time-to-event prediction. Most existing convolutional neural network (CNN) models are based on a single two-dimensional (2D) computed tomography (CT) image or a three-dimensional (3D) CT volume. However, studies have shown that using multi-scale input and fusing multiple networks provide promising performance. This study proposes a deep learning-based ensemble network for recurrence prediction using a dataset of 530 patients with NSCLC. This network assembles 2D CNN models of various input slices, scales, and convolutional kernels, using a deep learning-based feature fusion model as an ensemble strategy. The proposed framework is uniquely designed to benefit from (i) multiple 2D in-plane slices to provide more information than a single central slice, (ii) multi-scale networks and multi-kernel networks to capture the local and peritumoral features, and (iii) an ensemble design to integrate features from various inputs and model architectures for final prediction. The ensemble of five 2D-CNN models, three slices, and two multi-kernel networks, using 5 × 5 and 6 × 6 convolutional kernels, achieved the best performance with an accuracy of 69.62%, area under the curve (AUC) of 72.5%, F1 score of 70.12%, and recall of 70.81%. Furthermore, the proposed method achieved competitive results compared with the 2D and 3D-CNN models for cancer outcome prediction in the benchmark studies. Our model is also a potential adjuvant treatment tool for identifying NSCLC patients with a high risk of recurrence.
Affiliation(s)
- Gihyeon Kim
- Department of Computational Medicine, Graduate Program in System Health Science and Engineering, Ewha Womans University, Seoul, South Korea
- Young Mi Park
- Department of Molecular Medicine, College of Medicine, Ewha Womans University, Seoul, South Korea
- Hyun Jung Yoon
- Department of Radiology, Veterans Health Service Medical Center, Seoul, South Korea
- Jang-Hwan Choi
- Division of Mechanical and Biomedical Engineering, Graduate Program in System Health Science and Engineering, Ewha Womans University, Seoul, South Korea
- Department of Artificial Intelligence, Ewha Womans University, Seoul, South Korea
|
19
|
Yang SQ, Zhang LX, Ge YJ, Zhang JW, Hu JX, Shen CY, Lu AP, Hou TJ, Cao DS. In-silico target prediction by ensemble chemogenomic model based on multi-scale information of chemical structures and protein sequences. J Cheminform 2023; 15:48. [PMID: 37088813 PMCID: PMC10123967 DOI: 10.1186/s13321-023-00720-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2022] [Accepted: 04/08/2023] [Indexed: 04/25/2023] Open
Abstract
Identification and validation of bioactive small-molecule targets is a significant challenge in drug discovery. In recent years, various in-silico approaches have been proposed to expedite time- and resource-consuming experiments for target detection. Herein, we developed several chemogenomic models for target prediction based on multi-scale information of chemical structures and protein sequences. By pairing a compound with multiple protein targets and feeding these compound-target pairs into a well-established model, scores indicating whether the compound and each target interact can be derived, and a target prediction task can thus be completed by sorting the output scores. To improve the prediction performance, we constructed several chemogenomic models using multi-scale information of chemical structures and protein sequences, and the ensemble model with the best performance was used as our final model. The model was validated by various strategies and external datasets, and its promising target prediction capability, i.e., the fraction of known targets identified in the top-k (1 to 10) list of potential target candidates suggested by the model, was confirmed. Compared with multiple state-of-the-art target prediction methods, our model showed equivalent or better predictive ability in terms of the top-k predictions. It is expected that our method can be utilized as a powerful computational tool to narrow down the potential targets for experimental testing.
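The top-k criterion described in this abstract (rank candidate targets by score for each compound, then count how many known targets fall in the first k positions) can be made concrete with a small sketch. The data layout, the function name, and the toy scores are assumptions for illustration.

```python
def top_k_hit_fraction(score_table, known_targets, k):
    """Fraction of known compound-target pairs ranked within the top k for their compound.

    score_table:   {compound: {target: interaction score}}
    known_targets: {compound: set of experimentally known targets}
    """
    hits, total = 0, 0
    for compound, targets in known_targets.items():
        ranked = sorted(score_table[compound],
                        key=score_table[compound].get, reverse=True)
        hits += len(targets & set(ranked[:k]))
        total += len(targets)
    return hits / total

# Toy scores for two hypothetical compounds against three hypothetical targets.
score_table = {
    "c1": {"A": 0.9, "B": 0.4, "C": 0.7},
    "c2": {"A": 0.2, "B": 0.8, "C": 0.6},
}
known_targets = {"c1": {"C"}, "c2": {"A", "B"}}

fractions = {k: top_k_hit_fraction(score_table, known_targets, k) for k in (1, 2, 3)}
```

The fraction can only grow with k, so reporting it over a range such as 1 to 10 shows how quickly the known targets surface in the ranked list.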
Affiliation(s)
- Su-Qing Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Liu-Xia Zhang
- The First Hospital of Hunan University of Chinese Medicine, Changsha, 410007, Hunan, People's Republic of China
- You-Jin Ge
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Jin-Wei Zhang
- Departments of Biomedical Engineering and Pathology, School of Basic Medical Science, Central South University, Changsha, 410013, Hunan, People's Republic of China
- Jian-Xin Hu
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Cheng-Ying Shen
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China
- Ting-Jun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China.
- Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China.
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China.
|
20
|
Yang X, Zhang X, Zhang P, Bidegain G, Dong J, Hu C, Li M, Zhang Z, Guo H. Ensemble habitat suitability modeling for predicting optimal sites for eelgrass (Zostera marina) in the tidal lagoon ecosystem: Implications for restoration and conservation. J Environ Manage 2023; 330:117108. [PMID: 36584472 DOI: 10.1016/j.jenvman.2022.117108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Revised: 12/18/2022] [Accepted: 12/20/2022] [Indexed: 06/17/2023]
Abstract
Seagrass systems are in decline, mainly due to anthropogenic pressures and ongoing climate change. Implementing seagrass protection and restoration measures requires accurate assessment of suitable habitats. Commonly, such assessments have been performed using single-algorithm habitat suitability models, nearly always based on low-resolution environmental information and short-term species data series. Here we address the large-scale decline (>80%) of eelgrass (Zostera marina) meadows in Shandong province (Yellow Sea, China) by developing an ensemble habitat model (EHM) to inform eelgrass conservation and restoration strategies in Swan Lake (SL). For this, we applied a weighted EHM derived from ten single-algorithm models, including profile, regression, classification, and machine learning methods, to generate a high-resolution habitat suitability map. The EHM was constructed based on the predictive performance of each model, by combining a series of presence-absence eelgrass datasets from recent years with oceanographic and sediment data. The model was cross-validated with independent historical datasets, and a final habitat suitability map for conservation and restoration was generated. Our EHM scheme outperformed all single models in terms of habitat suitability, scoring ∼0.95 for both the true skill statistic (TSS) and area under the curve (AUC) performance criteria. Machine learning methods outperformed profile, regression and classification methods. Regarding model explanatory variables, overall, topographic characteristics such as depth (DEP) and seafloor slope (SSL) are the most significant factors determining the distribution of eelgrass. The EHM predicted that the overlapping area was almost 90% of the current eelgrass habitat. Using results from our EHM, a LOESS regression model for the relationship of habitat suitability to both the biomass and density of Z. marina performed better than the classic Ordinary Least Squares regression model. The EHM is a promising tool for supporting eelgrass protection and restoration areas in temperate lagoons as data availability improves.
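A performance-weighted ensemble of the kind this abstract describes can be sketched by scoring each single model with the true skill statistic (TSS = sensitivity + specificity - 1) and averaging the suitability maps with weights proportional to that score. The weighting rule, the decision to drop models with non-positive skill, the function names, and the toy numbers are assumptions; the abstract only states that the EHM was built from the predictive performance of each model.

```python
def true_skill_statistic(y_true, y_pred):
    """TSS = sensitivity + specificity - 1; needs both presences and absences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0

def skill_weighted_suitability(suitability_per_model, skills):
    """Per-site weighted mean of model suitabilities, weights proportional to skill.

    Models with non-positive skill are dropped (an assumption of this sketch).
    """
    kept = [(s, site_values) for s, site_values in zip(skills, suitability_per_model)
            if s > 0]
    total = sum(s for s, _ in kept)
    n_sites = len(suitability_per_model[0])
    return [sum(s * site_values[i] for s, site_values in kept) / total
            for i in range(n_sites)]

# Toy presence/absence evaluation for one hypothetical model.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1]
tss = true_skill_statistic(observed, predicted)

# Toy suitability values (0 to 1) from three hypothetical models at four sites.
suitability = [
    [0.9, 0.2, 0.6, 0.0],
    [0.6, 0.5, 0.0, 0.3],
    [0.1, 0.9, 0.9, 0.9],
]
ensemble = skill_weighted_suitability(suitability, skills=[0.8, 0.4, -0.1])
```

The third model has negative skill and therefore contributes nothing to the ensemble map in this sketch.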
Affiliation(s)
- Xiaolong Yang
- Fishery College, Zhejiang Ocean University, Zhoushan, 316022, China; State Environmental Protection Key Laboratory of Coastal Ecosystem, National Marine Environmental Monitoring Center, Dalian, 116023, China
| | - Xiumei Zhang
- Fishery College, Zhejiang Ocean University, Zhoushan, 316022, China.
| | - Peidong Zhang
- The Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao, China
| | - Gorka Bidegain
- Department of Applied Mathematics, Engineering School of Bilbao, University of the Basque Country (UPV/EHU), Ingeniero Torres Quevedo s/n, 48013, Bilbao, Spain; Research Center for Experimental Marine Biology and Biotechnology, Plentzia Marine Station, University of the Basque Country (PiE-UPV/EHU), Areatza Pasealekua, 48620, Plentzia, Spain
| | - Jianyu Dong
- The Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao, China
| | - Chengye Hu
- Fishery College, Zhejiang Ocean University, Zhoushan, 316022, China
| | - Min Li
- The Institute for Advanced Study of Coastal Ecology, Ludong University, Yantai, 264025, China
| | - Zhixin Zhang
- CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, 510301, China
| | - Hao Guo
- State Environmental Protection Key Laboratory of Coastal Ecosystem, National Marine Environmental Monitoring Center, Dalian, 116023, China
| |
|
21
|
Haoxiang Z, Xiaoqing X, Nianwan Y, Yongjun Z, Hui L, Fanghao W, Jianyang G, Wanxue L. Insights from the biogeographic approach for biocontrol of invasive alien pests: Estimating the ecological niche overlap of three egg parasitoids against Spodoptera frugiperda in China. Sci Total Environ 2023; 862:160785. [PMID: 36502977 DOI: 10.1016/j.scitotenv.2022.160785] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/30/2022] [Revised: 12/04/2022] [Accepted: 12/05/2022] [Indexed: 06/17/2023]
Abstract
Spodoptera frugiperda, the fall armyworm, causes major damage to maize and >80 other crops worldwide. Since S. frugiperda successfully invaded China in 2018 via long-distance migration from Myanmar, it has caused major maize yield losses and posed a severe threat to maize production and food security. The biocontrol approach for S. frugiperda using natural enemies is environmentally safe and effective. Estimating the potential suitable area (PSA) for S. frugiperda and its natural enemies can provide insights for its biocontrol and management. Therefore, based on the global distribution records and bioclimatic variables, we modeled the PSA of S. frugiperda and three egg parasitoids in China using an ensemble model (EM). We found that the prediction results of the EM were more reliable than those of a single model. The PSAs of S. frugiperda and its three egg parasitoids were mainly attributed to temperature variables. The PSA of S. frugiperda was divided into migratory and overwintering areas using the mean January 10 °C isotherm from 2018 to 2022. In the overwintering area, Trichogramma chilonis had the largest PSA overlap with S. frugiperda (94.57 %), followed by Telenomus remus (68.64 %) and Trichogramma dendrolimi (67.53 %). Telenomus remus and Tr. chilonis were the most effective egg parasitoids against S. frugiperda in the overwintering area. In the migratory area, Tr. chilonis had the largest PSA overlap with S. frugiperda (91.36 %), followed by Tr. dendrolimi (81.70 %) and Te. remus (15.23 %). Trichogramma dendrolimi would be the most effective egg parasitoid against S. frugiperda in the Yangtze River Basin and northeastern China. Trichogramma chilonis was the most effective egg parasitoid against S. frugiperda in central China. Our findings indicate that the three native egg parasitoids would be "good regulators" of S. frugiperda outbreaks in China.
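The overlap percentages reported above can be read as the share of the pest's suitable area that is also suitable for a parasitoid. The abstract does not define the calculation, so the following is only a sketch under that assumption, using hypothetical binary suitability grids flattened into lists; `overlap_percent` is an illustrative name.

```python
def overlap_percent(pest_suitable, enemy_suitable):
    """Percent of the pest's suitable cells that are also suitable for the enemy.

    Both inputs are equal-length sequences of 0/1 flags, one per grid cell.
    """
    pest_cells = sum(pest_suitable)
    shared = sum(1 for p, e in zip(pest_suitable, enemy_suitable) if p and e)
    return 100.0 * shared / pest_cells


# Hypothetical six-cell grids: 1 = suitable, 0 = unsuitable.
pest = [1, 1, 1, 1, 0, 0]
parasitoid = [1, 1, 1, 0, 1, 0]

share = overlap_percent(pest, parasitoid)
```

Note that this measure is asymmetric: cells suitable only for the parasitoid do not change the result, which fits a biocontrol question framed around the pest's range.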
Affiliation(s)
- Zhao Haoxiang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China
| | - Xian Xiaoqing
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China
| | - Yang Nianwan
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China; Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji 831100, China
| | - Zhang Yongjun
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China
| | - Liu Hui
- The National Agro-Tech Extension and Service Center, Beijing 100193, China
| | - Wan Fanghao
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China
| | - Guo Jianyang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China.
| | - Liu Wanxue
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China.
| |
|
22
|
Zhu D, Yang W, Xu D, Li H, Zhao Y, Li D. A deep learning based two-layer predictor to identify enhancers and their strength. Methods 2023; 211:23-30. [PMID: 36740001 DOI: 10.1016/j.ymeth.2023.01.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Revised: 01/03/2023] [Accepted: 01/30/2023] [Indexed: 02/05/2023] Open
Abstract
The enhancer is a DNA sequence that can increase the activity of promoters and thus raise the frequency of gene transcription. The enhancer plays an essential role in activating gene expression. Gene sequencing technology has developed over 30 years from the first generation to the third generation, and the volume of biological sequence data has increased significantly every year. Despite the importance of enhancer functions, identifying enhancers through biochemical experiments is very expensive. Therefore, new methods for the identification and classification of enhancers are needed. Based on the k-mer principle, this study proposes a feature extraction method that has not previously been used in convolutional neural networks. We then combine it with one-hot encoding to build an efficient one-dimensional convolutional neural network ensemble model for predicting enhancers and their strengths. Finally, we use five commonly used classification evaluation indicators to compare our model with those proposed by other researchers. The model proposed in this paper performs better than the other models on the same independent test dataset.
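The two sequence encodings named in this abstract can be sketched generically. The paper's specific k-mer-based feature method is not described here, so the code below only shows the standard forms: a one-hot matrix with one row per base, and a k-mer count vector over all 4^k possible k-mers. The function names and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def one_hot(seq):
    """Encode each base as a 4-element 0/1 row in A, C, G, T order."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def kmer_counts(seq, k):
    """Count overlapping k-mers, returned in lexicographic k-mer order."""
    vocab = ["".join(p) for p in product(BASES, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[kmer] for kmer in vocab]


seq = "ACGTAC"               # hypothetical toy sequence
matrix = one_hot(seq)        # 6 rows x 4 columns
dimers = kmer_counts(seq, 2) # 16 counts: AA, AC, AG, ..., TT
```

The one-hot matrix preserves position and is the usual input to a one-dimensional convolution, while the k-mer vector summarizes composition independent of position.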
Affiliation(s)
- Di Zhu
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Wen Yang
- International Medical Center, Shenzhen University General Hospital, Shenzhen, China
| | - Dali Xu
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Hongfei Li
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Yuming Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.
| | - Dan Li
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.
| |
|
23
|
Xian X, Zhao H, Wang R, Huang H, Chen B, Zhang G, Liu W, Wan F. Climate change has increased the global threats posed by three ragweeds (Ambrosia L.) in the Anthropocene. Sci Total Environ 2023; 859:160252. [PMID: 36427731 DOI: 10.1016/j.scitotenv.2022.160252] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 05/17/2022] [Revised: 11/07/2022] [Accepted: 11/14/2022] [Indexed: 06/16/2023]
Abstract
Invasive alien plants (IAPs) substantially affect native biodiversity, agriculture, industry, and human health worldwide. Ambrosia (ragweed) species, which are major IAPs globally, have a significant impact on human health and the natural environment. In particular, the invasion of A. artemisiifolia, A. psilostachya, and A. trifida in non-native continents is more extensive and severe than that of other species. Here, we used a biomod2 ensemble model based on environmental and species occurrence data to predict the potential geographical distribution, overlapping geographical distribution areas, and ecological niche dynamics of these three ragweeds, and further explored the environmental variables shaping the observed patterns to assess the impact of these IAPs on the natural environment and public health. The ecological niche has shifted in the invasive area compared with that in the native area, which has increased the invasion risk of the three Ambrosia species worldwide. The potential geographical distribution and overlapping geographical distribution areas of the three Ambrosia species are primarily located in Asia, North America, and Europe, and are expected to increase under four representative concentration pathways in the 2050s. The centers of potential geographical distributions of the three Ambrosia species showed a tendency to shift poleward from the current time to the 2050s. Bioclimatic variables and the human influence index were more significant in shaping these patterns than other factors. In brief, climate change has facilitated the expansion of the geographical distribution and overlapping geographical distribution areas of the three Ambrosia species. Ecomanagement and cross-country management strategies are warranted to mitigate the future effects of the expansion of these ragweed species worldwide in the Anthropocene on the natural environment and public health.
Affiliation(s)
- Xiaoqing Xian
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China
| | - Haoxiang Zhao
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China
| | - Rui Wang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China
| | - Hongkun Huang
- Rural Energy and Environment Agency, Ministry of Agriculture and Rural Affairs, Beijing 100125, China
| | - Baoxiong Chen
- Rural Energy and Environment Agency, Ministry of Agriculture and Rural Affairs, Beijing 100125, China
| | - Guifen Zhang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China
| | - Wanxue Liu
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China.
| | - Fanghao Wan
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Science, Beijing 100193, China
| |
|
24
|
Bleichrodt A, Dahal S, Maloney K, Casanova L, Luo R, Chowell G. Real-time forecasting the trajectory of monkeypox outbreaks at the national and global levels, July-October 2022. BMC Med 2023; 21:19. [PMID: 36647108 PMCID: PMC9841951 DOI: 10.1186/s12916-022-02725-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/08/2022] [Accepted: 12/28/2022] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Beginning May 7, 2022, multiple nations reported an unprecedented surge in monkeypox cases. Unlike past outbreaks, differences in affected populations, transmission mode, and clinical characteristics have been noted. With the existing uncertainties of the outbreak, real-time short-term forecasting can guide and evaluate the effectiveness of public health measures. METHODS We obtained publicly available data on confirmed weekly cases of monkeypox at the global level and for seven countries (with the highest burden of disease at the time this study was initiated) from the Our World in Data (OWID) GitHub repository and CDC website. We generated short-term forecasts of new cases of monkeypox across the study areas using an ensemble n-sub-epidemic modeling framework based on weekly cases using 10-week calibration periods. We report and assess the weekly forecasts with quantified uncertainty from the top-ranked, second-ranked, and ensemble sub-epidemic models. Overall, we conducted 324 weekly sequential 4-week ahead forecasts across the models from the week of July 28th, 2022, to the week of October 13th, 2022. RESULTS The last 10 of 12 forecasting periods (starting the week of August 11th, 2022) show either a plateauing or declining trend of monkeypox cases for all models and areas of study. According to our latest 4-week ahead forecast from the top-ranked model, a total of 6232 (95% PI 487.8, 12,468.0) cases could be added globally from the week of 10/20/2022 to the week of 11/10/2022. At the country level, the top-ranked model predicts that the USA will report the highest cumulative number of new cases for the 4-week forecasts (median based on OWID data: 1806 (95% PI 0.0, 5544.5)). The top-ranked and weighted ensemble models outperformed all other models in short-term forecasts. 
CONCLUSIONS Our top-ranked model consistently predicted a decreasing trend in monkeypox cases on the global and country-specific scale during the last ten sequential forecasting periods. Our findings reflect the potential impact of increased immunity, and behavioral modification among high-risk populations.
Affiliation(s)
- Amanda Bleichrodt
- Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
| | - Sushma Dahal
- Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
| | - Kevin Maloney
- Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
| | - Lisa Casanova
- Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
| | - Ruiyan Luo
- Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
| | - Gerardo Chowell
- Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA.
| |
|
25
|
Mujahid M, Rustam F, Alasim F, Siddique M, Ashraf I. What people think about fast food: opinions analysis and LDA modeling on fast food restaurants using unstructured tweets. PeerJ Comput Sci 2023; 9:e1193. [PMID: 37346556 PMCID: PMC10280231 DOI: 10.7717/peerj-cs.1193] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/09/2022] [Accepted: 11/28/2022] [Indexed: 06/23/2023]
Abstract
With the rise of social media platforms, sharing reviews has become a social norm. People check customer views on social networking sites about different fast food restaurants and food items before visiting the restaurants and ordering food. Restaurants can compete to improve the quality of their offered items or services by carefully analyzing the feedback provided by customers. People tend to visit restaurants with a higher number of positive reviews. However, manually collecting feedback from customers for every product is a labor-intensive process, as is analyzing its sentiment by hand. To overcome this, we use automated sentiment analysis, which extracts meaningful information from the data. Existing studies predominantly focus on machine learning models. As a consequence, the performance analysis of deep learning models, and of deep ensemble models in particular, has been largely neglected. To this end, this study adopts several deep ensemble models, including bidirectional long short-term memory with gated recurrent unit (BiLSTM+GRU), LSTM+GRU, GRU with recurrent neural network (GRU+RNN), and BiLSTM+RNN, using self-collected unstructured tweets. The performance of lexicon-based methods is compared with that of deep ensemble models for sentiment classification. In addition, the study uses Latent Dirichlet Allocation (LDA) modeling for topic analysis. For the experiments, tweets are collected for the top five fast food companies: KFC, Pizza Hut, McDonald's, Burger King, and Subway. Experimental results reveal that deep ensemble models yield better results than the lexicon-based approach, and BiLSTM+GRU obtains the highest accuracy of 95.31% for the three-class problem. Topic modeling indicates that the highest number of negative sentiments, with high-intensity negative words, is associated with Subway restaurants. The majority of people (49%) remain neutral regarding the choice of fast food, 31% seem to like fast food, and the rest (20%) dislike it.
Affiliation(s)
- Muhammad Mujahid
- Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Raheem Yar Khan, Pakistan
| | - Furqan Rustam
- School of Computer Science, University College Dublin, Dublin, Ireland
| | - Fahad Alasim
- Department of Industrial Engineering, College of Engineering, King Saud University, Riyadh, Saudi Arabia
| | - MuhammadAbubakar Siddique
- Department of Computer Science and Information Technology, Ghazi University, Dera Ghazi Khan, Punjab, Pakistan
| | - Imran Ashraf
- Information and Communication Engineering, Yeungnam University, Gyeongsan si, South Korea
| |
|
26
|
Mondal S, Lee MA, Chen YK, Wang YC. Ensemble modeling of black pomfret (Parastromateus niger) habitat in the Taiwan Strait based on oceanographic variables. PeerJ 2023; 11:e14990. [PMID: 36919168 PMCID: PMC10008307 DOI: 10.7717/peerj.14990] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/25/2022] [Accepted: 02/12/2023] [Indexed: 03/12/2023] Open
Abstract
The location, effort, number of captures, and time of fishing were all used in this study to assess the geographic distribution of Parastromateus niger in the Taiwan Strait. Other species distribution models performed worse than generalized linear models (GLMs) based on six oceanographic parameters. The sea surface temperature (SST) was between 26.5 °C and 29.5 °C, the sea surface chlorophyll (SSC) level was between 0.3 and 0.44 mg/m³, the sea surface salinity (SSS) was between 33.4 and 34.4 psu, the mixed layer depth was between 10 and 14 m, the sea surface height was between 0.57 and 0.77 m, and the eddy kinetic energy (EKE) was between 0.603 °C. According to the statistical findings, SST has only a small effect compared with SSS, SSC level, and EKE on species distribution. By combining four effective single-algorithm models with no obvious bias, an ensemble habitat model was created. The ranges of 117°E-119°E and 22°N-24°N have the highest annual distributions of S.CPUE and nominal CPUE.
Affiliation(s)
- Sandipan Mondal
- Environmental Biology & Fishery Science, National Taiwan Ocean University, Keelung, Taiwan.,Center of Excellence for Ocean Engineering, National Taiwan Ocean University, Keelung, Taiwan
| | - Ming An Lee
- Environmental Biology & Fishery Science, National Taiwan Ocean University, Keelung, Taiwan.,Center of Excellence for Ocean Engineering, National Taiwan Ocean University, Keelung, Taiwan.,Doctoral Degree Program in Ocean Resource and Environmental Changes, National Taiwan Ocean University, Keelung, Taiwan
| | - Yu-Kai Chen
- Coastal and Offshore Resource Research Center, Fisheries Research Institute, Council of Agriculture, Kaohsiung, Taiwan
| | - Yi-Chen Wang
- Environmental Biology & Fishery Science, National Taiwan Ocean University, Keelung, Taiwan.,Center of Excellence for Ocean Engineering, National Taiwan Ocean University, Keelung, Taiwan
| |
|
27
|
Umer M, Sadiq S, karamti H, Abdulmajid Eshmawi A, Nappi M, Usman Sana M, Ashraf I. ETCNN: Extra Tree and Convolutional Neural Network-based Ensemble Model for COVID-19 Tweets Sentiment Classification. Pattern Recognit Lett 2022; 164:224-231. [PMID: 36407854 PMCID: PMC9664766 DOI: 10.1016/j.patrec.2022.11.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/30/2022] [Revised: 10/09/2022] [Accepted: 11/11/2022] [Indexed: 11/17/2022]
Abstract
Pandemics affect people negatively, causing fear and disappointment. With the global spread of COVID-19, the sentiments of the general public have been substantially influenced, and analyzing these sentiments could help to devise policies to alleviate negative sentiments. Data collected from social media platforms is often unstructured, leading to low classification accuracy. This study brings forward an ensemble model in which the benefits of handcrafted features and automatic feature extraction are combined through machine learning and deep learning models. Unstructured data is obtained, preprocessed, and annotated using TextBlob and VADER before training machine learning models. The efficiency of Word2Vec, TF, and TF-IDF features is also analyzed. Results reveal the better performance of the extra tree classifier when trained with TF-IDF features from TextBlob-annotated data. Overall, machine learning models perform better with TF-IDF and TextBlob. The proposed model obtains superior performance using both annotation techniques, with accuracy scores of 0.97 and 0.95 using TextBlob and VADER, respectively, with Word2Vec features. Results reveal that using machine learning and deep learning models together with a voting criterion tends to yield better results than other machine learning models. Analysis of sentiments indicates that people predominantly hold negative sentiments regarding COVID-19.
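The abstract mentions combining a machine learning model and a deep learning model "with a voting criterion" without specifying it. One common choice is soft voting, in which the class probabilities of the two models are averaged and the highest average wins. The sketch below assumes that scheme; the label order, probability values, and the name `soft_vote` are hypothetical and not taken from the paper.

```python
LABELS = ["negative", "neutral", "positive"]  # assumed class order


def soft_vote(prob_a, prob_b):
    """Average two class-probability vectors and return the winning index."""
    averaged = [(a + b) / 2.0 for a, b in zip(prob_a, prob_b)]
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical per-class probabilities for one tweet.
tree_probs = [0.2, 0.5, 0.3]  # e.g. from a tree-based classifier
cnn_probs = [0.7, 0.2, 0.1]   # e.g. from a convolutional network

winner = LABELS[soft_vote(tree_probs, cnn_probs)]
```

In this toy case the two models disagree, and the more confident one decides the outcome, which is the behaviour that distinguishes soft voting from a simple majority vote.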
Affiliation(s)
- Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, 63100, Pakistan
| | - Saima Sadiq
- Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
| | - Hanen karamti
- Department of computer sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O.Box 84428, Riyadh 11671, Saudi Arabia
| | | | - Michele Nappi
- Department of Computer Science, University of Salerno, Fisciano, Italy,Corresponding author
| | - Muhammad Usman Sana
- College of Computer Science Technology, Xian University of Science and Technology, Xian, Shaanxi 710054, China
| | - Imran Ashraf
- Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Korea,Corresponding author
| |
|
28
|
Zhu X, Guo H, Huang JJ, Tian S, Xu W, Mai Y. An ensemble machine learning model for water quality estimation in coastal area based on remote sensing imagery. J Environ Manage 2022; 323:116187. [PMID: 36261960 DOI: 10.1016/j.jenvman.2022.116187] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/18/2022] [Revised: 09/01/2022] [Accepted: 09/02/2022] [Indexed: 06/16/2023]
Abstract
The accurate estimation of coastal water quality parameters (WQPs) is crucial for decision-makers to manage water resources. Although various machine learning (ML) models have been developed for coastal water quality estimation using remote sensing data, the performance of these models has significant uncertainties when applied to regional scales. To address this issue, an ensemble ML-based model was developed in this study. The ensemble ML model was applied to estimate chlorophyll-a (Chla), turbidity, and dissolved oxygen (DO) based on Sentinel-2 satellite images in Shenzhen Bay, China. The optimal input features for each WQP were selected from eight spectral bands and seven spectral indices. A local explanation strategy termed Shapley Additive Explanations (SHAP) was employed to quantify the contribution of each feature to model outputs. In addition, the impacts of three climate factors on the variation of each WQP were analyzed. The results suggested that the ensemble ML models achieve satisfactory performance for Chla (errors = 1.7%), turbidity (errors = 1.5%) and DO estimation (errors = 0.02%). Band 3 (B3) has the highest positive contribution to Chla estimation, while Band Ratio Index2 (BR2) has the highest negative contribution to turbidity estimation, and Band 7 (B7) has the highest positive contribution to DO estimation. The spatial patterns of the three WQPs revealed that the water quality deterioration in Shenzhen Bay was mainly influenced by the input of terrestrial pollutants from the estuary. Correlation analysis demonstrated that air temperature (Temp) and average air pressure (AAP) exhibited the closest relationship with Chla. DO showed the strongest negative correlation with Temp, while turbidity was not sensitive to Temp, average wind speed (AWS), or AAP. Overall, the ensemble ML model proposed in this study provides an accurate and practical method for long-term Chla, turbidity, and DO estimation in coastal waters.
Affiliation(s)
- Xiaotong Zhu
- College of Environmental Science and Engineering/Sino-Canada Joint R&D Centre for Water and Environment Safety,Nankai University, Tianjin, 300071, PR China
| | - Hongwei Guo
- College of Environmental Science and Engineering/Sino-Canada Joint R&D Centre for Water and Environment Safety,Nankai University, Tianjin, 300071, PR China
| | - Jinhui Jeanne Huang
- College of Environmental Science and Engineering/Sino-Canada Joint R&D Centre for Water and Environment Safety,Nankai University, Tianjin, 300071, PR China.
| | - Shang Tian
- College of Environmental Science and Engineering/Sino-Canada Joint R&D Centre for Water and Environment Safety,Nankai University, Tianjin, 300071, PR China
| | - Wang Xu
- Shenzhen Environmental Monitoring Center, Shenzhen, Guangdong, 518049, PR China
| | - Youquan Mai
- Shenzhen Environmental Monitoring Center, Shenzhen, Guangdong, 518049, PR China
| |
|
29
|
El-Kenawy ESM, Zerouali B, Bailek N, Bouchouich K, Hassan MA, Almorox J, Kuriqi A, Eid M, Ibrahim A. Improved weighted ensemble learning for predicting the daily reference evapotranspiration under the semi-arid climate conditions. Environ Sci Pollut Res Int 2022; 29:81279-81299. [PMID: 35731435 DOI: 10.1007/s11356-022-21410-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/18/2022] [Accepted: 06/07/2022] [Indexed: 06/15/2023]
Abstract
Evapotranspiration is an important quantity required in many applications, such as hydrology and agricultural and irrigation planning. Reference evapotranspiration is particularly important, and the prediction of its variations is beneficial for analyzing the needs and management of water resources. In this paper, we explore the predictive ability of hybrid ensemble learning to predict daily reference evapotranspiration (RET) under the semi-arid climate by using meteorological datasets at 12 locations in the Andalusia province in southern Spain. The datasets comprise mean, maximum, and minimum air temperatures and mean relative humidity and mean wind speed. A new modified variant of the grey wolf optimizer, named the PRSFGWO algorithm, is proposed to maximize the ensemble learning's prediction accuracy through optimal weight tuning and evaluate the proposed model's capacity when the climate data is limited. The performance of the proposed approach, based on weighted ensemble learning, is compared with various algorithms commonly adopted in relevant studies. A diverse set of statistical measurements alongside ANOVA tests was used to evaluate the predictive performance of the prediction models. The proposed model showed high-accuracy statistics, with relative root mean errors lower than 0.999% and a minimum R2 of 0.99. The model inputs were also reduced from six variables to only two for cost-effective predictions of daily RET. This shows that the PRSFGWO algorithm is a good RET prediction model for the semi-arid climate region in southern Spain. The results obtained from this research are very promising compared with existing models in the literature.
Affiliation(s)
- El-Sayed M El-Kenawy
- Department of Communications and Electronics, Delta Higher Institute of Engineering and Technology, Mansour, 35111, Egypt
- Faculty of Artificial Intelligence, Delta University for Science and Technology, Mansoura, 35712, Egypt
| | - Bilel Zerouali
- Vegetal Chemistry-Water-Energy Laboratory, Faculty of Civil Engineering and Architecture, Department of Hydraulic, Hassiba Benbouali University of Chlef, B.P. 78C, Ouled Fares, 02180, Chlef, Algeria
| | - Nadjem Bailek
- Energies and Materials Research Laboratory, Department of Matter Sciences, Faculty of Sciences and Technology, University of Tamanghasset, Tamanghasset, Algeria.
| | - Kada Bouchouich
- Unité de Recherche en Energies Renouvelables en Milieu Saharien (URERMS), Centre de Développement Des Energies Renouvelables (CDER), 01000, Adrar, Algeria
| | - Muhammed A Hassan
- Mechanical Power Engineering Department, Faculty of Engineering, Cairo University, Giza, Giza, 12613, Egypt
| | - Javier Almorox
- Agricultural Engineering Department, Faculty of Agriculture, Mansoura University, Mansoura, 35516, Egypt
| | - Alban Kuriqi
- CERIS, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Civil Engineering Department, University for Business and Technology, Pristina, Kosovo
| | - Marwa Eid
- Faculty of Artificial Intelligence, Delta University for Science and Technology, Mansoura, 35712, Egypt
| | - Abdelhameed Ibrahim
- Computer Engineering and Control Systems Department, Faculty of Engineering, Mansoura University, Mansoura, 35516, Egypt
| |
|
30
|
Yenkikar A, Babu CN, Hemanth DJ. Semantic relational machine learning model for sentiment analysis using cascade feature selection and heterogeneous classifier ensemble. PeerJ Comput Sci 2022; 8:e1100. [PMID: 36262147 PMCID: PMC9575864 DOI: 10.7717/peerj-cs.1100] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/21/2022] [Accepted: 08/23/2022] [Indexed: 06/16/2023]
Abstract
The exponential rise in social media use via microblogging sites like Twitter has sparked interest in sentiment analysis that exploits user feedback towards a targeted product or service. Considering its significance in business intelligence and decision-making, numerous efforts have been made in this area. However, lack of dictionaries, unannotated data, large-scale unstructured data, and low accuracies have plagued these approaches. Also, sentiment classification through classifier ensembles has been underexplored in the literature. In this article, we propose a Semantic Relational Machine Learning (SRML) model that automatically classifies the sentiment of tweets by using a classifier ensemble and optimal features. The model employs the Cascaded Feature Selection (CFS) strategy, a novel statistical assessment approach based on the Wilcoxon rank sum test, a univariate logistic regression assisted significant predictor test, and a cross-correlation test. It further uses word2vec-based continuous bag-of-words and n-gram feature extraction in conjunction with SentiWordNet to find optimal features for classification. We experiment on six public Twitter sentiment datasets (the STS-Gold dataset, the Obama-McCain Debate (OMD) dataset, the healthcare reform (HCR) dataset, and SemEval2017 Task 4A, 4B and 4C) with a heterogeneous classifier ensemble comprising fourteen individual classifiers from different paradigms. Results from the experimental study indicate that CFS helps attain a higher classification accuracy with up to 50% fewer features compared to the count vectorizer approach. In intra-model performance assessment, the Artificial Neural Network-Gradient Descent (ANN-GD) classifier performs comparatively better than other individual classifiers, but the Best Trained Ensemble (BTE) strategy outperforms it on all metrics. In inter-model performance assessment with existing state-of-the-art systems, the proposed model achieves higher accuracy and outperforms more accomplished models employing quantum-inspired sentiment representation (QSR), transformer-based methods like BERT, BERTweet, and RoBERTa, and ensemble techniques. The research thus provides critical insights into applying a similar strategy to build a more generic and robust expert system for sentiment analysis that can be leveraged across industries.
Affiliation(s)
- Anuradha Yenkikar
- Department of Computer Science and Engineering, M. S. Ramaiah University of Applied Sciences, Bengaluru, Karnataka, India
- C. Narendra Babu
- Department of Computer Science and Engineering, M. S. Ramaiah University of Applied Sciences, Bengaluru, Karnataka, India
- D. Jude Hemanth
- Department of Electronics and Communications Engineering, Karunya University, Coimbatore, Tamil Nadu, India
31
Ezanno P, Picault S, Bareille S, Beaunée G, Boender GJ, Dankwa EA, Deslandes F, Donnelly CA, Hagenaars TJ, Hayes S, Jori F, Lambert S, Mancini M, Munoz F, Pleydell DRJ, Thompson RN, Vergu E, Vignes M, Vergne T. The African swine fever modelling challenge: Model comparison and lessons learnt. Epidemics 2022; 40:100615. [PMID: 35970067 DOI: 10.1016/j.epidem.2022.100615] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/20/2021] [Revised: 06/29/2022] [Accepted: 07/20/2022] [Indexed: 11/26/2022] Open
Abstract
Robust epidemiological knowledge and predictive modelling tools are needed to address challenging objectives, such as: understanding epidemic drivers; forecasting epidemics; and prioritising control measures. Often, multiple modelling approaches can be used during an epidemic to support effective decision making in a timely manner. Modelling challenges contribute to understanding the pros and cons of different approaches and to fostering technical dialogue between modellers. In this paper, we present the results of the first modelling challenge in animal health - the ASF Challenge - which focused on a synthetic epidemic of African swine fever (ASF) on an island. The modelling approaches proposed by five independent international teams were compared. We assessed their ability to predict temporal and spatial epidemic expansion at the interface between domestic pigs and wild boar, and to prioritise a limited number of alternative interventions. We also compared their qualitative and quantitative spatio-temporal predictions over the first two one-month projection phases of the challenge. Top-performing models in predicting the ASF epidemic differed according to the challenge phase, host species, and in predicting spatial or temporal dynamics. Ensemble models built using all team-predictions outperformed any individual model in at least one phase. The ASF Challenge demonstrated that accounting for the interface between livestock and wildlife is key to increasing our effectiveness in controlling emerging animal diseases, and contributed to improving the readiness of the scientific community to face future ASF epidemics. Finally, we discuss the lessons learnt from model comparison to guide decision making.
Affiliation(s)
- Servane Bareille
- INRAE, Oniris, BIOEPAR, 44300 Nantes, France; INRAE, ENVT, IHAP, Toulouse, France
- Christl A Donnelly
- Department of Statistics, University of Oxford, Oxford, United Kingdom; Department of Infectious Disease Epidemiology, Faculty of Medicine, School of Public Health, Imperial College London, United Kingdom
- Sarah Hayes
- Department of Infectious Disease Epidemiology, Faculty of Medicine, School of Public Health, Imperial College London, United Kingdom
- Ferran Jori
- CIRAD, INRAE, Université de Montpellier, ASTRE, 34398 Montpellier, France
- Sébastien Lambert
- Centre for Emerging, Endemic and Exotic Diseases, Department of Pathobiology and Population Sciences, Royal Veterinary College, University of London, United Kingdom
- Matthieu Mancini
- INRAE, Oniris, BIOEPAR, 44300 Nantes, France; INRAE, ENVT, IHAP, Toulouse, France
- Facundo Munoz
- CIRAD, INRAE, Université de Montpellier, ASTRE, 34398 Montpellier, France
- David R J Pleydell
- CIRAD, INRAE, Université de Montpellier, ASTRE, 34398 Montpellier, France
- Robin N Thompson
- Mathematics Institute and Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research, University of Warwick, Coventry, United Kingdom
- Elisabeta Vergu
- Université Paris-Saclay, INRAE, MaIAGE, 78350 Jouy-en-Josas, France
- Matthieu Vignes
- School of Mathematical and Computational Sciences, Massey University, Palmerston North, New Zealand
32
Park J, Lee WH, Kim KT, Park CY, Lee S, Heo TY. Interpretation of ensemble learning to predict water quality using explainable artificial intelligence. Sci Total Environ 2022; 832:155070. [PMID: 35398119 DOI: 10.1016/j.scitotenv.2022.155070] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 01/27/2022] [Revised: 03/31/2022] [Accepted: 04/02/2022] [Indexed: 06/14/2023]
Abstract
Algal bloom is a significant issue when managing water quality in freshwater; specifically, predicting the concentration of algae is essential to maintaining the safety of the drinking water supply system. The chlorophyll-a (Chl-a) concentration is a commonly used indicator to obtain an estimation of algal concentration. In this study, an XGBoost ensemble machine learning (ML) model was developed from eighteen input variables to predict Chl-a concentration. The composition and pretreatment of input variables to the model are important factors for improving model performance. Explainable artificial intelligence (XAI) is an emerging area of ML modeling that provides a reasonable interpretation of model performance. The effect of input variable selection on model performance was estimated, where the priority of input variable selection was determined using three indices: Shapley value (SHAP), feature importance (FI), and variance inflation factor (VIF). SHAP analysis is an XAI algorithm designed to compute the relative importance of input variables with consistency, providing an interpretable analysis for model prediction. The XGB models simulated with independent variables selected using three indices were evaluated with root mean square error (RMSE), RMSE-observation standard deviation ratio, and Nash-Sutcliffe efficiency. This study shows that the model exhibited the most stable performance when the priority of input variables was determined by SHAP. This implies that on-site monitoring can be designed to collect the selected input variables from the SHAP analysis to reduce the cost of overall water quality analysis. The independent variables were further analyzed using SHAP summary plot, force plot, target plot, and partial dependency plot to provide understandable interpretation on the performance of the XGB model. 
While XAI is still in the early stages of development, this study successfully demonstrated a good example of XAI application to improve the interpretation of machine learning model performance in predicting water quality.
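The three evaluation metrics named in this abstract can be sketched as follows. This is a minimal illustration using the commonly cited definitions of RMSE, the RMSE-observation standard deviation ratio (RSR), and Nash-Sutcliffe efficiency (NSE); the function names and the sample values are assumptions for illustration and are not taken from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def rsr(observed, predicted):
    """RMSE divided by the population standard deviation of the observations."""
    mean_obs = sum(observed) / len(observed)
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / len(observed))
    return rmse(observed, predicted) / sd_obs

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance of observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Hypothetical Chl-a values, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(obs, pred), rsr(obs, pred), nse(obs, pred))
```

With these definitions NSE equals 1 minus RSR squared, so a model that always predicts the observed mean scores NSE = 0 and RSR = 1, and a perfect model scores NSE = 1 and RSR = 0.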
Affiliation(s)
- Jungsu Park
- Department of Civil and Environmental Engineering, Hanbat National University, 125 Dongseo-daero, Yuseong-gu, Daejeon 34158, Republic of Korea.
- Woo Hyoung Lee
- Department of Civil, Environmental and Construction Engineering, University of Central Florida, 12800 Pegasus Dr., Orlando, FL 32816, USA.
- Keug Tae Kim
- Department of Environmental & Energy Engineering, The University of Suwon, 17 Wauan-gil, Bongdam-eup, Hwaseong-si, Gyeonggi-do 18323, Republic of Korea.
- Sanghun Lee
- Department of Information & Statistics, Chungbuk National University, Chungdae-Ro 1, SeoWon-Gu, Cheongju, Chungbuk 28644, Republic of Korea
- Tae-Young Heo
- Department of Information & Statistics, Chungbuk National University, Chungdae-Ro 1, SeoWon-Gu, Cheongju, Chungbuk 28644, Republic of Korea.
33
Mohsen F, Biswas MR, Ali H, Alam T, Househ M, Shah Z. Customized and Automated Machine Learning-Based Models for Diabetes Type 2 Classification. Stud Health Technol Inform 2022; 295:517-520. [PMID: 35773925 DOI: 10.3233/shti220779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
This study aims to develop models to accurately classify patients with type 2 diabetes using the Practice Fusion dataset. We use Random Forest (RF), Support Vector Classifier (SVC), AdaBoost classifier, an ensemble model, and automated machine learning (AutoML) model. We compare the performance of all models in a five-fold cross-validation scheme using four evaluation measures. Experimental results demonstrate that the AutoML model outperformed individual and ensemble models in all evaluation measures.
Affiliation(s)
- Farida Mohsen
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Md Rafiul Biswas
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Hazrat Ali
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Mowafa Househ
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Zubair Shah
- College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
34
Wang Y, Zhu X, Yang L, Hu X, He K, Yu C, Jiao S, Chen J, Guo R, Yang S. IDDLncLoc: Subcellular Localization of LncRNAs Based on a Framework for Imbalanced Data Distributions. Interdiscip Sci 2022; 14:409-420. [PMID: 35192174 DOI: 10.1007/s12539-021-00497-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/21/2021] [Revised: 12/16/2021] [Accepted: 12/20/2021] [Indexed: 06/14/2023]
Abstract
Long non-coding RNAs (lncRNAs) play a crucial role in many life processes of the cell, such as genetic markers, RNA splicing, signaling, and protein regulation. Considering that identifying a lncRNA's localization in the cell through experimental methods is complicated, hard to reproduce, and expensive, we propose a novel method named IDDLncLoc in this paper, which adopts an ensemble model to solve the problem of subcellular localization. In the proposed model, dinucleotide-based auto-cross covariance features, k-mer nucleotide composition features, and composition, transition, and distribution features are introduced to encode a raw RNA sequence as a vector. To screen for reliable features, feature selection through binomial distribution and recursive feature elimination is employed. Furthermore, strategies of oversampling in mini-batch, random sampling, and stacking ensemble are customized to overcome the problem of data imbalance on the benchmark dataset. Finally, compared with the latest methods, IDDLncLoc achieves an accuracy of 94.96% on the benchmark dataset, which is 2.59% higher than the best method, and the results further demonstrate that IDDLncLoc is excellent at the subcellular localization of lncRNA. In addition, a user-friendly web server is available at http://lncloc.club.
Affiliation(s)
- Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- School of Artificial Intelligence, Jilin University, Changchun, China
- Xiaopeng Zhu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Lili Yang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Department of Obstetrics, The First Hospital of Jilin University, Changchun, China
- Xuemei Hu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Kai He
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Cuinan Yu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Shaoqing Jiao
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Jiali Chen
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Rui Guo
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Sen Yang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.
35
Nimmi K, Janet B, Selvan AK, Sivakumaran N. Pre-trained ensemble model for identification of emotion during COVID-19 based on emergency response support system dataset. Appl Soft Comput 2022; 122:108842. [PMID: 35465357 PMCID: PMC9014641 DOI: 10.1016/j.asoc.2022.108842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 03/26/2022] [Accepted: 04/05/2022] [Indexed: 01/17/2023]
Abstract
The COVID-19 precautions, lockdown, and quarantine implemented throughout the epidemic resulted in a worldwide economic disaster. People are facing unprecedented levels of intense threat, necessitating professional, systematic psychiatric intervention and assistance. New psychological services must be established as quickly as possible to support the mental healthcare needs of people in this pandemic condition. This study examines the contents of calls received by the emergency response support system (ERSS) during the pandemic. Furthermore, a combined analysis of Twitter patterns connected to emergency services could be valuable in assisting people in this pandemic crisis and in understanding and supporting people's emotions. The proposed Average Voting Ensemble Deep Learning model (AVEDL Model) is based on the Average Voting technique. The AVEDL Model is utilized to classify emotion based on transcribed COVID-19-associated emergency response support system calls along with tweets. Pre-trained transformer-based models BERT, DistilBERT, and RoBERTa are combined to build the AVEDL Model, which achieves the best results. The AVEDL Model is trained and tested for emotion detection using the COVID-19 labeled tweets and call content of the emergency response support system. To the best of our knowledge, this is the first deep learning ensemble model for COVID-19 emotion analysis. The AVEDL Model outperforms standard deep learning and machine learning models by attaining an accuracy of 86.46 percent and a macro-average F1-score of 85.20 percent.
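Average voting, as named in this abstract, can be sketched as averaging the class-probability vectors produced by each base model and choosing the class with the highest mean. The snippet below is an illustrative sketch only: the emotion labels and probability values are made up, and it does not reproduce the authors' BERT, DistilBERT, and RoBERTa pipeline.

```python
def average_vote(prob_rows):
    """Average per-model class probabilities and return (best_class_index, mean_probs).

    prob_rows holds one probability vector per model, all in the same class order.
    """
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    mean_probs = [
        sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: mean_probs[c])
    return best, mean_probs

# Hypothetical labels and outputs of three models for a single text.
labels = ["fear", "anger", "joy"]
model_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
best, mean_probs = average_vote(model_outputs)
print(labels[best], mean_probs)
```

Because the average uses the full probability vectors, a model that is confidently right can outweigh two models that are only marginally wrong, which hard majority voting cannot do.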
Affiliation(s)
- K. Nimmi
- Department of Computer Applications, National Institute of Technology, Tiruchirappalli, India (corresponding author)
- B. Janet
- Department of Computer Applications, National Institute of Technology, Tiruchirappalli, India
- A. Kalai Selvan
- Centre for Development of Advanced Computing (C-DAC), Thiruvananthapuram, India
- N. Sivakumaran
- Department of Instrumentation and Control Engineering, National Institute of Technology, Tiruchirappalli, India
36
Biney JKM, Vašát R, Blöcher JR, Borůvka L, Němeček K. Using an ensemble model coupled with portable X-ray fluorescence and visible near-infrared spectroscopy to explore the viability of mapping and estimating arsenic in an agricultural soil. Sci Total Environ 2022; 818:151805. [PMID: 34813815 DOI: 10.1016/j.scitotenv.2021.151805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2021] [Revised: 11/07/2021] [Accepted: 11/15/2021] [Indexed: 06/13/2023]
Abstract
Increasing concentrations of potentially toxic elements (PTEs) in agricultural soils remain a major source of public concern. Monitoring PTEs in an agricultural field with no history of contaminants necessitates adequate analysis utilizing a robust model to accurately uncover hidden PTEs. Detecting and mapping the distribution of soil properties using portable X-ray fluorescence (pXRF) and proximal sensing techniques is not only rapid, but also relatively inexpensive. In this study, an ensemble model, consisting of partial least squares regression (PLSR), support vector machine (SVM), random forest (RF) and Cubist, was used for the prediction and mapping of soil As content in an agricultural field with no history of pollution. The datasets were collected using pXRF and field spectroscopy techniques. The main goal was to compare the ensemble model to each of the calibration techniques in terms of prediction accuracy of As content in such a field. Other components [e.g., soil organic carbon (SOC), Mn, S, soil pH, Fe] that are known to influence As levels in the soil were also retrieved to assess their correlation with soil As. The models were evaluated using the root mean squared error (RMSECV), the coefficient of determination (R2CV) and the ratio of performance to interquartile range (RPIQ). In terms of prediction accuracy, the ensemble model outperformed each of the individual techniques (R2CV = 0.80/0.75) and obtained the least error margin (RMSECV = 1.91/2.16). Overall, all the predictive techniques were able to detect both low and high estimated values of soil As within the study field, with the ensemble model resembling the measurements most closely. The ensemble model, a promising tool as demonstrated by the current study, is highly recommended for inclusion in future studies for more accurate estimation of As and other PTEs in other agricultural fields.
Affiliation(s)
- James Kobina Mensah Biney
- Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500 Prague-Suchdol, Czech Republic; The Silva Tarouca Research Institute for Landscape and Ornamental Gardening, Department of Landscape Ecology, Lidická 25/27, Brno, 602 00, Czech Republic.
- Radim Vašát
- Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500 Prague-Suchdol, Czech Republic
- Johanna Ruth Blöcher
- Department of Water Resources and Environmental Modeling, Faculty of Environmental Sciences, Czech University of Life Sciences Prague, 16500 Prague-Suchdol, Czech Republic
- Luboš Borůvka
- Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500 Prague-Suchdol, Czech Republic
- Karel Němeček
- Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500 Prague-Suchdol, Czech Republic
37
Jin Z, Ma Y, Chu L, Liu Y, Dubrow R, Chen K. Predicting spatiotemporally-resolved mean air temperature over Sweden from satellite data using an ensemble model. Environ Res 2022; 204:111960. [PMID: 34464620 DOI: 10.1016/j.envres.2021.111960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2021] [Revised: 07/29/2021] [Accepted: 08/23/2021] [Indexed: 06/13/2023]
Abstract
Mapping of air temperature (Ta) at high spatiotemporal resolution is critical to reducing exposure assessment errors in epidemiological studies on the health effects of air temperature. In this study, we applied a three-stage ensemble model to estimate daily mean Ta from satellite-based land surface temperature (Ts) over Sweden during 2001-2019 at a high spatial resolution of 1 × 1 km2. The ensemble model incorporated four base models, including a generalized additive model (GAM), a generalized additive mixed model (GAMM), and two machine learning models (random forest [RF] and extreme gradient boosting [XGBoost]), and allowed the weights for each model to vary over space, with the best-performing model for each grid cell assigned the highest weight. Various spatial predictors were included as adjustment variables in all the base models, including land cover type, normalized difference vegetation index (NDVI), and elevation. The ensemble model showed high performance with an overall R2 of 0.98 and a root mean square error of 1.38 °C in the ten-fold cross-validation, and outperformed each of the four base models. Although each base model performed well, the two machine learning models (RF [R2 = 0.97], XGBoost [R2 = 0.98]) had better performance than the two regression models (GAM [R2 = 0.95], GAMM [R2 = 0.96]). In the machine learning models, Ts was the dominant predictor of Ta, followed by day of year, NDVI, latitude, elevation, and longitude. The highly spatiotemporally-resolved Ta can improve temperature exposure assessment in future epidemiological studies.
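The idea of letting ensemble weights vary over space, with the best-performing base model in each grid cell receiving the highest weight, can be sketched as below. The abstract does not state how the weights were derived, so the inverse-error weighting used here is an assumption chosen only to illustrate the principle, and all names and numbers are hypothetical.

```python
def cell_weights(cell_errors):
    """Turn per-model validation errors for one grid cell into weights that sum to 1.

    Assumption: weight is proportional to 1 / error, so the model with the
    lowest error in this cell gets the highest weight. Errors must be > 0.
    """
    inverse = {model: 1.0 / err for model, err in cell_errors.items()}
    total = sum(inverse.values())
    return {model: value / total for model, value in inverse.items()}

def blend(cell_predictions, weights):
    """Weighted average of the base-model predictions for one grid cell."""
    return sum(weights[model] * cell_predictions[model] for model in weights)

# Hypothetical validation RMSE (deg C) and predictions for a single grid cell.
errors = {"gam": 2.0, "xgboost": 1.0}
predictions = {"gam": 10.0, "xgboost": 13.0}
weights = cell_weights(errors)
print(weights, blend(predictions, weights))
```

Repeating this per grid cell yields a weight map for each base model, so a model that performs well only in certain regions contributes mainly there.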
Affiliation(s)
- Zhihao Jin
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, USA; Yale Center on Climate Change and Health, Yale School of Public Health, New Haven, CT, USA
- Yiqun Ma
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, USA; Yale Center on Climate Change and Health, Yale School of Public Health, New Haven, CT, USA
- Lingzhi Chu
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, USA; Yale Center on Climate Change and Health, Yale School of Public Health, New Haven, CT, USA
- Yang Liu
- Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Robert Dubrow
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, USA; Yale Center on Climate Change and Health, Yale School of Public Health, New Haven, CT, USA
- Kai Chen
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, USA; Yale Center on Climate Change and Health, Yale School of Public Health, New Haven, CT, USA.
38
Zhuang H, Zhang C, Jin X, Ge A, Chen M, Ye J, Qiao H, Xiong P, Zhang X, Chen J, Luan X, Wang W. A flagship species-based approach to efficient, cost-effective biodiversity conservation in the Qinling Mountains, China. J Environ Manage 2022; 305:114388. [PMID: 34972047 DOI: 10.1016/j.jenvman.2021.114388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Revised: 12/12/2021] [Accepted: 12/22/2021] [Indexed: 06/14/2023]
Abstract
Prioritizing threatened species protection has been proposed as an efficient response to the global biodiversity crisis. We used in-situ conservation data to predict the potential habitat area of four flagship species: the giant panda (Ailuropoda melanoleuca), golden monkey (Rhinopithecus roxella quinlingensis), takin (Budorcas taxicolor bedfordi), and crested ibis (Nipponia nippon). We then designed systematic conservation planning schemes for various scenarios given species habitat preferences and anthropogenic activities and conducted a cost-effectiveness assessment. Broadly, the geographical distributions of suitable habitats for giant pandas, golden monkeys, and takins exhibited high spatial congruence (correlation coefficients of 0.59-0.90), and areas of high congruence were concentrated in the northern portion of the Qinling Mountains at high elevation (>1500 m). By contrast, the crested ibis was negatively correlated in space with its sympatric species (-0.47 to -0.29). Crested ibis habitats were clustered in the southern portion of the region at low elevation (<1500 m). A hypothetical conservation priority area (CPA) based on the giant panda, golden monkey, and takin included 39.64% of the Qinling Mountains and 100%, 99.99%, 99.59%, and 7.84% of the suitable habitats for giant pandas, golden monkeys, takins, and crested ibises, respectively. The same area included 99.07%, 70.87%, and 39.96% of the highly important areas for the ecosystem services of biodiversity conservation, water supply, and soil retention, respectively, and only 4.62%, 16.83%, and 13.4% of the area were associated with high-density residential area, impervious surfaces, and cropland, respectively. Therefore, we conclude that a CPA approach based on the specialist species could result in effective, low-cost biodiversity conservation in the Qinling Mountains. However, we note that existing protected areas account for only 26.52% of the CPA. 
We recommend that the main area of the proposed Qinling National Park should be based on the CPA developed here.
Affiliation(s)
- Hongfei Zhuang
- Academy of Forestry Inventory and Planning, National Forestry and Grassland Administration, Beijing, 100714, China; School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083, China; First Institute of Oceanography, Ministry of Natural Resources, Qingdao, 266061, China
- Chao Zhang
- School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083, China
- Xuelin Jin
- Shaanxi Institute of Zoology, Xi'an, 710032, China
- Anxin Ge
- Shaanxi Institute of Forestry Inventory and Planning, Xi'an, 710082, China
- Minhao Chen
- School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083, China
- Jing Ye
- Academy of Forestry Inventory and Planning, National Forestry and Grassland Administration, Beijing, 100714, China
- Hailiang Qiao
- Shaanxi Institute of Forestry Inventory and Planning, Xi'an, 710082, China
- Ping Xiong
- Shaanxi Institute of Forestry Inventory and Planning, Xi'an, 710082, China
- Xiaofeng Zhang
- Shaanxi Institute of Forestry Inventory and Planning, Xi'an, 710082, China
- Junzhi Chen
- Academy of Forestry Inventory and Planning, National Forestry and Grassland Administration, Beijing, 100714, China.
- Xiaofeng Luan
- School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083, China.
- Wei Wang
- Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
39
Ke H, Gong S, He J, Zhang L, Cui B, Wang Y, Mo J, Zhou Y, Zhang H. Development and application of an automated air quality forecasting system based on machine learning. Sci Total Environ 2022; 806:151204. [PMID: 34710417 DOI: 10.1016/j.scitotenv.2021.151204] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/15/2021] [Revised: 10/20/2021] [Accepted: 10/20/2021] [Indexed: 06/13/2023]
Abstract
As one of the issues of greatest concern in modern society, air quality has received extensive attention from the public and the government, which promotes the continuous development and progress of air quality forecasting technology. In this study, an automated air quality forecasting system based on machine learning has been developed and applied for daily forecasts of six common pollutants (PM2.5, PM10, SO2, NO2, O3, and CO) and pollution levels; the system can automatically find the best "Model + Hyperparameters" combination without human intervention. Five machine learning models and an ensemble model (Stacked Generalization) were integrated into the system, supported by a knowledge base containing meteorological observation data, pollutant concentrations, pollutant emissions, and model reanalysis data. Five years of data (2015-2019) for Beijing, Shanghai, Guangzhou, Chengdu, Xi'an, Wuhan, and Changchun in China were then used as an application case to study the effectiveness of the automated forecasting system. Based on the analysis of seven evaluation criteria and pollution level forecasts, combined with the forecasting results for the next 3 days, it is found that the automated system can achieve satisfactory forecasting performance, better than most numerical model results. This implies that the developed system has good application prospects in the field of environmental meteorology.
Affiliation(s)
- Huabing Ke
- Climate and Weather Disasters Collaborative Innovation Center, Nanjing University of Information Science & Technology, Nanjing 210044, China; State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing 100081, China
- Sunling Gong
- State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing 100081, China.
- Jianjun He
- State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing 100081, China
- Lei Zhang
- State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing 100081, China
- Bin Cui
- Department of Computer Science and Technology & Key Laboratory of High Confidence Software Technologies (MOE), Peking University, Beijing, China
- Yaqiang Wang
- State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing 100081, China
- Jingyue Mo
- Climate and Weather Disasters Collaborative Innovation Center, Nanjing University of Information Science & Technology, Nanjing 210044, China; State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing 100081, China
- Yike Zhou
- State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing 100081, China
- Huan Zhang
- State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing 100081, China
| |
|
40
|
Chen YM, Chen YJ, Ho WH, Tsai JT. Classifying chest CT images as COVID-19 positive/negative using a convolutional neural network ensemble model and uniform experimental design method. BMC Bioinformatics 2021; 22:147. [PMID: 34749629 PMCID: PMC8574139 DOI: 10.1186/s12859-021-04083-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/14/2021] [Accepted: 03/16/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND To classify chest computed tomography (CT) images as positive or negative for coronavirus disease 2019 (COVID-19) quickly and accurately, researchers have attempted to develop effective models by using medical images. RESULTS A convolutional neural network (CNN) ensemble model was developed for classifying chest CT images as positive or negative for COVID-19. To classify chest CT images acquired from COVID-19 patients, the proposed COVID19-CNN ensemble model combines multiple trained CNN models with a majority voting strategy. The CNN models were trained to classify chest CT images by transfer learning from well-known pre-trained CNN models and by applying their algorithm hyperparameters as appropriate. The combination of algorithm hyperparameters for a pre-trained CNN model was determined by uniform experimental design. The chest CT images (405 from COVID-19 patients and 397 from healthy patients) used for training and performance testing of the COVID19-CNN ensemble model were obtained from an earlier study by Hu in 2020. Experiments showed that the COVID19-CNN ensemble model achieved 96.7% accuracy in classifying CT images as COVID-19 positive or negative, which was superior to the accuracies obtained by the individual trained CNN models. Other performance measures (i.e., precision, recall, specificity, and F1-score) obtained by the COVID19-CNN ensemble model were also higher than those obtained by the individual trained CNN models. CONCLUSIONS The COVID19-CNN ensemble model had superior accuracy and excellent capability in classifying chest CT images as COVID-19 positive or negative.
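The majority voting strategy mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `majority_vote`, the label strings, and the three example model outputs are hypothetical, and the sketch assumes an odd number of binary classifiers so that no ties arise.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per image by majority vote.

    predictions_per_model: list of lists, one inner list of labels per model,
    all of the same length (one label per image).
    """
    n_images = len(predictions_per_model[0])
    if any(len(p) != n_images for p in predictions_per_model):
        raise ValueError("every model must label the same number of images")

    combined = []
    # zip(*...) regroups the labels so each tuple holds all votes for one image
    for image_votes in zip(*predictions_per_model):
        label, _count = Counter(image_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three fine-tuned CNNs on four CT images
model_a = ["pos", "neg", "pos", "neg"]
model_b = ["pos", "pos", "neg", "neg"]
model_c = ["neg", "pos", "pos", "neg"]

ensemble_labels = majority_vote([model_a, model_b, model_c])
print(ensemble_labels)
```

With an even number of voters, a tie-breaking rule (for example, falling back to the most accurate single model) would have to be added; the abstract does not say how ties are handled.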
Affiliation(s)
- Yao-Mei Chen
- School of Nursing, Kaohsiung Medical University, Kaohsiung, 807 Taiwan
- Superintendent Office, Kaohsiung Medical University Hospital, Kaohsiung, 807 Taiwan
| | - Yenming J. Chen
- Management School, National Kaohsiung University of Science and Technology, Kaohsiung, 824 Taiwan
| | - Wen-Hsien Ho
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, 807 Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung, 807 Taiwan
| | - Jinn-Tsong Tsai
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, 807 Taiwan
- Department of Computer Science, National Pingtung University, Pingtung, 900 Taiwan
| |
|
41
|
Rahmanian S, Pourghasemi HR, Pouyan S, Karami S. Habitat potential modelling and mapping of Teucrium polium using machine learning techniques. Environ Monit Assess 2021; 193:759. [PMID: 34718878 DOI: 10.1007/s10661-021-09551-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/14/2021] [Accepted: 10/19/2021] [Indexed: 06/13/2023]
Abstract
Determining suitable habitats is important for the successful management and conservation of plant and wildlife species. Teucrium polium L. is a wild plant species found in Iran. It is widely used to treat numerous health problems. The range of this plant is shrinking due to habitat destruction and overexploitation. Therefore, habitat suitability (HS) modeling is critical for conservation. HS modeling can also identify the key characteristics of habitats that support this species. This study models the habitats of T. polium using five data mining models: random forest (RF), flexible discriminant analysis (FDA), multivariate adaptive regression splines (MARS), support vector machine (SVM), and generalized linear model (GLM). A total of 119 T. polium locations were identified and mapped. According to the RF model, the most important factors describing T. polium habitat were elevation, soil texture, and mean annual rainfall. HS maps (HSMs) were prepared, and habitat suitability was classified as low, medium, high, or very high. The percentages of the study area assigned high or very high suitability ratings by each of the models were 44.62% for FDA, 43.75% for GLM, 43.12% for SVM, 38.91% for RF, 28.72% for MARS, and 39.16% for their ensemble. Although all six models were reasonably accurate, the ensemble model had the highest AUC value, demonstrating strong predictive performance. The rank order of the other models in this regard is RF, MARS, SVM, FDA, and GLM. HSMs can provide useful output to support the sustainable management of rangelands, reclamation, and land protection.
Affiliation(s)
- Soroor Rahmanian
- Quantitative Plant Ecology and Biodiversity Research Lab, Department of Biology, Faculty of Science, Ferdowsi University of Mashhad, 9177948974, Mashhad, Iran
| | - Hamid Reza Pourghasemi
- Department of Natural Resources and Environmental Engineering, College of Agriculture, Shiraz University, 71441, 65186, Shiraz, Iran.
| | - Soheila Pouyan
- Department of Natural Resources and Environmental Engineering, College of Agriculture, Shiraz University, 71441, 65186, Shiraz, Iran
| | - Sahar Karami
- Quantitative Plant Ecology and Biodiversity Research Lab, Department of Biology, Faculty of Science, Ferdowsi University of Mashhad, 9177948974, Mashhad, Iran
| |
|
42
|
Cui L, Wang S. Mapping the daily nitrous acid (HONO) concentrations across China during 2006-2017 through ensemble machine-learning algorithm. Sci Total Environ 2021; 785:147325. [PMID: 33957584 DOI: 10.1016/j.scitotenv.2021.147325] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/21/2021] [Revised: 04/19/2021] [Accepted: 04/21/2021] [Indexed: 06/12/2023]
Abstract
Nitrous acid (HONO) is a major source of the hydroxyl radical (OH) and plays a key role in atmospheric photochemistry. The lack of spatially resolved HONO concentration information results in large knowledge gaps about HONO and its role in atmospheric chemistry and air pollution in China. In this work, an ensemble machine learning model comprising random forest, gradient boosting, and back propagation neural network was proposed, for the first time, to estimate the long-term (2006-2017) daily HONO concentrations across China at 0.25° resolution. Further, the key factors controlling the space-time variability of HONO concentrations were analyzed based on variable importance values. The ensemble model characterized the spatiotemporal distribution of daily HONO concentrations well, with sample-based, site-based and by-year cross-validation (CV) R2 (RMSE) of 0.7 (0.36 ppbv), 0.67 (0.36 ppbv), and 0.62 (0.40 ppbv), respectively. HONO hotspots were mainly distributed in the Beijing-Tianjin-Hebei (BTH), Pearl River Delta (PRD), Yangtze River Delta (YRD), and several sites of the Sichuan Basin, in line with the distribution patterns of the tropospheric NO2 columns and assimilated surface NO3- levels. The national HONO levels stagnated during 2006-2013, then declined after 2013, benefiting from the implementation of the Action Plan for Air Pollution Prevention and Control. The NO3- concentration, urban area, and NO2 column density ranked as important variables for HONO prediction, while agricultural land, forest and grassland played minor roles in affecting HONO concentrations, suggesting the significant role of heterogeneous HONO production from anthropogenic precursor emissions. Leveraging the ground-level HONO observations, this study fills the gap in statistically modelling nationwide HONO in China, which provides essential data for atmospheric chemistry research.
Affiliation(s)
- Lulu Cui
- State Key Joint Laboratory of Environmental Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing 100084, China
| | - Shuxiao Wang
- State Key Joint Laboratory of Environmental Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing 100084, China; State Environmental Protection Key Laboratory of Sources and Control of Air Pollution Complex, Beijing 100084, China.
| |
|
43
|
Ke B, Nguyen H, Bui XN, Bui HB, Choi Y, Zhou J, Moayedi H, Costache R, Nguyen-Trang T. Predicting the sorption efficiency of heavy metal based on the biochar characteristics, metal sources, and environmental conditions using various novel hybrid machine learning models. Chemosphere 2021; 276:130204. [PMID: 34088091 DOI: 10.1016/j.chemosphere.2021.130204] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 12/12/2020] [Revised: 02/17/2021] [Accepted: 03/04/2021] [Indexed: 06/12/2023]
Abstract
Heavy metals in water and wastewater are regarded as one of the most hazardous environmental issues, with a significant impact on human health. Biochar systems made from different materials have helped to remove heavy metals from water, especially in wastewater treatment systems. Nevertheless, the sorption efficiency of heavy metals on biochar systems is highly dependent on the biochar characteristics, metal sources, and environmental conditions. Therefore, this study examines the feasibility of biochar systems for heavy metal sorption in water/wastewater and the use of artificial intelligence (AI) models for investigating the sorption efficiency of heavy metals on biochar. Accordingly, this work investigated and proposed 20 AI models for forecasting the sorption efficiency of heavy metals onto biochar based on five machine learning algorithms and the bagging technique (BA). Support vector machine (SVM), random forest (RF), artificial neural network (ANN), M5Tree, and Gaussian process (GP) algorithms were used as the key algorithms. Subsequently, the individual models were bagged with each other to generate new ensemble models. Finally, 20 intelligent models were developed and evaluated, including SVM, RF, M5Tree, GP, ANN, BA-SVM, BA-RF, BA-M5Tree, BA-GP, BA-ANN, SVM-RF, SVM-M5Tree, SVM-GP, SVM-ANN, RF-M5Tree, RF-GP, RF-ANN, M5Tree-GP, M5Tree-ANN, GP-ANN. Of those, the hybrid models (i.e., BA-SVM, BA-RF, BA-M5Tree, BA-GP, BA-ANN, SVM-RF, SVM-M5Tree, SVM-GP, SVM-ANN, RF-M5Tree, RF-GP, RF-ANN, M5Tree-GP, M5Tree-ANN, GP-ANN) are introduced as the novelty of this study for estimating the sorption efficiency of heavy metals on biochar systems. In addition, the biochar characteristics, metal sources, and environmental conditions were comprehensively assessed and used, which is considered a further novelty of the study.
To this end, a dataset of heavy metal sorption efficiency comprising 353 experimental tests was collected and processed. Various performance indexes were applied to evaluate the models, such as RMSE, R2, MAE, color intensity, Taylor diagrams, and box-and-whisker plots. The findings revealed that AI models could predict the sorption efficiency of heavy metals onto biochar with high reliability, and that the ensemble models performed better than the individual models. The results also showed that the SVM-ANN ensemble model was the best of the 20 developed models. The proposed predictive models indicate that the sorption efficiency of heavy metals on biochar can be forecast accurately, which could support early warning of heavy metal water pollution.
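Three of the performance indexes named in this abstract (RMSE, MAE, and R2) have standard definitions, sketched below in plain Python. The function name and the observed/predicted values are hypothetical illustrations, not data from the study, and R2 is assumed here to mean the coefficient of determination (1 minus the ratio of residual to total sum of squares); the paper may use a different variant such as the squared correlation.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE and R2 (coefficient of determination) for two equal-length lists."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")

    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical sorption efficiencies (%) for four experimental tests
observed = [80.0, 60.0, 90.0, 70.0]
predicted = [78.0, 63.0, 88.0, 71.0]

metrics = regression_metrics(observed, predicted)
print(metrics)
```

Lower RMSE and MAE and a higher R2 indicate a closer fit, which is the sense in which such indexes are used to rank competing models.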
Affiliation(s)
- Bo Ke
- School of Resources and Environmental Engineering, Wuhan University of Technology, Wuhan, Hubei, 430070, China; School of Urban Construction, Wuchang University of Technology, Wuhan, 430223, China
| | - Hoang Nguyen
- Department of Surface Mining, Mining Faculty, Hanoi University of Mining and Geology, 18 Pho Vien, Duc Thang Ward, Bac Tu Liem District, Hanoi, 100000, Viet Nam.
| | - Xuan-Nam Bui
- Department of Surface Mining, Mining Faculty, Hanoi University of Mining and Geology, 18 Pho Vien, Duc Thang Ward, Bac Tu Liem District, Hanoi, 100000, Viet Nam; Center for Mining, Electro-Mechanical Research, Hanoi University of Mining and Geology, 18 Pho Vien, Duc Thang Ward, Bac Tu Liem District, Hanoi, 100000, Viet Nam
| | - Hoang-Bac Bui
- Faculty of Geosciences and Geoengineering, Hanoi University of Mining and Geology, 18 Vien St., Duc Thang Ward, Bac Tu Liem Dist., Hanoi, 100000, Viet Nam; Center for Excellence in Analysis and Experiment, Hanoi University of Mining and Geology, 18 Vien St., Duc Thang Ward, Bac Tu Liem Dist., Hanoi, 100000, Viet Nam.
| | - Yosoon Choi
- Department of Energy Resources Engineering, Pukyong National University, Busan, 48513, South Korea
| | - Jian Zhou
- School of Resources and Safety Engineering, Central South University, Changsha, Hunan, 410083, China
| | - Hossein Moayedi
- Department of Energy Resources Engineering, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
| | - Romulus Costache
- Research Institute of the University of Bucharest, 90-92 Sos. Panduri, 5th District, Bucharest, Romania
| | - Thao Nguyen-Trang
- Division of Computational Mathematics and Engineering, Institute for Computational Science, Ton Duc Thang University, Ho Chi Minh City, 70000, Viet Nam; Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, 700000, Viet Nam.
| |
|
44
|
Mukherjee T, Sharma V, Sharma LK, Thakur M, Joshi BD, Sharief A, Thapa A, Dutta R, Dolker S, Tripathy B, Chandra K. Landscape-level habitat management plan through geometric reserve design for critically endangered Hangul (Cervus hanglu hanglu). Sci Total Environ 2021; 777:146031. [PMID: 33676208 DOI: 10.1016/j.scitotenv.2021.146031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/20/2020] [Revised: 02/17/2021] [Accepted: 02/17/2021] [Indexed: 06/12/2023]
Abstract
Hangul (Cervus hanglu hanglu), the only red deer subspecies surviving in the Indian subcontinent, is a top conservation priority of global importance. Unfortunately, it has lost much of its historical distribution range and is now confined to the Dachigam landscape within the Kashmir valley of India. The Government of India initiated a recovery plan in 2008 to augment hangul numbers through ex-situ conservation programs. However, it was necessary to identify potential hangul habitats in the Kashmir valley in order to adopt landscape-level conservation planning for the species. Based on geometric aspects of reserve design, we modeled hangul habitat using an ensemble approach. The present model indicates that conifer and broadleaf mixed forests were the most suitable habitats. Only 9% of the total study landscape was found suitable for the species. We identified corridors among the suitable habitat blocks, which may be vital for the species' long-term genetic viability. We suggest reorganizing the existing management of Dachigam National Park (NP) following landscape-level and habitat block-level management planning based on the core principles of geometric reserve design. We recommend that the identified patch (PID-6) in the southern region of the landscape be converted into a Conservation Reserve or merged with the Overa-Aru Wildlife Sanctuary. This habitat patch, PID-6, may be a stepping-stone habitat and vital for maintaining the species' landscape connectivity and metapopulation dynamics.
Affiliation(s)
| | - Vandana Sharma
- Indian Institute of Remote Sensing, Dehradun 248001, India
| | | | | | | | | | | | - Ritam Dutta
- Zoological Survey of India, Kolkata 700053, India
| | | | | | | |
|
45
|
Tanveer MA, Khan MJ, Sajid H, Naseer N. Convolutional neural networks ensemble model for neonatal seizure detection. J Neurosci Methods 2021; 358:109197. [PMID: 33864835 DOI: 10.1016/j.jneumeth.2021.109197] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/31/2020] [Revised: 04/11/2021] [Accepted: 04/12/2021] [Indexed: 10/21/2022]
Abstract
BACKGROUND Neonatal seizures are a common occurrence in clinical settings, requiring immediate attention and detection. Previous studies have proposed using manual feature extraction coupled with machine learning, or deep learning, to classify seizure and non-seizure states. NEW METHOD In this paper, a deep learning based approach is used for neonatal seizure classification using electroencephalogram (EEG) signals. The architecture detects seizure activity in raw EEG signals, in contrast to the common state of the art, where manual feature extraction with machine learning algorithms is used. The architecture is a two-dimensional (2D) convolutional neural network (CNN) that classifies seizure/non-seizure states. RESULTS The dataset used for this study is annotated by three experts, so three separate models are trained on the individual annotations, resulting in average accuracies (ACC) of 95.6 %, 94.8 % and 90.1 % respectively, and average areas under the receiver operating characteristic curve (AUC) of 99.2 %, 98.4 % and 96.7 % respectively. Testing was done using 10-fold cross-validation, so that the performance is an accurate representation of the architecture's classification capability in a clinical setting. After training/testing of the three individual models, a final ensemble model is built from the three models. The ensemble model gives an average ACC and AUC of 96.3 % and 99.3 % respectively. COMPARISON WITH EXISTING METHODS This study outperforms previous studies, with increased ACC and AUC results coupled with the use of small time windows (1 s) for evaluation. CONCLUSION The proposed approach is promising for detecting seizure activity in unseen neonate data in a clinical setting.
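The cross-validation protocol mentioned in this abstract can be sketched generically as follows. This is only an illustration of how 10 train/test partitions are formed: the function name, the sample count of 25, and the random seed are hypothetical, and the abstract does not state whether the authors split by EEG window, recording, or patient.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible

    # Striding by k gives k folds whose sizes differ by at most one
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# Hypothetical dataset of 25 samples split into 10 folds
splits = list(k_fold_indices(25, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Each sample appears in exactly one test fold, so averaging a metric over the folds uses every sample for testing once. For clinical signals, grouping folds by patient is often preferred to avoid leakage between training and test data, but that is a design choice outside what the abstract reports.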
Affiliation(s)
- M Asjid Tanveer
- Intelligent Robotics Lab, National Center of Artificial Intelligence, National University of Science and Technology, Islamabad, Pakistan
| | - Muhammad Jawad Khan
- Intelligent Robotics Lab, National Center of Artificial Intelligence, National University of Science and Technology, Islamabad, Pakistan; School of Mechanical and Manufacturing Engineering, National Center of Artificial Intelligence, National University of Science and Technology, Islamabad, Pakistan.
| | - Hasan Sajid
- Intelligent Robotics Lab, National Center of Artificial Intelligence, National University of Science and Technology, Islamabad, Pakistan; School of Mechanical and Manufacturing Engineering, National Center of Artificial Intelligence, National University of Science and Technology, Islamabad, Pakistan
| | - Noman Naseer
- Department of Mechatronics and Biomedical Engineering, Air University, Islamabad, Pakistan
| |
|
46
|
Yu X, Yang Q, Wang D, Li Z, Chen N, Kong DX. Predicting lung adenocarcinoma disease progression using methylation-correlated blocks and ensemble machine learning classifiers. PeerJ 2021; 9:e10884. [PMID: 33628643 PMCID: PMC7894106 DOI: 10.7717/peerj.10884] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/02/2020] [Accepted: 01/12/2021] [Indexed: 01/20/2023] Open
Abstract
Applying the knowledge that methyltransferases and demethylases can modify adjacent cytosine-phosphate-guanine (CpG) sites in the same DNA strand, we found that combining multiple CpGs into a single block may improve cancer diagnosis. However, survival prediction remains a challenge. In this study, we developed a pipeline named "stacked ensemble of machine learning models for methylation-correlated blocks" (EnMCB) that combined Cox regression, support vector regression (SVR), and elastic-net models to construct signatures based on DNA methylation-correlated blocks for lung adenocarcinoma (LUAD) survival prediction. We used methylation profiles from the Cancer Genome Atlas (TCGA) as the training set, and profiles from the Gene Expression Omnibus (GEO) as validation and testing sets. First, we partitioned the genome into blocks of tightly co-methylated CpG sites, which we termed methylation-correlated blocks (MCBs). After partitioning and feature selection, we observed different diagnostic capacities for predicting patient survival across the models. We combined the multiple models into a single stacking ensemble model. The stacking ensemble model based on the top-ranked block had an area under the receiver operating characteristic curve of 0.622 in the TCGA training set, 0.773 in the validation set, and 0.698 in the testing set. When stratified by clinicopathological risk factors, the risk score predicted by the top-ranked MCB was an independent prognostic factor. Our results showed that our pipeline was a reliable tool that may facilitate MCB selection and survival prediction.
Affiliation(s)
- Xin Yu
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, Hubei, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Qian Yang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Dong Wang
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, Hubei, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Zhaoyang Li
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Nianhang Chen
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China
| | - De-Xin Kong
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, Hubei, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China
| |
|
47
|
Mukherjee T, Sharma LK, Kumar V, Sharief A, Dutta R, Kumar M, Joshi BD, Thakur M, Venkatraman C, Chandra K. Adaptive spatial planning of protected area network for conserving the Himalayan brown bear. Sci Total Environ 2021; 754:142416. [PMID: 33254933 DOI: 10.1016/j.scitotenv.2020.142416] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/08/2020] [Revised: 09/12/2020] [Accepted: 09/14/2020] [Indexed: 06/12/2023]
Abstract
Large mammals that occur in low densities, particularly in high-altitude areas, are globally threatened due to fragile climatic and ecological envelopes. Among bear species, the Himalayan brown bear (Ursus arctos isabellinus) has a distribution that is restricted to the Himalayan highlands, with relatively small and fragmented populations. To date, very little scientific information is available on the Himalayan brown bear, although such information is vital for the conservation of the species and the management of its habitats, especially in protected areas of the landscape. The present study aims to understand the effectiveness of existing Himalayan Protected Areas in terms of representativeness for the conservation of the Himalayan brown bear (HBB), an umbrella species in high-altitude habitats of the Himalayan region. We used an ensemble species distribution modelling approach and then assessed biological connectivity to predict the current and future distribution and movement of HBB under climate change scenarios for the year 2050. Approximately 33 protected areas (PAs) currently possess suitable habitats. Our model suggests a massive decline of approximately 73.38% and 72.87% under representative concentration pathways (RCP) 4.5 and 8.5 respectively in the year 2050 compared with the current distribution. The predicted change in suitability will result in loss of habitats from thirteen PAs; eight will become completely uninhabitable by the year 2050, followed by loss of connectivity in the majority of PAs. Habitat configuration analysis suggested a 40% decline in the number of suitable patches, a reduction in large habitat patches (up to 50%) and in the aggregation of suitable areas (9%) by 2050, indicating fragmentation. The predicted change in geographic isotherm will result in loss of habitats from thirteen PAs, eight of which will become completely uninhabitable. 
Hence, these PAs may lose their effectiveness and representativeness in achieving the very objective of their existence or conservation goals. Therefore, we recommend adaptive spatial planning for protecting suitable habitats distributed outside the PA for climate change adaptation.
Affiliation(s)
- Tanoy Mukherjee
- Zoological Survey of India, Prani Vigyan Bhawan, New Alipore, Kolkata 700053, West Bengal, India
| | - Lalit Kumar Sharma
- Zoological Survey of India, Prani Vigyan Bhawan, New Alipore, Kolkata 700053, West Bengal, India.
| | - Vineet Kumar
- Zoological Survey of India, Prani Vigyan Bhawan, New Alipore, Kolkata 700053, West Bengal, India; Saurashtra University, Rajkot 360005, Gujarat, India
| | - Amira Sharief
- Zoological Survey of India, Prani Vigyan Bhawan, New Alipore, Kolkata 700053, West Bengal, India; Saurashtra University, Rajkot 360005, Gujarat, India
| | - Ritam Dutta
- Zoological Survey of India, Prani Vigyan Bhawan, New Alipore, Kolkata 700053, West Bengal, India
| | - Manish Kumar
- Zoological Survey of India, Prani Vigyan Bhawan, New Alipore, Kolkata 700053, West Bengal, India
| | - Bheem Dutt Joshi
- Zoological Survey of India, Prani Vigyan Bhawan, New Alipore, Kolkata 700053, West Bengal, India
| | - Mukesh Thakur
- Zoological Survey of India, Prani Vigyan Bhawan, New Alipore, Kolkata 700053, West Bengal, India
| | - Chinnadurai Venkatraman
- Zoological Survey of India, Prani Vigyan Bhawan, New Alipore, Kolkata 700053, West Bengal, India
| | - Kailash Chandra
- Zoological Survey of India, Prani Vigyan Bhawan, New Alipore, Kolkata 700053, West Bengal, India
| |
|
48
|
Gifani P, Shalbaf A, Vafaeezadeh M. Automated detection of COVID-19 using ensemble of transfer learning with deep convolutional neural network based on CT scans. Int J Comput Assist Radiol Surg 2021; 16:115-123. [PMID: 33191476 PMCID: PMC7667011 DOI: 10.1007/s11548-020-02286-w] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 05/23/2020] [Accepted: 10/23/2020] [Indexed: 12/18/2022]
Abstract
PURPOSE COVID-19 has infected millions of people worldwide. One of the most important hurdles in controlling the spread of this disease is the inefficiency and scarcity of medical tests. Computed tomography (CT) scans are promising for accurate and fast detection of COVID-19. However, diagnosing COVID-19 from CT requires highly trained radiologists and suffers from inter-observer variability. To remedy these limitations, this paper introduces an automatic methodology based on an ensemble of deep transfer learning models for the detection of COVID-19. METHODS A total of 15 pre-trained convolutional neural network (CNN) architectures (EfficientNets (B0-B5), NasNetLarge, NasNetMobile, InceptionV3, ResNet-50, SeResnet 50, Xception, DenseNet121, ResNext50 and Inception_resnet_v2) are used and then fine-tuned on the target task. After that, we built an ensemble method based on majority voting over the best combination of deep transfer learning outputs to further improve recognition performance. We used a publicly available dataset of CT scans, which consists of 349 CT scans labeled as positive for COVID-19 and 397 COVID-19-negative CT scans that are normal or contain other types of lung disease. RESULTS The experimental results indicate that majority voting over 5 deep transfer learning architectures (EfficientNetB0, EfficientNetB3, EfficientNetB5, Inception_resnet_v2, and Xception) achieves better results than any individual transfer learning architecture and than the other models, based on precision (0.857), recall (0.854) and accuracy (0.85) in diagnosing COVID-19 from CT scans. CONCLUSION Our study shows that an ensemble deep transfer learning system with different pre-trained CNN architectures can work well on a publicly available dataset of CT images for the diagnosis of COVID-19.
Affiliation(s)
- Parisa Gifani
- Department of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
| | - Ahmad Shalbaf
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
| | - Majid Vafaeezadeh
- School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
| |
|
49
|
Singh P, Kaur R. An integrated fog and Artificial Intelligence smart health framework to predict and prevent COVID-19. Glob Transit 2020; 2:283-292. [PMID: 33205037 PMCID: PMC7659515 DOI: 10.1016/j.glt.2020.11.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/04/2020] [Revised: 10/09/2020] [Accepted: 11/01/2020] [Indexed: 05/18/2023]
Abstract
COVID-19 is currently spreading at a rapid rate on almost all continents. It has already affected many people, who are spreading it further day by day. Because the disease is communicable, it is essential to alert nearby people to it. As of May 2020, no vaccine is available for COVID-19, but existing technologies can be used to minimize its effect. Cloud/fog computing could be used to monitor and control this rapidly spreading infection in a cost-effective and time-saving manner. To strengthen COVID-19 patient prediction, artificial intelligence (AI) can be integrated with cloud/fog computing for practical solutions. In this paper, a fog-assisted, Internet of Things (IoT) based quality-of-service framework is presented to prevent and protect against COVID-19. It provides real-time processing of users' health data to predict COVID-19 infection by observing their symptoms, and immediately generates an emergency alert, medical reports, and significant precautions for the user, their guardian, and doctors/experts. It collects sensitive information from hospitals/quarantine shelters through patient IoT devices for taking necessary actions/decisions. Further, it generates an alert message to government health agencies for controlling the outbreak of chronic illness and for taking quick and timely actions.
Affiliation(s)
- Prabhdeep Singh
- Department of Computer Science & Engineering, Punjabi University, Patiala, IN, India
- Rajbir Kaur
- Department of Electronics & Communication Engineering, Punjabi University, Patiala, IN, India
|
50
|
Saha S, Saha M, Mukherjee K, Arabameri A, Ngo PTT, Paul GC. Predicting the deforestation probability using the binary logistic regression, random forest, ensemble rotational forest, REPTree: A case study at the Gumani River Basin, India. Sci Total Environ 2020; 730:139197. [PMID: 32402979 DOI: 10.1016/j.scitotenv.2020.139197] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/08/2020] [Revised: 05/01/2020] [Accepted: 05/01/2020] [Indexed: 04/15/2023]
Abstract
Rapid population growth and its effects, such as the expansion of human settlement, agricultural land, and industry, lead to the loss of forest area in most parts of the world, especially in highly populated nations like India. Forest canopy density (FCD) is a useful measure for assessing forest cover change on its own, and numerous studies of forest change have used only FCD with the help of remote sensing and GIS. Coupling binary logistic regression (BLR), random forest (RF), and an ensemble of rotational forest and reduced error pruning trees (RTF-REPTree) with FCD makes it more convenient to estimate the deforestation probability. Advanced vegetation index (AVI), bare soil index (BSI), shadow index (SI), and scaled vegetation density (VD), derived from Landsat imagery, are the main input parameters for identifying the FCD. After the FCDs of 1990, 2000, 2010 and 2017 were prepared, the deforestation map of the study area was prepared and used as the dependent parameter for deforestation probability modelling. Twelve deforestation-determining factors were then used to delineate the deforestation probability with the BLR, RF and RTF-REPTree models. These models were validated using the receiver operating characteristic (ROC) curve and its area under the curve (AUC), efficiency, true skill statistics (TSS) and the Kappa coefficient. The validation results show that all the models, BLR (AUC = 0.874), RF (AUC = 0.886) and RTF-REPTree (AUC = 0.919), are capable of assessing the deforestation probability, with RTF-REPTree achieving the highest accuracy. The results also show that the low canopy density area, i.e. the area not under dense forest cover, increased by 9.26% from 1990 to 2017. In addition, nearly 30% of the forested land lies in zones of high to very high deforestation probability and needs to be protected with immediate measures.
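The validation measures named in this abstract (AUC, TSS and the Kappa coefficient) can be illustrated with a minimal sketch. It uses the standard textbook definitions, TSS = sensitivity + specificity - 1 and Cohen's kappa, and is not the authors' code. The function names, the sample labels and scores, and the 0.5 threshold are all assumptions made for illustration; none of them come from the paper.

```python
# Illustrative sketch only: standard definitions of three validation metrics.
# All names and sample values below are hypothetical, not from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def true_skill_statistic(tp, tn, fp, fn):
    """TSS = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = deforested cell, 0 = not deforested.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]            # modelled probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
print(true_skill_statistic(*counts), cohen_kappa(*counts), auc(y_true, scores))
```

AUC is computed from the raw scores and does not depend on a threshold, whereas TSS and kappa are computed from thresholded predictions, so the latter two change with the cut-off chosen.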
Affiliation(s)
- Sunil Saha
- Department of Geography, University of Gour Banga, Malda, West Bengal, India
- Mantosh Saha
- Research Scholar, Department of Geography, University of Gour Banga, India
- Kaustuv Mukherjee
- Department of Geography, Chandidas Mahavidyalaya, Khujutipara, Birbhum, India
- Alireza Arabameri
- Department of Geomorphology, Tarbiat Modares University, Tehran, Iran
- Phuong Thao Thi Ngo
- Institute of Research and Development, Duy Tan University, Da Nang 550000, Viet Nam
- Gopal Chandra Paul
- Research Scholar, Dept. of Geography, University of Gour Banga, Malda, India
|