51
|
Alrayes FS, Maray M, Alshuhail A, Almustafa KM, Darem AA, Al-Sharafi AM, Alotaibi SD. Privacy-preserving approach for IoT networks using statistical learning with optimization algorithm on high-dimensional big data environment. Sci Rep 2025; 15:3338. [PMID: 39870824 PMCID: PMC11772597 DOI: 10.1038/s41598-025-87454-1] [Received: 10/09/2024] [Accepted: 01/20/2025] [Indexed: 01/29/2025] Open
Abstract
The proliferation of Internet of Things (IoT) devices generates massive volumes of high-dimensional data, presenting significant data security and privacy challenges. As IoT networks grow, preserving the privacy of sensitive data while retaining the ability to analyze it is vital. In the era of big data, statistical learning has advanced rapidly in both methodological innovation and practical applications. Privacy-preserving machine learning (ML) training based on aggregation permits a demander to securely train ML models on sensitive data collected from IoT devices. Current solutions are primarily server-assisted and fail to address collusion attacks among servers or data owners. They also do not adequately account for the complex dynamics of the IoT environment. In a large-scale big data environment, privacy protection challenges are further amplified. High dimensionality can obscure meaningful patterns, making it difficult to ensure that privacy-preserving models do not degrade the efficacy and accuracy of statistical methods. This manuscript presents a Privacy-Preserving Statistical Learning with an Optimization Algorithm for a High-Dimensional Big Data Environment (PPSLOA-HDBDE) approach. The primary purpose of the PPSLOA-HDBDE approach is to utilize advanced optimization and ensemble techniques to ensure data confidentiality while maintaining analytical efficacy. In the first stage, the linear scaling normalization (LSN) method scales the input data. Next, a sand cat swarm optimizer (SCSO)-based feature selection (FS) process is employed to mitigate the high-dimensionality problem. Intrusion detection is then performed by an ensemble of temporal convolutional network (TCN), multi-layer auto-encoder (MAE), and extreme gradient boosting (XGBoost) models. Lastly, the hyperparameters of the three models are tuned using an improved marine predator algorithm (IMPA). An extensive range of experiments is performed to validate the PPSLOA-HDBDE technique's performance, and the outcomes are examined under distinct measures. The performance validation of the PPSLOA-HDBDE technique showed a superior accuracy value of 99.49% over existing models.
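The first stage of the pipeline described above, linear scaling normalization, can be illustrated with a short sketch. It assumes that LSN means ordinary min-max scaling of each feature into [0, 1]; the abstract does not give the formula, and the function names and sample values below are hypothetical.

```python
def linear_scale(column, lo=0.0, hi=1.0):
    """Min-max scale a list of numbers into [lo, hi] (assumed form of LSN)."""
    c_min, c_max = min(column), max(column)
    if c_max == c_min:
        # Constant feature: map every value to the lower bound.
        return [lo for _ in column]
    span = c_max - c_min
    return [lo + (hi - lo) * (x - c_min) / span for x in column]


def scale_rows(rows):
    """Scale each feature (column) of a row-major table independently."""
    columns = list(zip(*rows))
    scaled_columns = [linear_scale(list(col)) for col in columns]
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical two-feature records, not taken from the paper's dataset.
records = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
scaled = scale_rows(records)
```

Scaling each feature separately keeps features with large numeric ranges from dominating the later feature selection and classification stages.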
|
research-article |
1 |
|
52
|
Jin Z, Ma Y, Chu L, Liu Y, Dubrow R, Chen K. Predicting spatiotemporally-resolved mean air temperature over Sweden from satellite data using an ensemble model. ENVIRONMENTAL RESEARCH 2022; 204:111960. [PMID: 34464620 DOI: 10.1016/j.envres.2021.111960] [Received: 05/19/2021] [Revised: 07/29/2021] [Accepted: 08/23/2021] [Indexed: 06/13/2023]
Abstract
Mapping of air temperature (Ta) at high spatiotemporal resolution is critical to reducing exposure assessment errors in epidemiological studies on the health effects of air temperature. In this study, we applied a three-stage ensemble model to estimate daily mean Ta from satellite-based land surface temperature (Ts) over Sweden during 2001-2019 at a high spatial resolution of 1 × 1 km². The ensemble model incorporated four base models, including a generalized additive model (GAM), a generalized additive mixed model (GAMM), and two machine learning models (random forest [RF] and extreme gradient boosting [XGBoost]), and allowed the weights for each model to vary over space, with the best-performing model for each grid cell assigned the highest weight. Various spatial predictors were included as adjustment variables in all the base models, including land cover type, normalized difference vegetation index (NDVI), and elevation. The ensemble model showed high performance with an overall R² of 0.98 and a root mean square error of 1.38 °C in the ten-fold cross-validation, and outperformed each of the four base models. Although each base model performed well, the two machine learning models (RF [R² = 0.97], XGBoost [R² = 0.98]) had better performance than the two regression models (GAM [R² = 0.95], GAMM [R² = 0.96]). In the machine learning models, Ts was the dominant predictor of Ta, followed by day of year, NDVI, latitude, elevation, and longitude. The highly spatiotemporally-resolved Ta can improve temperature exposure assessment in future epidemiological studies.
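The idea of letting ensemble weights vary over space can be sketched as follows. This is only an illustration: it assumes weights proportional to the inverse squared RMSE of each base model within a grid cell, which is one common way to give the best-performing model the highest weight and is not stated in the abstract. All numbers are made up.

```python
def cell_weights(rmse_by_model):
    """Turn per-model RMSE values for one grid cell into normalized weights.

    Assumption: weight is proportional to 1 / RMSE^2, so the model with the
    lowest error in the cell receives the highest weight. RMSE must be > 0.
    """
    inverse = {name: 1.0 / (rmse ** 2) for name, rmse in rmse_by_model.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def ensemble_prediction(predictions, weights):
    """Weighted average of the base-model predictions for one grid cell."""
    return sum(weights[name] * predictions[name] for name in predictions)


# Hypothetical cross-validated RMSE (in °C) of the four base models in one cell.
rmse_cell = {"GAM": 2.0, "GAMM": 2.0, "RF": 1.0, "XGBoost": 1.0}
weights = cell_weights(rmse_cell)

# Hypothetical base-model estimates of daily mean Ta (in °C) for that cell.
base_predictions = {"GAM": 10.0, "GAMM": 10.0, "RF": 12.0, "XGBoost": 12.0}
ta_hat = ensemble_prediction(base_predictions, weights)
```

Repeating the weight calculation for every grid cell yields a weight map per base model, so a model that does well in one region can dominate there without dominating everywhere.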
|
Research Support, N.I.H., Extramural |
3 |
|
53
|
Tache IA, Hatfaludi CA, Puiu A, Itu LM, Popa-Fotea NM, Calmac L, Scafa-Udriste A. Assessment of the functional severity of coronary lesions from optical coherence tomography based on ensembled learning. Biomed Eng Online 2023; 22:127. [PMID: 38104144 PMCID: PMC10724936 DOI: 10.1186/s12938-023-01192-x] [Received: 08/12/2023] [Accepted: 12/07/2023] [Indexed: 12/19/2023] Open
Abstract
BACKGROUND Atherosclerosis is one of the most frequent cardiovascular diseases. The dilemma faced by physicians is whether to treat or postpone the revascularization of lesions that fall within the intermediate range given by an invasive fractional flow reserve (FFR) measurement. The paper presents a monocentric study for assessing the significance of lesions that can potentially cause ischemia in the large coronary arteries. METHODS A new dataset is acquired, comprising optical coherence tomography (OCT) images, clinical parameters, echocardiography and FFR measurements collected from 80 patients with 102 lesions and stable multivessel coronary artery disease. With the ground truth given by the invasive FFR measurement, the dataset is challenging because almost 40% of the lesions are in the gray zone, having an FFR value between 0.75 and 0.85. Twenty-six features are extracted from OCT images, clinical characteristics, and echocardiography, and the most relevant are identified by examining the models' accuracy. Ensemble learning is performed to solve the binary classification problem of lesion significance using leave-one-out cross-validation. RESULTS Ensemble models are designed from the multi-features voting from 5 features models by prediction aggregation, with a maximum accuracy of 81.37% and a maximum area under the curve (AUC) score of 0.856. CONCLUSIONS The proposed explainable supervised learning-based lesion classification is a new method that can be improved by training with a larger multicenter dataset, with the aim of designing a tool to guide clinicians' decision making for cases outside the gray zone; for the remaining cases, extra clinical information about the lesion is needed.
|
research-article |
2 |
|
54
|
Jin Z, Zhao H, Xian X, Li M, Qi Y, Guo J, Yang N, Lü Z, Liu W. Early warning and management of invasive crop pests under global warming: estimating the global geographical distribution patterns and ecological niche overlap of three Diabrotica beetles. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024; 31:13575-13590. [PMID: 38253826 DOI: 10.1007/s11356-024-32076-9] [Received: 11/13/2023] [Accepted: 01/15/2024] [Indexed: 01/24/2024]
Abstract
Invasive alien pests (IAPs) pose a major threat to global agriculture and food production. When multiple IAPs coexist in the same habitat and use the same resources, the economic loss to local agricultural production increases. Many species of the Diabrotica genus, such as Diabrotica barberi, Diabrotica undecimpunctata, and Diabrotica virgifera, originating from the USA and Mexico, seriously damaged maize production in North America and Europe. However, the potential geographic distributions (PGDs) and degree of ecological niche overlap among the three Diabrotica beetles remain unclear; thus, the potential coexistence zone is unknown. Based on environmental and species occurrence data, we used an ensemble model (EM) to predict the PGDs and overlapping PGD of the three Diabrotica beetles. The n-dimensional hypervolumes concept was used to explore the degree of niche overlap among the three species. The EM showed better reliability than the individual models. According to the EM results, the PGDs and overlapping PGD of the three Diabrotica beetles were mainly distributed in North America, Europe, and Asia. Under the current scenario, D. virgifera has the largest PGD ranges (1615 × 10⁴ km²). In the future, the PGD of this species will expand further and reach a maximum under the SSP5-8.5 scenario in the 2050s (2499 × 10⁴ km²). Diabrotica virgifera showed the highest potential for invasion under the current and future global warming scenarios. Among the three studied species, the degree of ecological niche overlap was the highest for D. undecimpunctata and D. virgifera, with the highest similarity in the PGD patterns and maximum coexistence range. Under global warming, the PGDs of the three Diabrotica beetles are expected to expand to high latitudes. Identifying the PGDs of the three Diabrotica beetles provides an important reference for quarantine authorities in countries at risk of invasion worldwide to develop specific preventive measures against pests.
|
|
1 |
|
55
|
Ledwidge MJ, Monk J, Mason SJ, Arnould JPY. Using vessels of opportunity for determining important habitats of bottlenose dolphins in Port Phillip Bay, south-eastern Australia. PeerJ 2024; 12:e18400. [PMID: 39494272 PMCID: PMC11531264 DOI: 10.7717/peerj.18400] [Received: 11/08/2023] [Accepted: 10/04/2024] [Indexed: 11/05/2024] Open
Abstract
Understanding species' critical habitat requirements is crucial for effective conservation and management. However, such information can be challenging to obtain, particularly for highly mobile, wide-ranging species such as cetaceans. In the absence of systematic surveys, alternative economically viable methods are needed, such as the use of data collected from platforms of opportunity, and modelling techniques to predict species distribution in un-surveyed areas. The present study used data collected by ecotourism and other vessels of opportunity to investigate important habitats of a small, poorly studied population of bottlenose dolphins in Port Phillip Bay, south-eastern Australia. Using 16 years of dolphin sighting location data, an ensemble habitat suitability model was built from which physical factors influencing dolphin distribution were identified. Results indicated that important habitats were those areas close to shipping channels and coastlines with these factors primarily influencing the variation in the likelihood of dolphin presence. The relatively good performance of the ensemble model suggests that simple presence-background data may be sufficient for predicting the species distribution where sighting data are limited. However, additional data from the center of Port Phillip Bay is required to further support this contention. Important habitat features identified in the study are likely to relate to favorable foraging conditions for dolphins as they are known to provide feeding, breeding, and spawning habitat for a diverse range of fish and cephalopod prey species. The results of the present study highlight the importance of affordable community-based data collection, such as ecotourism vessels, for obtaining information critical for effective management.
|
research-article |
1 |
|
56
|
Araki S, Shimadera H, Chatani S, Kitayama K, Shima M. Long-term spatiotemporal variation of benzo[a]pyrene in Japan: Significant decrease in ambient concentrations, human exposure, and health risk. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2024; 360:124650. [PMID: 39111529 DOI: 10.1016/j.envpol.2024.124650] [Received: 03/27/2024] [Revised: 07/24/2024] [Accepted: 07/30/2024] [Indexed: 08/15/2024]
Abstract
Although Benzo[a]pyrene (BaP) is considered carcinogenic to humans, the health effects of exposure to ambient levels have not been sufficiently investigated. This study estimated the long-term spatiotemporal variation of BaP in Japan over nearly two decades at a fine spatial resolution of 1 km. This study aimed to obtain an accurate spatiotemporal distribution of BaP that can be used in epidemiological studies on the health effects of ambient BaP exposure. The annual BaP concentrations were estimated using an ensemble machine learning approach using various predictors, including the concentrations and emission intensities of the criteria air pollutants, and meteorological, land use, and traffic-related variables. The model performance, evaluated by location-based cross-validation, exhibited satisfactory accuracy (R² of 0.693). Densely populated areas showed higher BaP levels and greater temporal reduction, whereas BaP levels remained higher in some industrial areas. The population-weighted BaP in 2018 was 0.12 ng m⁻³, a decrease of approximately 70% from its 2000 value of 0.44 ng m⁻³, which was also reflected in the estimated excess number of lung cancer incidences. Accordingly, the proportion of BaP exposure below 0.12 ng m⁻³, which is the BaP concentration associated with an excess lifetime cancer risk of 10⁻⁵, reached 67% in 2018. Our estimates can be used in epidemiological studies to assess the health effects of BaP exposure at ambient concentrations.
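The reported decrease can be checked directly from the two population-weighted values quoted in the abstract. The helper name below is hypothetical, and the inputs are the rounded figures from the text, so the result is only as precise as that rounding.

```python
def percent_decrease(start, end):
    """Relative decrease from start to end, expressed in percent."""
    return 100.0 * (start - end) / start


bap_2000 = 0.44  # ng per cubic metre, value quoted for 2000
bap_2018 = 0.12  # ng per cubic metre, value quoted for 2018
drop = percent_decrease(bap_2000, bap_2018)
```

With these rounded inputs the decrease comes out at roughly 73%, which is consistent with the abstract's "approximately 70%".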
|
|
1 |
|
57
|
Zamani MG, Nikoo MR, Jahanshahi S, Barzegar R, Meydani A. Forecasting water quality variable using deep learning and weighted averaging ensemble models. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:124316-124340. [PMID: 37996598 DOI: 10.1007/s11356-023-30774-4] [Received: 06/15/2023] [Accepted: 10/27/2023] [Indexed: 11/25/2023]
Abstract
Water quality variables, including chlorophyll-a (Chl-a), play a pivotal role in comprehending and evaluating the condition of aquatic ecosystems. Chl-a, a pigment present in diverse aquatic organisms, notably algae and cyanobacteria, serves as a valuable indicator of water quality. Thus, the objectives of this study encompass: (1) the assessment of the predictive capabilities of four deep learning (DL) models - namely, recurrent neural network (RNN), long short-term memory (LSTM), gated recurrent unit (GRU), and temporal convolutional network (TCN) - in forecasting Chl-a concentrations; (2) the incorporation of these DL models into ensemble models (EMs) employing genetic algorithm (GA) and non-dominated sorting genetic algorithm (NSGA-II) to harness the strengths of each standalone model; and (3) the evaluation of the efficacy of the developed EMs. Utilizing data collected at 15-min intervals from Small Prespa Lake (SPL) in Greece, the models employed hourly Chl-a concentration lag times, extending up to 6 h, as the models' inputs to forecast Chl-a(t+1). The proposed models underwent training on 70% of the dataset and were subsequently validated on the remaining 30%. Among the standalone DL models, the GRU model exhibited superior performance in Chl-a forecasting, surpassing the RNN, LSTM, and TCN models by 8%, 2%, and 2%, respectively. Furthermore, the integration of DL models through single-objective GA and multi-objective NSGA-II optimization algorithms yielded hybrid models adept at effectively forecasting both low and high Chl-a concentrations. The ensemble model based on NSGA-II outperformed standalone DL models as well as the GA-based model across a range of evaluation indices. For instance, considering the R-squared metric, the study's findings demonstrated that the EM-NSGA-II stands out with exceptional effectiveness compared to DL and EM-GA models, showcasing improvements of 14% (RNN), 8% (LSTM), 6% (GRU), 8% (TCN), and 3% (EM-GA) during the testing phase.
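The input construction and the 70/30 split described above can be sketched as follows. Two points are assumptions rather than details from the abstract: the inputs are taken to be the six hourly values preceding the target, and the split is taken to be chronological (earliest 70% for training). The series is a synthetic placeholder, not lake data.

```python
def make_lagged_samples(series, n_lags=6):
    """Pair each value with the n_lags values that precede it (oldest first)."""
    samples = []
    for t in range(n_lags, len(series)):
        inputs = series[t - n_lags:t]  # lagged inputs for the forecast
        target = series[t]             # value to forecast, i.e. Chl-a(t+1)
        samples.append((inputs, target))
    return samples


def chronological_split(samples, train_percent=70):
    """Keep the earliest train_percent of samples for training, the rest for testing."""
    cut = len(samples) * train_percent // 100
    return samples[:cut], samples[cut:]


# Synthetic hourly series used only to show the shapes involved.
hourly_chla = [float(v) for v in range(26)]
samples = make_lagged_samples(hourly_chla)
train, test = chronological_split(samples)
```

A chronological split avoids training on observations that come after the test period, which matters for autocorrelated series such as water quality measurements.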
|
|
2 |
|
58
|
Wang S, Lin M, Meng Y, Jiang T, Fan F, Wang S. Self-expansion full information optimization strategy: Convenient and efficient method for near infrared spectrum auto-analysis. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2023; 303:123224. [PMID: 37603976 DOI: 10.1016/j.saa.2023.123224] [Received: 04/19/2023] [Revised: 07/06/2023] [Accepted: 07/31/2023] [Indexed: 08/23/2023]
Abstract
An essential step in the application of near infrared spectroscopy technology is spectrum preprocessing. Implemented sensibly, it ensures that the effective spectral information is correctly extracted and that the model's accuracy is increased. However, some analysts, particularly less experienced ones, still rely on manual trial and error. Previous papers have provided preprocessing optimization algorithms for NIR, but some problems remain unresolved, such as the unwieldy determination of the preprocessing sequence, fluctuating optimization outcomes, and the lack of sufficient statistical information. This research proposes a spectrum auto-analysis methodology named the self-expansion full information optimization strategy, a new open-source technique that addresses all of these issues simultaneously. For the first time in the field of chemometrics, this algorithm offers a reliable and effective automatic near infrared modelling method based on statistical informatics. With the aid of its built-in modules, such as information generators and spectrum processors, it fully searches the common preprocessing techniques, with the selection determined by Monte Carlo cross validation. The final ensemble calibration model is then built using the optimized preprocessing schemes together with a wavelength variable screening algorithm. The optimization strategy offers the user objective and useful statistical information generated throughout the modeling process to further examine the model's effectiveness. Tests on two groups of actual near-infrared spectral data demonstrate that the proposed method can easily and successfully extract spectrum information and develop calibration models. Additionally, this optimization strategy can be applied to other spectrum analysis areas, such as Raman spectroscopy or infrared spectroscopy, by changing a few of its parameters, and has extraordinary application value.
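Monte Carlo cross validation, which the abstract names as the criterion for comparing preprocessing schemes, repeatedly draws random train/test partitions and averages the resulting scores. The sketch below only generates the partitions; the function name, the sample count, and the split sizes are hypothetical and are not taken from the paper.

```python
import random


def monte_carlo_splits(n_samples, n_repeats, test_size, seed=0):
    """Return n_repeats random (train_indices, test_indices) partitions."""
    rng = random.Random(seed)  # fixed seed so the partitions are reproducible
    indices = list(range(n_samples))
    splits = []
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        test_idx = sorted(shuffled[:test_size])
        train_idx = sorted(shuffled[test_size:])
        splits.append((train_idx, test_idx))
    return splits


# Ten spectra, five random partitions, three spectra held out each time.
splits = monte_carlo_splits(n_samples=10, n_repeats=5, test_size=3)
```

Each candidate preprocessing scheme would be scored on every partition, and the mean and spread of those scores give the kind of statistical information the abstract refers to.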
|
|
2 |
|
59
|
Shahabi MS, Shalbaf A, Rostami R. Prediction of response to repetitive transcranial magnetic stimulation for major depressive disorder using hybrid Convolutional recurrent neural networks and raw Electroencephalogram Signal. Cogn Neurodyn 2023; 17:909-920. [PMID: 37522037 PMCID: PMC10374518 DOI: 10.1007/s11571-022-09881-4] [Received: 03/10/2022] [Revised: 08/03/2022] [Accepted: 08/28/2022] [Indexed: 11/30/2022] Open
Abstract
Major Depressive Disorder (MDD) is a highly prevalent disease that needs effective and timely treatment to prevent its progression and additional costs. Repetitive Transcranial Magnetic Stimulation (rTMS) is an effective treatment option for MDD patients which uses strong magnetic pulses to stimulate specific regions of the brain. However, some patients do not respond to this treatment, which wastes multiple weeks of treatment time and clinical resources. Therefore, developing an effective way to predict response to rTMS treatment of depression is necessary. In this work, we proposed a hybrid model built from pre-trained Convolutional Neural Network (CNN) models and Bidirectional Long Short-Term Memory (BLSTM) cells to predict response to rTMS treatment from the raw EEG signal. Three pre-trained CNN models, VGG16, InceptionResNetV2, and EfficientNetB0, were utilized as Transfer Learning (TL) models to construct hybrid TL-BLSTM models. Then an ensemble of these models was created using weighted majority voting, in which the weights were optimized by the Differential Evolution (DE) optimization algorithm. Evaluation of these models shows the superior performance of the ensemble model, with an accuracy of 98.51%, sensitivity of 98.64%, specificity of 98.36%, F1-score of 98.6%, and AUC of 98.5%. Therefore, the ensemble of the proposed hybrid convolutional recurrent networks can efficiently predict the treatment outcome of rTMS using raw EEG data.
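Weighted majority voting, the combination rule named above, can be sketched in a few lines. The weights here are fixed placeholder values chosen for illustration; in the study they are optimized with Differential Evolution, which is not reproduced in this sketch. The function name and the class labels are hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(votes, weights):
    """Return the label whose supporting models have the largest total weight."""
    totals = defaultdict(float)
    for model, label in votes.items():
        totals[label] += weights[model]
    # Sorting the labels first makes tie-breaking deterministic.
    return max(sorted(totals), key=lambda label: totals[label])


# Placeholder weights, not the optimized values from the paper.
model_weights = {"VGG16": 0.5, "InceptionResNetV2": 0.3, "EfficientNetB0": 0.3}
model_votes = {
    "VGG16": "non-responder",
    "InceptionResNetV2": "responder",
    "EfficientNetB0": "responder",
}
decision = weighted_majority_vote(model_votes, model_weights)
```

Here the two lower-weighted models together outweigh the single higher-weighted one, so the ensemble follows them. An optimizer such as DE would search for the weight vector that maximizes validation accuracy under this rule.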
|
research-article |
2 |
|
60
|
Yakovyna V, Shakhovska N, Szpakowska A. A novel hybrid supervised and unsupervised hierarchical ensemble for COVID-19 cases and mortality prediction. Sci Rep 2024; 14:9782. [PMID: 38684770 PMCID: PMC11059164 DOI: 10.1038/s41598-024-60637-y] [Received: 12/15/2023] [Accepted: 04/25/2024] [Indexed: 05/02/2024] Open
Abstract
Though COVID-19 is no longer a pandemic but rather an endemic, the epidemiological situation related to the SARS-CoV-2 virus is developing at an alarming rate, impacting every corner of the world. The rapid escalation of the coronavirus has led to the scientific community engagement, continually seeking solutions to ensure the comfort and safety of society. Understanding the joint impact of medical and non-medical interventions on COVID-19 spread is essential for making public health decisions that control the pandemic. This paper introduces two novel hybrid machine-learning ensembles that combine supervised and unsupervised learning for COVID-19 data classification and regression. The study utilizes publicly available COVID-19 outbreak and potential predictive features in the USA dataset, which provides information related to the outbreak of COVID-19 disease in the US, including data from each of 3142 US counties from the beginning of the epidemic (January 2020) until June 2021. The developed hybrid hierarchical classifiers outperform single classification algorithms. The best-achieved performance metrics for the classification task were Accuracy = 0.912, ROC-AUC = 0.916, and F1-score = 0.916. The proposed hybrid hierarchical ensemble combining both supervised and unsupervised learning allows us to increase the accuracy of the regression task by 11% in terms of MSE, 29% in terms of the area under the ROC, and 43% in terms of the MPP metric. Thus, using the proposed approach, it is possible to predict the number of COVID-19 cases and deaths based on demographic, geographic, climatic, traffic, public health, social-distancing-policy adherence, and political characteristics with sufficiently high accuracy. The study reveals that virus pressure is the most important feature in COVID-19 spread for classification and regression analysis. Five other significant features were identified to have the most influence on COVID-19 spread. 
The combined ensembling approach introduced in this study can help policymakers design prevention and control measures to avoid or minimize public health threats in the future.
|
research-article |
1 |
|
61
|
G UM, P UM. SmartScanPCOS: A feature-driven approach to cutting-edge prediction of Polycystic Ovary Syndrome using Machine Learning and Explainable Artificial Intelligence. Heliyon 2024; 10:e39205. [PMID: 39492914 PMCID: PMC11530826 DOI: 10.1016/j.heliyon.2024.e39205] [Received: 03/07/2024] [Revised: 10/08/2024] [Accepted: 10/09/2024] [Indexed: 11/05/2024] Open
Abstract
PolyCystic Ovarian Syndrome (PCOS) poses significant challenges to women's reproductive health due to its diagnostic complexity arising from a variety of symptoms, including hirsutism, anovulation, pain, obesity, hyperandrogenism, and oligomenorrhea, necessitating multiple clinical tests. Leveraging Artificial Intelligence (AI) in healthcare offers several benefits that can significantly impact patient care, streamline operations, and improve medical outcomes overall. This study presents an Explainable Artificial Intelligence (XAI)-driven PCOS smart predictor, structured as a hierarchical ensemble consisting of two tiers of Random Forest classifiers following extensive analysis of seven conventional classifiers and two additional stacking ensemble classifiers. An open-source data set comprising numerical parametric features linked to PCOS was used for classifier training. Moreover, to identify essential features for PCOS prediction, three feature selection methods, Threshold-driven Optimized Principal Component Analysis (TOPCA), Optimized Salp Swarm (OSSM), and Threshold-driven Optimized Mutual Information Method (TOMIM), were fine-tuned through thresholding and improvisation to detect diverse attribute sets with varying numbers and combinations. Notably, the two-level Random Forest classifier model outperformed others with a remarkable 99.31% accuracy by employing the top 17 features selected through TOMIM, along with an overall accuracy of 99.32% with 8-fold cross-validation over 25 runs. The Smart predictor, constructed using Shapash, a Python library for Explainable Artificial Intelligence, was utilized to deploy the two-level Random Forest classifier model. To ensure transparency and result reliability, visualizations from robust Explainable AI libraries were employed at different prediction stages for all considered classifiers in this study.
|
research-article |
1 |
|
62
|
He M, Bakker EM, Lew MS. DPD (DePression Detection) Net: a deep neural network for multimodal depression detection. Health Inf Sci Syst 2024; 12:53. [PMID: 39544256 PMCID: PMC11557813 DOI: 10.1007/s13755-024-00311-9] [Received: 06/25/2024] [Accepted: 10/14/2024] [Indexed: 11/17/2024] Open
Abstract
Depression is one of the most prevalent mental conditions which could impair people's productivity and lead to severe consequences. The diagnosis of this disease is complex as it often relies on a physician's subjective interview-based screening. The aim of our work is to propose deep learning models for automatic depression detection by using different data modalities, which could assist in the diagnosis of depression. Current works on automatic depression detection mostly are tested on a single dataset, which might lack robustness, flexibility and scalability. To alleviate this problem, we design a novel Graph Neural Network-enhanced Transformer model named DePressionDetect Net (DPD Net) that leverages textual, audio and visual features and can work under two different application settings: the clinical setting and the social media setting. The model consists of a unimodal encoder module for encoding single modality, a multimodal encoder module for integrating the multimodal information, and a detection module for producing the final prediction. We also propose a model named DePressionDetect-with-EEG Net (DPD-E Net) to incorporate Electroencephalography (EEG) signals and speech data for depression detection. Experiments across four benchmark datasets show that DPD Net and DPD-E Net can outperform the state-of-the-art models on three datasets (i.e., E-DAIC dataset, Twitter depression dataset and MODMA dataset), and achieve competitive performance on the fourth one (i.e., D-vlog dataset). Ablation studies demonstrate the advantages of the proposed modules and the effectiveness of combining diverse modalities for automatic depression detection.
|
research-article |
1 |
|
63
|
Kim G, Park YM, Yoon HJ, Choi JH. A multi-kernel and multi-scale learning based deep ensemble model for predicting recurrence of non-small cell lung cancer. PeerJ Comput Sci 2023; 9:e1311. [PMID: 37346527 PMCID: PMC10280639 DOI: 10.7717/peerj-cs.1311] [Received: 11/15/2022] [Accepted: 03/06/2023] [Indexed: 06/23/2023]
Abstract
Predicting recurrence in patients with non-small cell lung cancer (NSCLC) before treatment is vital for guiding personalized medicine. Deep learning techniques have revolutionized the application of cancer informatics, including lung cancer time-to-event prediction. Most existing convolutional neural network (CNN) models are based on a single two-dimensional (2D) computed tomography (CT) image or three-dimensional (3D) CT volume. However, studies have shown that using multi-scale input and fusing multiple networks provide promising performance. This study proposes a deep learning-based ensemble network for recurrence prediction using a dataset of 530 patients with NSCLC. This network assembles 2D CNN models of various input slices, scales, and convolutional kernels, using a deep learning-based feature fusion model as an ensemble strategy. The proposed framework is designed to benefit from (i) multiple 2D in-plane slices to provide more information than a single central slice, (ii) multi-scale networks and multi-kernel networks to capture the local and peritumoral features, and (iii) an ensemble design to integrate features from various inputs and model architectures for final prediction. The ensemble of five 2D-CNN models, three slices, and two multi-kernel networks, using 5 × 5 and 6 × 6 convolutional kernels, achieved the best performance with an accuracy of 69.62%, area under the curve (AUC) of 72.5%, F1 score of 70.12%, and recall of 70.81%. Furthermore, the proposed method achieved competitive results compared with the 2D and 3D-CNN models for cancer outcome prediction in the benchmark studies. Our model is also a potential adjuvant treatment tool for identifying NSCLC patients with a high risk of recurrence.
|
research-article |
2 |
|
64
|
Wang H, Ding J, Wang S, Li L, Song J, Bai D. Enhancing predictive accuracy for urinary tract infections post-pediatric pyeloplasty with explainable AI: an ensemble TabNet approach. Sci Rep 2025; 15:2455. [PMID: 39828726 PMCID: PMC11743759 DOI: 10.1038/s41598-024-82282-1] [Received: 09/30/2024] [Accepted: 12/04/2024] [Indexed: 01/22/2025] Open
Abstract
Ureteropelvic junction obstruction (UPJO) is a common pediatric condition often treated with pyeloplasty. Despite the surgical intervention, postoperative urinary tract infections (UTIs) occur in over 30% of cases within six months, adversely affecting recovery and increasing both clinical and economic burdens. Current prediction methods for postoperative UTIs rely on empirical judgment and limited clinical parameters, underscoring the need for a robust, multifactorial predictive model. We retrospectively analyzed data from 764 pediatric patients who underwent unilateral pyeloplasty at the Children's Hospital affiliated with the Capital Institute of Pediatrics between January 2012 and January 2023. A total of 25 clinical features were extracted, including patient demographics, medical history, surgical details, and various postoperative indicators. Feature engineering was performed first, followed by a comparative analysis of five machine learning algorithms (Logistic Regression, SVM, Random Forest, XGBoost, and LightGBM) and the deep learning TabNet model. This comparison highlighted the respective strengths and limitations of traditional machine learning versus deep learning approaches. Building on these findings, we developed an ensemble learning model (a meta-learner) that integrates both methodologies, and used SHAP (SHapley Additive exPlanations) to visualize the integrated black-box model. Among the 764 pediatric pyeloplasty cases analyzed, 265 (34.7%) developed postoperative UTIs, predominantly within the first three months. Early UTIs significantly increased the likelihood of re-obstruction (P < 0.01), underscoring the critical impact of infection on surgical outcomes. Among the six algorithms evaluated, TabNet outperformed the traditional models; ranked from lowest to highest performance, the order was Logistic Regression, SVM, Random Forest, XGBoost, LightGBM, and TabNet. Feature engineering markedly improved the predictive accuracy of traditional models, as evidenced by the enhanced performance of LightGBM (Accuracy: 0.71, AUC: 0.78 post-engineering). The proposed ensemble approach, combining LightGBM and TabNet with a Logistic Regression meta-learner, achieved superior predictive accuracy (Accuracy: 0.80, AUC: 0.80) while reducing dependence on feature engineering. SHAP analysis further revealed eGFR and ALB as significant predictors of UTIs post-pyeloplasty, providing new clinical insights into risk factors. In summary, we have introduced the first ensemble prediction model, incorporating both machine learning and deep learning (meta-learner), to predict urinary tract infections following pediatric pyeloplasty. This ensemble approach mitigates the dependency of machine learning models on feature engineering while addressing the issue of overfitting in deep learning-based models like TabNet, particularly in the context of small medical datasets. By improving prediction accuracy, this model supports proactive interventions, reduces postoperative infections and re-obstruction rates, enhances pyeloplasty outcomes, and alleviates health and economic burdens. Level of evidence: IV (case series with no comparison group).
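The stacking idea in this abstract (base-model outputs fed to a Logistic Regression meta-learner) can be illustrated with a minimal sketch. This is not the authors' implementation: the two columns of `base_probs` are made-up out-of-fold probabilities standing in for the LightGBM and TabNet outputs, and the meta-learner is a small hand-written logistic regression fitted by gradient descent, with hypothetical learning rate and epoch count.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner on base-model probabilities.

    Uses plain batch gradient descent on the log loss. lr and epochs are
    illustrative defaults, not values taken from the paper.
    """
    n_features = len(base_probs[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(base_probs, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return the stacked probability for one row of base-model outputs."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical out-of-fold probabilities: (base model A, base model B).
base_probs = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9),
              (0.2, 0.3), (0.1, 0.4), (0.3, 0.1)]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_learner(base_probs, labels)
stacked = [round(predict(weights, bias, x)) for x in base_probs]
```

In practice the base-model probabilities would come from cross-validated predictions, so that the meta-learner is not trained on outputs the base models produced for their own training rows.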
|
research-article |
1 |
|
65
|
Hussain S, Aslam W, Mehmood A, Choi GS, Ashraf I. A machine learning based framework for IoT devices identification using web traffic. PeerJ Comput Sci 2024; 10:e1834. [PMID: 38660201 PMCID: PMC11041939 DOI: 10.7717/peerj-cs.1834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 01/02/2024] [Indexed: 04/26/2024]
Abstract
Identification of Internet of Things (IoT) devices has become an essential part of network management to secure the privacy of smart homes and offices. With its wide adoption in the current era, IoT has facilitated the modern age in many ways. However, such proliferation also has associated privacy and data security risks. In smart homes and smart offices, unknown IoT devices increase vulnerabilities and the chances of data theft. It is essential to identify the connected devices for secure communication. As the number of connected devices increases, maintaining the list of rules becomes very difficult, and human involvement is needed to check whether any intruder device has approached the network. Therefore, device identification needs to be automated using machine learning methods. In this article, we propose an accuracy boosting model (ABM) using the machine learning models random forest and extreme gradient boosting. Feature engineering techniques are employed along with cross-validation to accurately identify IoT devices such as lights, smoke detectors, thermostats, motion sensors, baby monitors, sockets, TVs, security cameras, and watches. The proposed ensemble model utilizes random forest (RF) and extreme gradient boosting (XGB) as base learners with adaptive boosting. The proposed ensemble model is tested with extensive experiments involving the IoT Device Identification dataset from a public repository. Experimental results indicate an accuracy of 91%, precision of 93%, recall of 93%, and F1 score of 93%.
|
research-article |
1 |
|
66
|
Asaf MZ, Rasul H, Akram MU, Hina T, Rashid T, Shaukat A. A Modified Deep Semantic Segmentation Model for Analysis of Whole Slide Skin Images. Sci Rep 2024; 14:23489. [PMID: 39379448 PMCID: PMC11461484 DOI: 10.1038/s41598-024-71080-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 08/23/2024] [Indexed: 10/10/2024] Open
Abstract
Automated segmentation of biomedical images has been recognized as an important step in computer-aided diagnosis systems for the detection of abnormalities. Despite its importance, the segmentation process remains an open challenge due to variations in color, texture, shape, and boundaries. Semantic segmentation often requires deeper neural networks to achieve higher accuracy, making the segmentation model more complex and slower. Because a large number of biomedical images must be processed, more efficient and cheaper image processing techniques for accurate segmentation are needed. In this article, we present a modified deep semantic segmentation model that utilizes the backbone of EfficientNet-B3 along with UNet for reliable segmentation. We trained our model on the Non-melanoma Skin Cancer Segmentation for Histopathology dataset to segment images into 12 different classes. Our method outperforms the existing literature with an increase in average class accuracy from 79 to 83%. Our approach also shows an increase in overall accuracy from 85 to 94%.
|
research-article |
1 |
|
67
|
Feng Z, Zhang L, Tang N, Li X, Xing W. Ensemble modeling of aquatic plant invasions and economic cost analysis in China under climate change scenarios. The Science of the Total Environment 2024; 957:177444. [PMID: 39522784 DOI: 10.1016/j.scitotenv.2024.177444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Revised: 11/04/2024] [Accepted: 11/05/2024] [Indexed: 11/16/2024]
Abstract
Pistia stratiotes, Eichhornia crassipes, Alternanthera philoxeroides, and Cabomba caroliniana are officially recognized as invasive aquatic plants in China. Accurately predicting their invasion dynamics under climate change is crucial for the future safety of aquatic ecosystems. Compared to single prediction models, ensemble models that integrate multiple algorithms provide more accurate forecasts. However, there has been a notable lack of research utilizing ensemble models to collectively predict the invasive regions of these four species in China. To address this gap, we collected and analyzed comprehensive data on species distribution, climate, altitude, population density, and the normalized difference vegetation index to accurately predict the future invasive regions and potential warnings for aquatic systems concerning these species. Our results indicate that suitable areas for invasive aquatic plants in China are primarily located in the southeastern region. Significant differences exist in the suitable habitats for each species: P. stratiotes and E. crassipes have broad distribution areas, covering most water systems in southeastern China, while C. caroliniana is concentrated in the middle and lower reaches of the Yangtze River and the estuaries of the Yangtze and Pearl Rivers. A. philoxeroides has an extensive invasion area, with the North China Plain projected to become a suitable invasion region in the future. The main factors influencing future invasions are human activities and climate change. In addition, under climate change, the suitable habitats for these invasive aquatic plants are expected to expand towards higher latitudes. We also estimated the economic costs associated with invasive aquatic plants in China using the Invacost database, revealing cumulative costs of US$5525.17 million, where damage costs (89.70%) significantly exceed management costs (10.30%). Our innovative approach, employing various ensemble algorithms and water system invasion forecasts, aims to effectively mitigate the future invasions and economic impacts of these species.
|
|
1 |
|
68
|
Luo Z, Liu W, Wu J, Aiqing H, Guo J. Prediction of cold chain loading environment for agricultural products based on K-medoids-LSTM-XGBoost ensemble model. PeerJ Comput Sci 2024; 10:e2510. [PMID: 39896411 PMCID: PMC11784797 DOI: 10.7717/peerj-cs.2510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Accepted: 10/21/2024] [Indexed: 02/04/2025]
Abstract
Cold chain loading is a crucial aspect in the process of cold chain transportation, aiming to enhance the quality, reduce energy consumption, and minimize costs associated with cold chain logistics. To achieve these objectives, this study proposes a prediction method based on the combined model of K-medoids-long short-term memory (LSTM) networks-eXtreme Gradient Boosting (XGBoost). This ensemble model accurately predicts the temperature for a specified future time period, providing an appropriate cold chain loading environment for goods. After obtaining temperature data pertaining to the cold chain loading environment, the K-medoids algorithm is initially employed to fuse the data, which is then fed into the constructed ensemble model. The model's mean absolute error (MAE) is determined to be 2.5343. The experimental results demonstrate that the K-medoids-LSTM-XGBoost combined prediction model outperforms individual models and similar ensemble models in accurately predicting the agricultural product's cold chain loading environment. This model offers improved monitoring and management capabilities for personnel involved in the cold chain loading process.
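The mean absolute error reported in this abstract is the average of the absolute differences between observed and predicted values. The short sketch below shows that calculation; the temperature readings are invented for illustration and are unrelated to the study's data or its reported MAE of 2.5343.

```python
def mean_absolute_error(actual, predicted):
    """Return the mean of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and forecast temperatures (same units).
observed = [4.0, 5.0, 3.0, 6.0]
forecast = [5.0, 3.0, 4.0, 6.0]

mae = mean_absolute_error(observed, forecast)
```

Because MAE is expressed in the units of the target variable, it can be read directly as the typical size of a forecast error.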
|
research-article |
1 |
|
69
|
Agogo GO, Mwambi H. Application of machine learning algorithms in an epidemiologic study of mortality. Ann Epidemiol 2025; 102:36-47. [PMID: 39756630 DOI: 10.1016/j.annepidem.2024.12.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 12/20/2024] [Accepted: 12/29/2024] [Indexed: 01/07/2025]
Abstract
PURPOSE Epidemiologic studies are important in assessing risk factors of mortality. Machine learning (ML) is efficient in analyzing multidimensional data to unravel dependencies between risk factors and health outcomes. METHODS Using a representative sample from the National Health and Nutrition Examination Survey data collected from 2009 to 2016, linked to the National Death Index public-use mortality data through December 31, 2019, we applied logistic regression, random forests, k-Nearest Neighbors, multivariate adaptive regression splines, support vector machines, extreme gradient boosting, and super learner ML algorithms to study risk factors of all-cause mortality. We evaluated the algorithms using area under the receiver operating characteristic curve (AUC-ROC), sensitivity, and negative predictive value (NPV), among other metrics, and interpreted the results using SHapley Additive exPlanation. RESULTS The AUC-ROC ranged from 0.80 to 0.87. The super learner had the highest AUC-ROC of 0.87 (95% CI, 0.86–0.88), sensitivity of 0.86 (95% CI, 0.84–0.88) and NPV of 0.98 (95% CI, 0.98–0.99). Key risk factors of mortality included advanced age, larger waist circumference, male sex, and systolic blood pressure. Being married, high annual household income, and high education level were linked with low risk of mortality. CONCLUSIONS Machine learning can be used to identify risk factors of mortality, which is critical for individualized targeted interventions in epidemiologic studies.
|
|
1 |
|
70
|
Brar AS, Singh K. A multi-objective stacked regression method for distance based colour measuring device. Sci Rep 2024; 14:5530. [PMID: 38448462 PMCID: PMC10918078 DOI: 10.1038/s41598-024-54785-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Accepted: 02/16/2024] [Indexed: 03/08/2024] Open
Abstract
Identifying colour from a distance is challenging due to the external noise associated with the measurement process. The present study focuses on developing a colour measuring system and a novel Multi-target Regression (MTR) model for accurate colour measurement from a distance. Herein, a novel MTR method, referred to as Multi-Objective Stacked Regression (MOSR), is proposed. The core idea behind MOSR is stacking as an ensemble approach, combined with multi-objective evolutionary learning using NSGA-II. A multi-objective optimization approach is used to select base learners that maximise prediction accuracy while minimising ensemble complexity. The method is then compared with six state-of-the-art methods on the colour dataset. Classification and regression tree (CART), Random Forest (RF) and Support Vector Machine (SVM) were used as regressor algorithms. MOSR outperformed all compared methods, with the highest coefficient of determination values for all three targets of the colour dataset. Rigorous comparison with state-of-the-art methods over 18 benchmark datasets showed that MOSR performed best on 15 datasets when CART was used as the regressor algorithm and on 11 datasets when RF and SVM were used as regressor algorithms. The MOSR method was statistically superior to the compared methods and can be effectively used to measure accurate colour values in the distance-based colour measuring device.
|
research-article |
1 |
|
71
|
He C, Wu F, Fu L, Kong L, Lu Z, Qi Y, Xu H. Improving cardiovascular risk prediction with machine learning: a focus on perivascular adipose tissue characteristics. Biomed Eng Online 2024; 23:77. [PMID: 39098936 PMCID: PMC11299393 DOI: 10.1186/s12938-024-01273-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Accepted: 07/24/2024] [Indexed: 08/06/2024] Open
Abstract
BACKGROUND Timely prevention of major adverse cardiovascular events (MACEs) is imperative for reducing cardiovascular disease-related mortality. Perivascular adipose tissue (PVAT), the adipose tissue surrounding coronary arteries, has attracted increasing attention. Developing a machine learning (ML) model that integrates clinical and PVAT features to predict the incidence of MACE may facilitate targeted preventive interventions and improve patient outcomes. METHODS From January 2017 to December 2019, we analyzed a cohort of 1077 individuals who underwent coronary CT scanning at our facility. Clinical features were collected alongside imaging features, such as coronary artery calcium (CAC) scores and PVAT characteristics. Logistic regression (LR), Framingham Risk Score, and ML algorithms were employed for MACE prediction. RESULTS We screened seven critical features to improve the practicability of the model. MACE patients tended to be older, smokers, and hypertensive. Imaging biomarkers such as CAC scores and PVAT characteristics differed significantly between patients with and without a 3-year MACE risk in a population that did not exhibit disparities in laboratory results. The ensemble model, which leverages multiple ML algorithms, demonstrated superior predictive performance compared with the other models. Finally, the ensemble model was used for risk stratification prediction to explore its clinical application value. CONCLUSIONS The developed ensemble model effectively predicted MACE incidence based on clinical and imaging features, highlighting the potential of ML algorithms in cardiovascular risk prediction and personalized medicine. Early identification of high-risk patients may facilitate targeted preventive interventions and improve patient outcomes.
|
research-article |
1 |
|
72
|
Yuan W, Zuo W, Li Q, Chen W, Liu L, Li J. Modeling the current and future habitat suitability of Clematis tangutica (Ranunculaceae) on the Qinghai-Tibet Plateau based on an ensemble method. Environmental Monitoring and Assessment 2024; 197:71. [PMID: 39694933 DOI: 10.1007/s10661-024-13538-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 12/09/2024] [Indexed: 12/20/2024]
Abstract
This research explores the impact of environmental changes on the distribution of Clematis tangutica, providing theoretical support for its conservation, development, utilization, and early warning monitoring of potential impacts on the ecological environment and local plant communities. An ensemble model in R was used to simulate the suitable habitats of Clematis tangutica on the Tibetan Plateau, integrating climate, topography, and soil variables. Simulations were conducted under three distinct future climate scenarios. The ensemble model exhibited superior performance compared to the individual models, as indicated by a true skill statistic of 0.9203. Clematis tangutica primarily occupies the eastern Tibetan Plateau, with optimal habitats predominantly located in western Sichuan Province. Unsuitable regions encompass approximately 69.72% of the total area (approximately 1743 thousand square kilometers), while highly suitable areas constitute about 5.48% (approximately 137 thousand square kilometers). In the future, as the temperature rises on the Qinghai-Tibet Plateau, overall precipitation is expected to increase, though regional differences will exist. In particular, under the SSP245 scenario in the 2050s, the centroid of the Clematis tangutica distribution is projected to shift northwest, potentially providing favorable conditions. The distribution pattern of Clematis tangutica is strongly influenced by fluctuations in temperature and elevation, as these factors directly affect the plant's ability to thrive in specific regions. Changes in these variables may alter its future distribution, particularly under climate change scenarios. The centroid of the Clematis tangutica distribution tends to migrate northwest under future climatic conditions.
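The true skill statistic (TSS) cited in this abstract is conventionally defined as sensitivity plus specificity minus one, computed from a presence/absence confusion matrix. The sketch below shows that standard definition; the counts are invented for illustration and are not taken from the study, which reports only the resulting TSS of 0.9203.

```python
def true_skill_statistic(tp, fn, tn, fp):
    """Return TSS = sensitivity + specificity - 1.

    tp, fn: presence records predicted as present / absent.
    tn, fp: absence records predicted as absent / present.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one absence record")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Hypothetical evaluation counts for a habitat suitability model.
tss = true_skill_statistic(tp=90, fn=10, tn=80, fp=20)
```

TSS ranges from -1 to 1, where 1 means perfect discrimination and values near 0 mean performance no better than chance.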
|
|
1 |
|
73
|
Dai TY, Radhakrishnan P, Nweye K, Estrada R, Niyogi D, Nagy Z. Analyzing the impact of COVID-19 on the electricity demand in Austin, TX using an ensemble-model based counterfactual and 400,000 smart meters. Computational Urban Science 2023; 3:20. [PMID: 37192956 PMCID: PMC10162906 DOI: 10.1007/s43762-023-00095-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Revised: 03/17/2023] [Accepted: 03/29/2023] [Indexed: 05/18/2023]
Abstract
The COVID-19 pandemic caused lifestyle changes and has led to new electricity demand patterns in the presence of non-pharmaceutical interventions such as work-from-home policies and lockdowns. Quantifying the effect on electricity demand is critical for future electricity market planning, yet challenging when few buildings are smart metered, which limits understanding of the temporal and spatial variations in building energy use. This study uses large-scale private smart meter electricity demand data from the City of Austin, combined with publicly available environmental data, and develops an ensemble regression model for long-term daily electricity demand prediction. Using 15-min resolution data from over 400,000 smart meters from 2018 to 2020, aggregated by building type and zip code, our proposed model precisely formalizes the counterfactual universe in the without-COVID-19 scenario. The model is used to understand building electricity demand changes during the pandemic and to identify relationships between such changes and socioeconomic patterns. Results indicate an increase in residential usage, demonstrating the spatial redistribution of energy consumption during the work-from-home period. Our experiments demonstrate the effectiveness of our proposed framework by assessing multiple socioeconomic impacts through comparison between the counterfactual universe and observations.
|
research-article |
2 |
|
74
|
Byeon DH, Lee WH. Ensemble evaluation of potential distribution of Procambarus clarkii using multiple species distribution models. Oecologia 2024; 204:589-601. [PMID: 38386057 DOI: 10.1007/s00442-024-05516-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 01/20/2024] [Indexed: 02/23/2024]
Abstract
Procambarus clarkii is a notorious invasive species that has led to ecological concerns owing to its high viability and rapid reproduction. South Korea, a country exposed to a high risk of introduction of invasive species due to active international trade, has suffered from recent massive invasions by invasive species, necessitating the evaluation of potential areas requiring intensive monitoring. In this study, we developed two different types of species distribution models, CLIMEX and random forest, for P. clarkii using occurrence records from the United States. The potential distribution in the United States was predicted along coastal lines and inland regions located below 40°N latitude. The model was then applied to evaluate the potential distribution in South Korea, and an ensemble map was constructed to identify the most vulnerable domestic regions. According to both models, the domestic potential distribution was highest in most areas located at low altitudes. In the ensemble model, most of the low-altitude western regions, the eastern coast, and some southern inland regions were predicted to be suitable for the distribution of P. clarkii, and a similar distribution pattern was predicted when the model was projected into the future climate. This study provides basic data that can be used for the early monitoring of the introduction and subsequent distribution of P. clarkii.
|
|
1 |
|
75
|
Natha P, Tera SP, Chinthaginjala R, Rab SO, Narasimhulu CV, Kim TH. Boosting skin cancer diagnosis accuracy with ensemble approach. Sci Rep 2025; 15:1290. [PMID: 39779772 PMCID: PMC11711234 DOI: 10.1038/s41598-024-84864-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Accepted: 12/27/2024] [Indexed: 01/11/2025] Open
Abstract
Skin cancer is common and deadly, so a correct diagnosis at an early stage is essential. Effective therapy depends on precise classification of the several forms of skin cancer, each with its own traits. Because dermoscopy and other sophisticated imaging methods produce detailed lesion images, early detection has improved. However, it remains difficult to analyze the images to differentiate benign from malignant tumors. Better predictive modeling methods are needed, since the diagnostic procedures used now frequently produce inaccurate and inconsistent results. In dermatology, machine learning (ML) models are becoming essential for the automatic detection and classification of skin cancer lesions from image data. This work seeks to improve skin cancer predictions with an ensemble model, which combines several ML approaches to exploit their advantages and lessen their disadvantages. We introduce a new method, the Max Voting method, for optimization of skin cancer classification. On the HAM10000 and ISIC 2018 datasets, we trained and assessed three distinct ML models: Random Forest (RF), Multi-layer Perceptron Neural Network (MLPN), and Support Vector Machine (SVM). Overall performance was increased by combining their predictions with the Max Voting technique. Moreover, feature vectors that were optimally produced from image data by a Genetic Algorithm (GA) were given to the ML models. We demonstrate that the Max Voting method greatly improves predictive performance, reaching an accuracy of 94.70% and producing the best results for F1-measure, recall, and precision. Max Voting proved to be the most dependable and robust approach, combining the benefits of numerous pre-trained ML models to provide a new and efficient method for classifying skin cancer lesions.
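Max voting, as generally understood, assigns each sample the class predicted by the largest number of base models. The sketch below illustrates that general rule only; it is not the authors' code, the per-model predictions are invented, and the tie-breaking rule (the label seen first wins) is an assumption the abstract does not specify.

```python
from collections import Counter


def max_vote(predictions):
    """Return the label predicted by the most base models.

    Ties go to the label that appears first in `predictions`, because
    Counter.most_common keeps insertion order among equal counts.
    This tie rule is an assumption made for illustration.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical per-lesion predictions from three base models (RF, MLPN, SVM).
per_sample_votes = [
    ["malignant", "benign", "malignant"],
    ["benign", "benign", "malignant"],
    ["benign", "benign", "benign"],
]

ensemble_labels = [max_vote(votes) for votes in per_sample_votes]
```

With an odd number of base models and two classes, as in this toy example, ties cannot occur, so the tie rule only matters for multi-class problems or an even number of voters.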
|
research-article |
1 |
|