1
Gracida-Osorno C, Molina-Salinas GM, Góngora-Hernández R, Brito-Loeza C, Uc-Cachón AH, Paniagua-Sierra JR. Machine Learning for Predicting Chronic Renal Disease Progression in COVID-19 Patients with Acute Renal Injury: A Feasibility Study. Biomedicines 2024; 12:1511. [PMID: 39062084 PMCID: PMC11274434 DOI: 10.3390/biomedicines12071511] [Received: 04/23/2024] [Revised: 05/21/2024] [Accepted: 05/31/2024] [Indexed: 07/28/2024] Open
Abstract
This study aimed to determine the feasibility of applying machine-learning methods to assess the progression of chronic kidney disease (CKD) in patients with coronavirus disease (COVID-19) and acute renal injury (AKI). The study was conducted on patients aged 18 years or older who were diagnosed with COVID-19 and AKI between April 2020 and March 2021, and admitted to a second-level hospital in Mérida, Yucatán, México. Of the admitted patients, 47.92% died and 52.06% were discharged. Among the discharged patients, 176 developed AKI during hospitalization, and 131 agreed to participate in the study. The study's results indicated that the area under the receiver operating characteristic curve (AUC-ROC) for the four models was 0.826 for the support vector machine (SVM), 0.828 for the random forest, 0.840 for the logistic regression, and 0.841 for the boosting model. Variable selection methods were utilized to enhance the performance of the classifier, with the SVM model demonstrating the best overall performance, achieving a classification rate of 99.8% ± 0.1 in the training set and 98.43% ± 1.79 in the validation set in AUC-ROC values. These findings have the potential to aid in the early detection and management of CKD, a complication of AKI resulting from COVID-19. Further research is required to confirm these results.
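The AUC-ROC values quoted in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that rank-based definition, not the authors' pipeline; the function name `auc_roc` and the toy labels and scores are hypothetical.

```python
import numpy as np

def auc_roc(labels, scores):
    """Rank-based AUC-ROC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return correct / (pos.size * neg.size)

# Hypothetical toy data: 1 = progressed to CKD, 0 = did not.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(labels, scores)
print(f"AUC-ROC = {auc:.2f}")
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for a cohort of this size; larger datasets would normally use a sorted-rank implementation.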
Affiliation(s)
- Carlos Gracida-Osorno
- Servicio de Medicina Interna, Hospital General Regional No. 1, CMN Ignacio García Téllez, Instituto Mexicano del Seguro Social, Mérida 97150, Mexico
- Gloria María Molina-Salinas
- Unidad de Investigación Médica Yucatán, Hospital de Especialidades, CMN Ignacio García Téllez, Instituto Mexicano del Seguro Social, Mérida 97150, Mexico
- Roxana Góngora-Hernández
- Facultad de Matemáticas, Universidad Autónoma de Yucatán, Mérida 97119, Mexico
- Carlos Brito-Loeza
- Facultad de Matemáticas, Universidad Autónoma de Yucatán, Mérida 97119, Mexico
- Andrés Humberto Uc-Cachón
- Unidad de Investigación Médica Yucatán, Hospital de Especialidades, CMN Ignacio García Téllez, Instituto Mexicano del Seguro Social, Mérida 97150, Mexico
- José Ramón Paniagua-Sierra
- Unidad de Investigación Médica en Enfermedades Nefrológicas, Hospital de Especialidades, CMN Siglo XXI, Instituto Mexicano del Seguro Social, México City 06720, Mexico
2
Wu W, Fukui S. Using Human Resources Data to Predict Turnover of Community Mental Health Employees: Prediction and Interpretation of Machine Learning Methods. Int J Ment Health Nurs 2024. [PMID: 38961607 DOI: 10.1111/inm.13387] [Received: 12/19/2023] [Revised: 05/09/2024] [Accepted: 06/20/2024] [Indexed: 07/05/2024]
Abstract
This study used machine learning (ML) to predict mental health employees' turnover in the following 12 months using human resources data in a community mental health centre. The data contain 621 employees' information (e.g., demographics, job information and client information served by employees) hired between 2011 and 2021 (56.5% turned over during the study period). Six ML methods (i.e., logistic regression, elastic net, random forest [RF], gradient boosting machine [GBM], neural network and support vector machine) were used to predict turnover, along with graphical and statistical tools to interpret predictive relationship patterns and potential interactions. The result suggests that RF and GBM led to better prediction according to specificity, sensitivity and area under the curve (>0.8). The turnover predictors (e.g., past work years, work hours, wage, age, exempt status, educational degree, marital status and employee type) were identified, including those that may be unique to the mental health employee population (e.g., training hours and the proportion of clients with schizophrenia diagnosis). It also revealed nonlinear and nonmonotonic predictive relationships (e.g., wage and employee age), as well as interaction effects, such that past work years interact with other variables in turnover prediction. The study indicates that ML methods showed the predictability of mental health employee turnover using human resources data. The identified predictors and the nonlinear and interactive relationships shed light on developing new predictive models for turnover that warrant further investigations.
Affiliation(s)
- Wei Wu
- Department of Psychology, Indiana University Indianapolis, Indianapolis, Indiana, USA
- Sadaaki Fukui
- School of Social Work, Indiana University, Indianapolis, Indiana, USA
3
Shin S, Choi TY, Han DH, Choi B, Cho E, Seog Y, Koo BN. An explainable machine learning model to predict early and late acute kidney injury after major hepatectomy. HPB (Oxford) 2024; 26:949-959. [PMID: 38705794 DOI: 10.1016/j.hpb.2024.04.005] [Received: 05/19/2023] [Revised: 12/13/2023] [Accepted: 04/19/2024] [Indexed: 05/07/2024]
Abstract
BACKGROUND Risk assessment models for acute kidney injury (AKI) after major hepatectomy that differentiate between early and late AKI are lacking. This retrospective study aimed to create a model predicting AKI through machine learning and identify features that contribute to the development of early and late AKI. METHODS Patients who underwent major hepatectomy were categorized into the No-AKI, Early-AKI (within 48 h) or Late-AKI (between 48 h and 7 days) group. Modeling was done with 20 perioperative features, and the performance of the prediction models was measured by the area under the receiver operating characteristic curve (AUROCC). Shapley Additive Explanation (SHAP) values were utilized to explain the outcome of the prediction model. RESULTS Of the 1383 patients included in this study, 1229, 110 and 44 patients were categorized into the No-AKI, Early-AKI and Late-AKI groups, respectively. The CatBoost classifier exhibited the greatest AUROCC of 0.758 (95% CI: 0.671-0.847) and was found to differentiate well between Early-AKI and Late-AKI. We identified different perioperative features for predicting each outcome and found 1-year mortality to be greater for Early-AKI. CONCLUSIONS Our results suggest that risk factors are different for Early-AKI and Late-AKI after major hepatectomy, and 1-year mortality is greater for Early-AKI.
Affiliation(s)
- Seokyung Shin
- Department of Anesthesiology and Pain Medicine, Severance Hospital, Anesthesia and Pain Research Institute, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, South Korea
- Tae Y Choi
- Department of Anesthesiology and Pain Medicine, Severance Hospital, Anesthesia and Pain Research Institute, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, South Korea
- Dai H Han
- Department of Surgery, Division of Hepato-biliary and Pancreatic Surgery, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, South Korea
- Boin Choi
- Severance Hospital, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, South Korea
- Eunsung Cho
- Severance Hospital, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, South Korea
- Yeong Seog
- Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, South Korea
- Bon-Nyeo Koo
- Department of Anesthesiology and Pain Medicine, Severance Hospital, Anesthesia and Pain Research Institute, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, South Korea
4
Li M, Yin S, Liu Z, Zhang H. Machine learning enables electrical resistivity modeling of printed lines in aerosol jet 3D printing. Sci Rep 2024; 14:14614. [PMID: 38918598 PMCID: PMC11199662 DOI: 10.1038/s41598-024-65693-y] [Received: 02/16/2024] [Accepted: 06/24/2024] [Indexed: 06/27/2024] Open
Abstract
Among various non-contact direct ink writing techniques, aerosol jet printing (AJP) stands out due to its distinct advantages, including a more adaptable working distance (2-5 mm) and higher resolution (~ 10 μm). These characteristics make AJP a promising technology for the precise customization of intricate electrical functional devices. However, complex interactions among the machine, process, and materials result in low controllability over the electrical performance of printed lines. This significantly affects the functionality of printed components, thereby limiting the broad applications of AJP. Therefore, a systematic machine learning approach that integrates experimental design, geometrical features extraction, and non-parametric modeling is proposed to achieve printing quality optimization and electrical resistivity prediction for the printed lines in AJP. Specifically, three classical convolutional neural networks (CNNs) architectures are compared for extracting representative features of printed lines, and an optimal operating window is identified to effectively discriminate better line morphology from inferior printed line patterns within the design space. Subsequently, three representative non-parametric machine learning techniques are employed for resistivity modeling. Following that, the modeling performances of the adopted machine learning methods were systematically compared based on four conventional evaluation metrics. Together, these aspects contribute to optimizing the printed line morphology, while simultaneously identifying the optimal resistivity model for accurate predictions in AJP.
Affiliation(s)
- Mingdong Li
- School of Information Engineering, Suzhou University, Suzhou, 234000, China
- Shuai Yin
- School of Mechanical and Aerospace, Nanyang Technological University, Singapore, 639798, Singapore
- Zhixin Liu
- School of Mechanical and Aerospace, Nanyang Technological University, Singapore, 639798, Singapore
- Haining Zhang
- School of Information Engineering, Suzhou University, Suzhou, 234000, China
- School of Mechanical and Aerospace, Nanyang Technological University, Singapore, 639798, Singapore
5
Haruna SI, Ibrahim YE, Hassan IH, Al-shawafi A, Zhu H. Bond Strength Assessment of Normal Strength Concrete-Ultra-High-Performance Fiber Reinforced Concrete Using Repeated Drop-Weight Impact Test: Experimental and Machine Learning Technique. Materials (Basel, Switzerland) 2024; 17:3032. [PMID: 38930404 PMCID: PMC11205906 DOI: 10.3390/ma17123032] [Received: 04/26/2024] [Revised: 05/29/2024] [Accepted: 06/17/2024] [Indexed: 06/28/2024]
Abstract
Ultra-high-performance concrete (UHPC) has been used in building joints due to its increased strength, crack resistance, and durability, serving as a repair material. However, efficient repair depends on whether the interfacial substrate can provide adequate bond strength under various loading scenarios. The objective of this study is to investigate the bonding behavior of composite U-shaped normal strength concrete-ultra-high-performance fiber reinforced concrete (NSC-UHPFRC) specimens using multiple drop-weight impact testing techniques. The composite interface was treated using grooving (Gst), natural fracture (Nst), and smoothing (Sst) techniques. Ensemble machine learning (ML) algorithms comprising XGBoost and CatBoost, support vector machine (SVM), and generalized linear machine (GLM) were employed to train and test the simulation dataset to forecast the impact failure strength (N2) of the composite U-shaped NSC-UHPFRC specimen. The results indicate that the reference NSC samples had the highest impact strength and that surface treatment played a substantial role in ensuring the adequate bond strength of NSC-UHPFRC. NSC-UHPFRC-Nst can provide sufficient bond strength at the interface, resulting in a monolithic structure that can resist repeated drop-weight impact loads. NSC-UHPFRC-Sst and NSC-UHPFRC-Gst exhibit significant reductions in impact strength properties. The ensemble ML correctly predicts the failure strength of the NSC-UHPFRC composite. The XGBoost ensemble model gave coefficient of determination (R²) values of approximately 0.99 and 0.9643 at the training and testing stages, respectively. The most accurate predictions were obtained using the GLM model, with an R² value of 0.9805 at the testing stage.
Affiliation(s)
- Sadi I. Haruna
- Engineering Management Department, College of Engineering, Prince Sultan University, Riyadh 11586, Saudi Arabia
- Yasser E. Ibrahim
- Engineering Management Department, College of Engineering, Prince Sultan University, Riyadh 11586, Saudi Arabia
- Ali Al-shawafi
- School of Civil Engineering, Tianjin University, Tianjin 300350, China
- Han Zhu
- Engineering Management Department, College of Engineering, Prince Sultan University, Riyadh 11586, Saudi Arabia
6
Liu B, Guo B, Zhuo R, Dai F. Estimation of soil organic carbon in LUCAS soil database using Vis-NIR spectroscopy based on hybrid kernel Gaussian process regression. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2024; 321:124687. [PMID: 38909558 DOI: 10.1016/j.saa.2024.124687] [Received: 04/20/2024] [Revised: 06/02/2024] [Accepted: 06/18/2024] [Indexed: 06/25/2024]
Abstract
Soil organic carbon (SOC) is crucial for determining soil fertility and environmental quality. Traditional chemical analysis methods for SOC are time-consuming and resource-intensive. In recent years, visible-near infrared (Vis-NIR) spectroscopy has been employed as an alternative method for SOC determination. However, when applied on a larger scale, the prediction accuracy of soil properties decreases due to the heterogeneity of samples. Therefore, this study compared and analyzed the performance of partial least squares regression (PLSR), support vector regression (SVR), random forest (RF), and Gaussian process regression (GPR) in predicting SOC. On this basis, a GPR model based on a hybrid kernel function (HKF-GPR) was proposed for SOC prediction. This hybrid kernel function was designed according to the properties of single kernel functions and the characteristics of soil spectral data. Results indicate that in large soil spectral databases, the GPR model outperforms other models in estimating SOC. The HKF-GPR model achieved the best SOC estimation accuracy, with an R² of 0.7671, RMSE of 5.2934 g/kg, RPD of 2.0721, and RPIQ of 2.5789. Compared to other regression models, the HKF-GPR model proposed in this paper offers broader applicability and superior performance, enabling SOC estimation in large soil spectral libraries.
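The four accuracy figures reported in this abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in soil spectroscopy: RPD as the sample standard deviation of the observed values divided by the RMSE, and RPIQ as their interquartile range divided by the RMSE. The abstract does not state which variants the authors used, and the function name `regression_metrics` and the sample values are hypothetical.

```python
import numpy as np

def regression_metrics(observed, predicted):
    """Return R2, RMSE, RPD and RPIQ under commonly used (assumed) definitions."""
    y = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    residuals = y - p
    rmse = np.sqrt(np.mean(residuals ** 2))
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    # RPD: spread of the observed values (sample SD) relative to the error.
    rpd = np.std(y, ddof=1) / rmse
    # RPIQ: interquartile range of the observed values relative to the error.
    q1, q3 = np.percentile(y, [25, 75])
    rpiq = (q3 - q1) / rmse
    return {"r2": r2, "rmse": rmse, "rpd": rpd, "rpiq": rpiq}

# Hypothetical SOC values in g/kg, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RPIQ is often preferred over RPD for skewed soil data because the interquartile range is less sensitive to extreme values than the standard deviation.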
Affiliation(s)
- Baoyang Liu
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; Communications Information Transmission and Convergence Technology Laboratory, Hangzhou 310018, China
- Baofeng Guo
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; Communications Information Transmission and Convergence Technology Laboratory, Hangzhou 310018, China
- Renxiong Zhuo
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; Communications Information Transmission and Convergence Technology Laboratory, Hangzhou 310018, China
- Fan Dai
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; Communications Information Transmission and Convergence Technology Laboratory, Hangzhou 310018, China
7
Woodhouse AW, Kocaarslan A, Garden JA, Mutlu H. Unlocking the Potential of Polythioesters. Macromol Rapid Commun 2024:e2400260. [PMID: 38824417 DOI: 10.1002/marc.202400260] [Received: 04/22/2024] [Revised: 05/20/2024] [Indexed: 06/03/2024]
Abstract
As the demand for sustainable polymers increases, most research efforts have focused on polyesters, which can be bioderived and biodegradable. Yet analogous polythioesters, where one of the oxygen atoms has been replaced by a sulfur atom, remain a relatively untapped source of potential. The incorporation of sulfur allows the polymer to exhibit a wide range of favorable properties, such as thermal resistance, degradability, and high refractive index. Polythioester synthesis represents a frontier in research, holding the promise of paving the way for eco-friendly alternatives to conventional polyesters. Moreover, polythioester research can also open avenues to the development of sustainable and recyclable materials. In the last 25 years, many methods to synthesize polythioesters have been developed. However, to date no industrial synthesis of polythioesters has been developed due to challenges of costs, yields, and the toxicity of the by-products. This review will summarize the recent advances in polythioester synthesis, covering step-growth polymerization, ring-opening polymerization (ROP), and biosynthesis. Crucially, the benefits and challenges of the processes will be highlighted, paying particular attention to their sustainability, with the aim of encouraging further exploration and research into the fast-growing field of polythioesters.
Affiliation(s)
- Adam W Woodhouse
- Institut de Science des Matériaux de Mulhouse, UMR 7361 CNRS/Université de Haute Alsace, 15 Rue Jean Starcky, Mulhouse, Cedex, 68057, France
- School of Chemistry, Joseph Black Building, David Brewster Road, Edinburgh, EH9 3FJ, UK
- Azra Kocaarslan
- Institute of Chemical Technology and Polymer Chemistry, Karlsruhe Institute of Technology, Engesserstrasse 15, 76131, Karlsruhe, Germany
- Jennifer A Garden
- School of Chemistry, Joseph Black Building, David Brewster Road, Edinburgh, EH9 3FJ, UK
- Hatice Mutlu
- Institut de Science des Matériaux de Mulhouse, UMR 7361 CNRS/Université de Haute Alsace, 15 Rue Jean Starcky, Mulhouse, Cedex, 68057, France
8
Kushwaha NL, Kudnar NS, Vishwakarma DK, Subeesh A, Jatav MS, Gaddikeri V, Ahmed AA, Abdelaty I. Stacked hybridization to enhance the performance of artificial neural networks (ANN) for prediction of water quality index in the Bagh river basin, India. Heliyon 2024; 10:e31085. [PMID: 38784559 PMCID: PMC11112320 DOI: 10.1016/j.heliyon.2024.e31085] [Received: 02/07/2024] [Revised: 05/03/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
Water quality assessment is paramount for environmental monitoring and resource management, particularly in regions experiencing rapid urbanization and industrialization. This study introduces Artificial Neural Networks (ANN) and its hybrid machine learning models, namely ANN-RF (Random Forest), ANN-SVM (Support Vector Machine), ANN-RSS (Random Subspace), ANN-M5P (M5 Pruned), and ANN-AR (Additive Regression) for water quality assessment in the rapidly urbanizing and industrializing Bagh River Basin, India. The Relief algorithm was employed to select the most influential water quality input parameters, including Nitrate (NO₃⁻), Magnesium (Mg²⁺), Sulphate (SO₄²⁻), Calcium (Ca²⁺), and Potassium (K⁺). The comparative analysis of developed ANN and its hybrid models was carried out using statistical indicators (i.e., Nash-Sutcliffe Efficiency (NSE), Pearson Correlation Coefficient (PCC), Coefficient of Determination (R²), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Root Square Error (RRSE), Relative Absolute Error (RAE), and Mean Bias Error (MBE)) and graphical representations (i.e., Taylor diagram). Results indicate that the integration of support vector machine (SVM) with ANN significantly improves performance, yielding impressive statistical indicators: NSE (0.879), R² (0.904), MAE (22.349), and MBE (12.548). The methodology outlined in this study can serve as a template for enhancing the predictive capabilities of ANN models in various other environmental and ecological applications, contributing to sustainable development and safeguarding natural resources.
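Three of the indicators named in this abstract (NSE, MAE and MBE) can be sketched as follows. This is an illustration under the usual textbook definitions, not the authors' code: the sign convention for MBE (simulated minus observed) is an assumption, and the function names and sample values are hypothetical.

```python
import numpy as np

def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2); 1.0 is a perfect fit."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def mean_absolute_error(observed, simulated):
    """Average magnitude of the errors, ignoring their sign."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return np.mean(np.abs(sim - obs))

def mean_bias_error(observed, simulated):
    """Average signed error; positive means over-prediction (assumed convention)."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return np.mean(sim - obs)

# Hypothetical water quality index values, for illustration only.
observed = [100.0, 150.0, 200.0, 250.0]
simulated = [110.0, 140.0, 210.0, 260.0]
nse = nash_sutcliffe_efficiency(observed, simulated)
mae = mean_absolute_error(observed, simulated)
mbe = mean_bias_error(observed, simulated)
print(f"NSE = {nse:.3f}, MAE = {mae:.1f}, MBE = {mbe:.1f}")
```

Reading MAE and MBE together shows whether errors are systematic: an MBE close to the MAE in magnitude indicates that most errors share the same sign.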
Affiliation(s)
- Nand Lal Kushwaha
- Department of Soil and Water Engineering, Punjab Agricultural University Ludhiana, Punjab, 141004, India
- Division of Agricultural Engineering, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India
- Nanabhau S. Kudnar
- Department of Geography, C. J. Patel College Tirora, Gondia, Maharashtra, 441911, India
- Dinesh Kumar Vishwakarma
- Department of Irrigation and Drainage Engineering, G.B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, 263145, India
- A. Subeesh
- ICAR-Central Institute of Agricultural Engineering, Bhopal, Madhya Pradesh, 462038, India
- Malkhan Singh Jatav
- National Institute of Hydrology, North Western Regional Centre, Jodhpur, Rajasthan, 342003, India
- Venkatesh Gaddikeri
- Division of Agricultural Engineering, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India
- Ashraf A. Ahmed
- Department of Civil and Environmental Engineering, Brunel University London, Kingston Lane, Uxbridge UB8 3PH, UK
- Ismail Abdelaty
- Water and Water Structures Engineering Department, Faculty of Engineering, Zagazig University, Zagazig, 44519, Egypt
9
Fang Z, Ke H, Ma Y, Zhao S, Zhou R, Ma Z, Liu Z. Design optimization of groundwater circulation well based on numerical simulation and machine learning. Sci Rep 2024; 14:11506. [PMID: 38769108 PMCID: PMC11106317 DOI: 10.1038/s41598-024-62545-7] [Received: 02/11/2024] [Accepted: 05/17/2024] [Indexed: 05/22/2024] Open
Abstract
The optimal design of groundwater circulation wells (GCWs) is challenging. The key to purifying groundwater using this technique is its proficiency and productivity. However, traditional numerical simulation methods are limited by long modeling times, random optimization schemes, and optimization results that are not comprehensive. To address these issues, this study introduced an innovative approach for the optimal design of a GCW using machine learning methods. The FloPy package was used to create and implement the MODFLOW and MODPATH models. Subsequently, the formulated models were employed to calculate the characteristic indicators of the effectiveness of the GCW operation, including the radius of influence (R) and the ratio of particle recovery (Pr). A detailed collection of 3000 datasets, including measures of operational efficiency and key elements in machine learning, was meticulously compiled into documents through model execution. The optimization models were trained and evaluated using multiple linear regression (MLR), artificial neural networks (ANN), and support vector machines (SVM). The models produced by the three approaches exhibited notable correlations between anticipated outcomes and datasets. For the optimal design of circulating well parameters, machine learning methods not only improve the optimization speed, but also expand the scope of parameter optimization. Consequently, these models were applied to optimize the configuration of the GCW at a site in Xi'an. The optimal scheme for R (Q = 293.17 m³/d, a = 6.09 m, L = 7.28 m) and optimal scheme for Pr (Q = 300 m³/d, a = 3.64 m, L = 1 m) were obtained. The combination of numerical simulations and machine learning is an effective tool for optimizing and predicting the GCW remediation effect.
Affiliation(s)
- Zhang Fang
- Key Laboratory of Groundwater Resources and Environment, Ministry of Education, Jilin University, Changchun, 130021, People's Republic of China
- Hao Ke
- Key Laboratory of Groundwater Resources and Environment, Ministry of Education, Jilin University, Changchun, 130021, People's Republic of China
- Yanling Ma
- Key Laboratory of Groundwater Resources and Environment, Ministry of Education, Jilin University, Changchun, 130021, People's Republic of China
- Siyuan Zhao
- Key Laboratory of Groundwater Resources and Environment, Ministry of Education, Jilin University, Changchun, 130021, People's Republic of China
- Rui Zhou
- Key Laboratory of Groundwater Resources and Environment, Ministry of Education, Jilin University, Changchun, 130021, People's Republic of China
- Zhe Ma
- Key Laboratory of Groundwater Resources and Environment, Ministry of Education, Jilin University, Changchun, 130021, People's Republic of China
- Zhiguo Liu
- Key Laboratory of Groundwater Resources and Environment, Ministry of Education, Jilin University, Changchun, 130021, People's Republic of China
10
Yang D, Zhou Y, Jie Y, Li Q, Shi T. Non-destructive detection of defective maize kernels using hyperspectral imaging and convolutional neural network with attention module. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2024; 313:124166. [PMID: 38493512 DOI: 10.1016/j.saa.2024.124166] [Received: 12/08/2023] [Revised: 03/04/2024] [Accepted: 03/14/2024] [Indexed: 03/19/2024]
Abstract
Rapid, effective and non-destructive detection of defective maize kernels is crucial for their high-quality storage in granaries. Hyperspectral imaging (HSI) coupled with a convolutional neural network (CNN) based on a spectral and spatial attention (Spl-Spal-At) module was proposed for identifying the different types of maize kernels. HSI data within 380-1000 nm were collected for six classes of kernels: sprouted, heat-damaged, insect-damaged, moldy, broken and healthy. The CNN-Spl-At, CNN-Spal-At and CNN-Spl-Spal-At models were established with the spectra, the images and their fusion features, respectively, as inputs for the recognition of different kernels. The performances of the proposed models were further compared with those of conventional models built with support vector machine (SVM) and extreme learning machine (ELM). The results indicated that the recognition ability of the CNN models with attention was significantly better than that of the SVM and ELM models, and that fused features were more conducive to expressing the appearance of different kernels than single features. The CNN-Spl-Spal-At model gave the best recognition result, with high average classification accuracies of 98.04% and 94.56% for the training and testing sets, respectively. The recognition results were visually presented on the surface image of the kernels with different colors. The CNN-Spl-Spal-At model built in this study could effectively detect defective maize kernels, and it also had great potential to provide analysis approaches for the development of non-destructive testing equipment based on the HSI technique for maize quality.
Affiliation(s)
- Dong Yang
- Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; National Engineering Research Center of Grain Storage and Logistics, Beijing 100037, China
- Yuxing Zhou
- Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; National Engineering Research Center of Grain Storage and Logistics, Beijing 100037, China
- Yu Jie
- Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; National Engineering Research Center of Grain Storage and Logistics, Beijing 100037, China
- Qianqian Li
- Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; National Engineering Research Center of Grain Storage and Logistics, Beijing 100037, China
- Tianyu Shi
- Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; National Engineering Research Center of Grain Storage and Logistics, Beijing 100037, China
11
Bui TBC, Iida D, Kitamura Y, Kokawa M. Utilization of multiple-dilution fluorescence fingerprint facilitates prediction of chemical attributes in spice extracts. Food Chem 2024; 438:138028. [PMID: 38091861 DOI: 10.1016/j.foodchem.2023.138028] [Received: 04/28/2023] [Accepted: 11/14/2023] [Indexed: 12/28/2023]
Abstract
Fluorescence Fingerprint (FF) is a powerful tool for rapid quality assessment of various foods and plant-derived products. However, the conventional utilization of FFs measured at a single dilution level (DL) to substitute chemical analyses is extremely challenging, especially for multicomponent materials like spice extracts because fluorescence intensity and concentration widely differ between components, with complex phenomena like inner filter effects. Here, we proposed a new strategy to use the meta-data comprised of FFs measured at multiple DLs with machine learning to estimate common chemical attributes including total polyphenol and flavonoid contents, and antioxidant abilities. This strategy achieved more consistently satisfactory performance in estimation of all chemical attributes of spice extracts compared to using a single DL. Hence, the workflow employed in this study is expected to serve as an alternative method to quickly evaluate the chemical quality of spice extracts, as well as other plant products and food materials.
Affiliation(s)
- Thi Bao Chau Bui
- Graduate School of Science and Technology, University of Tsukuba, Ibaraki, Japan; Institute of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan; Japan Society for the Promotion of Science (PD), Ibaraki, Japan
- Daiki Iida
- Graduate School of Science and Technology, University of Tsukuba, Ibaraki, Japan
- Yutaka Kitamura
- Institute of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
- Mito Kokawa
- Institute of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
12
Chappel JR, Kirkwood-Donelson KI, Reif DM, Baker ES. From big data to big insights: statistical and bioinformatic approaches for exploring the lipidome. Anal Bioanal Chem 2024; 416:2189-2202. [PMID: 37875675 PMCID: PMC10954412 DOI: 10.1007/s00216-023-04991-2] [Received: 08/30/2023] [Revised: 10/01/2023] [Accepted: 10/05/2023] [Indexed: 10/26/2023]
Abstract
The goal of lipidomic studies is to provide a broad characterization of cellular lipids present and changing in a sample of interest. Recent lipidomic research has significantly contributed to revealing the multifaceted roles that lipids play in fundamental cellular processes, including signaling, energy storage, and structural support. Furthermore, these findings have shed light on how lipids dynamically respond to various perturbations. Continued advancement in analytical techniques has also led to improved abilities to detect and identify novel lipid species, resulting in increasingly large datasets. Statistical analysis of these datasets can be challenging not only because of their vast size, but also because of the highly correlated data structure that exists due to many lipids belonging to the same metabolic or regulatory pathways. Interpretation of these lipidomic datasets is also hindered by a lack of current biological knowledge for the individual lipids. These limitations can therefore make lipidomic data analysis a daunting task. To address these difficulties and shed light on opportunities and also weaknesses in current tools, we have assembled this review. Here, we illustrate common statistical approaches for finding patterns in lipidomic datasets, including univariate hypothesis testing, unsupervised clustering, supervised classification modeling, and deep learning approaches. We then describe various bioinformatic tools often used to biologically contextualize results of interest. Overall, this review provides a framework for guiding lipidomic data analysis to promote a greater assessment of lipidomic results, while understanding potential advantages and weaknesses along the way.
Affiliation(s)
- Jessie R Chappel
- Bioinformatics Research Center, Department of Biological Sciences, North Carolina State University, Raleigh, NC, 27606, USA
| | - Kaylie I Kirkwood-Donelson
- Immunity, Inflammation, and Disease Laboratory, Division of Intramural Research, National Institute of Environmental Health Sciences, Durham, NC, 27709, USA
| | - David M Reif
- Predictive Toxicology Branch, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Durham, NC, 27709, USA.
| | - Erin S Baker
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA.
| |
|
13
|
Grelet C, Larsen T, Crowe MA, Wathes DC, Ferris CP, Ingvartsen KL, Marchitelli C, Becker F, Vanlierde A, Leblois J, Schuler U, Auer FJ, Köck A, Dale L, Sölkner J, Christophe O, Hummel J, Mensching A, Fernández Pierna JA, Soyeurt H, Calmels M, Reding R, Gelé M, Chen Y, Gengler N, Dehareng F. Prediction of key milk biomarkers in dairy cows through milk mid-infrared spectra and international collaborations. J Dairy Sci 2024; 107:1669-1684. [PMID: 37863287 DOI: 10.3168/jds.2023-23843] [Received: 06/06/2023] [Accepted: 09/23/2023] [Indexed: 10/22/2023]
Abstract
At the individual cow level, suboptimum fertility, mastitis, negative energy balance, and ketosis are major issues in dairy farming. These problems are widespread on dairy farms and have an important economic impact. The objectives of this study were (1) to assess the potential of milk mid-infrared (MIR) spectra to predict key biomarkers of energy deficit (citrate, isocitrate, glucose-6 phosphate [glucose-6P], free glucose), ketosis (β-hydroxybutyrate [BHB] and acetone), mastitis (N-acetyl-β-d-glucosaminidase activity [NAGase] and lactate dehydrogenase), and fertility (progesterone); (2) to test alternative methodologies to partial least squares (PLS) regression to better account for the specific asymmetric distribution of the biomarkers; and (3) to create robust models by merging large datasets from 5 international or national projects. Benefiting from this international collaboration, the dataset comprised a total of 9,143 milk samples from 3,758 cows located in 589 herds across 10 countries and represented 7 breeds. The samples were analyzed by reference chemistry for biomarker contents, whereas the MIR analyses were performed on 30 instruments from different models and brands, with spectra harmonized into a common format. Four quantitative methodologies were evaluated to address the strongly skewed distribution of some biomarkers. Partial least squares regression was used as the reference basis, and compared with a random modification of distribution associated with PLS (random-downsampling-PLS), an optimized modification of distribution associated with PLS (KennardStone-downsampling-PLS), and support vector machine (SVM). When the ability of MIR to predict biomarkers was too low for quantification, different qualitative methodologies were tested to discriminate low versus high values of biomarkers. For each biomarker, 20% of the herds were randomly removed within all countries to be used as the validation dataset. 
The remaining 80% of herds were used as the calibration dataset. In calibration, the 3 alternative methodologies outperformed PLS for the majority of biomarkers. However, in the external herd validation, PLS provided the best results for isocitrate, glucose-6P, free glucose, and lactate dehydrogenase (coefficient of determination in external herd validation [R2v] = 0.48, 0.58, 0.28, and 0.24, respectively). For other molecules, PLS-random-downsampling and PLS-KennardStone-downsampling outperformed PLS in the majority of cases, but the best results were provided by SVM for citrate, BHB, acetone, NAGase, and progesterone (R2v = 0.94, 0.58, 0.76, 0.68, and 0.15, respectively). Hence, PLS and SVM based on the entire dataset provided the best results for normal and skewed distributions, respectively. Complementary to the quantitative methods, the qualitative discriminant models enabled the discrimination of high and low values for BHB, acetone, and NAGase with a global accuracy around 90%, and glucose-6P with an accuracy of 83%. In conclusion, MIR spectra of milk can enable quantitative screening of citrate as a biomarker of energy deficit and discrimination of low and high values of BHB, acetone, and NAGase, as biomarkers of ketosis and mastitis. Finally, progesterone could not be predicted with sufficient accuracy from milk MIR spectra to be further considered. Consequently, MIR spectrometry can bring valuable information regarding the occurrence of energy deficit, ketosis, and mastitis in dairy cows, which in turn have major influences on their fertility and survival.
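The herd-wise hold-out described in this abstract (20% of herds for validation, 80% for calibration) can be illustrated with a minimal sketch. It assumes samples arrive as `(herd_id, sample)` pairs; the function name, the toy data, and the seed are hypothetical, and the per-country stratification mentioned in the abstract is deliberately left out for brevity.

```python
import random

def split_by_herd(samples, validation_fraction=0.2, seed=0):
    """Split (herd_id, sample) pairs so that a herd never straddles both sets.

    Hypothetical helper: it ignores the per-country stratification
    mentioned in the abstract.
    """
    herds = sorted({herd for herd, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(herds)
    # Number of herds held out for validation (at least one).
    n_val = max(1, round(len(herds) * validation_fraction))
    val_herds = set(herds[:n_val])
    calibration = [s for s in samples if s[0] not in val_herds]
    validation = [s for s in samples if s[0] in val_herds]
    return calibration, validation

# Toy data (made up): 10 herds with 3 milk samples each.
samples = [(f"herd{h}", f"milk{h}_{i}") for h in range(10) for i in range(3)]
calibration, validation = split_by_herd(samples)
```

Splitting on herds rather than on individual samples keeps every sample of a held-out herd unseen during calibration, which is what makes the validation "external" at the herd level.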
Affiliation(s)
- C Grelet
- Walloon Agricultural Research Center (CRA-W), Gembloux, Belgium, 5030
| | - T Larsen
- Department of Animal and Veterinary Sciences, Aarhus University, Tjele, Denmark, DK-8830
| | - M A Crowe
- University College Dublin (UCD), Dublin, Ireland, D04 C1P1
| | - D C Wathes
- Royal Veterinary College (RVC), London, United Kingdom, CM24 1RW
| | - C P Ferris
- Agri-Food and Biosciences Institute (AFBI), Belfast, Northern Ireland, BT9 5PX
| | - K L Ingvartsen
- Department of Animal and Veterinary Sciences, Aarhus University, Tjele, Denmark, DK-8830
| | - C Marchitelli
- Research Center for Animal Production and Aquaculture (CREA), Roma, Italy, 00184
| | - F Becker
- Leibniz Institute for Farm Animal Biology (FBN), Dummerstorf, Germany, 18196
| | - A Vanlierde
- Walloon Agricultural Research Center (CRA-W), Gembloux, Belgium, 5030
| | - J Leblois
- EEIG European Milk Recording (EMR), Ciney, Belgium, 5590
| | | | - F J Auer
- LKV-Austria, Vienna, Austria, A-1200
| | - A Köck
- ZuchtData, Vienna, Austria, A-1200
| | - L Dale
- LKV Baden Württemberg, Stuttgart, Germany, D-70190
| | - J Sölkner
- University of Natural Resources and Life Sciences, Vienna, Austria, A-1180
| | - O Christophe
- Walloon Agricultural Research Center (CRA-W), Gembloux, Belgium, 5030
| | - J Hummel
- University of Göttingen, Göttingen, Germany, D-37075
| | - A Mensching
- University of Göttingen, Göttingen, Germany, D-37075
| | | | - H Soyeurt
- University of Liège, Gembloux Agro-Bio Tech (Ulg-GxABT), Gembloux, Belgium, 5030
| | - M Calmels
- Seenovia, Saint Berthevin, France, 53940
| | - R Reding
- Convis, Ettelbruck, Luxembourg, 9085
| | - M Gelé
- Idele, Paris, France, 75012
| | - Y Chen
- University of Liège, Gembloux Agro-Bio Tech (Ulg-GxABT), Gembloux, Belgium, 5030
| | - N Gengler
- University of Liège, Gembloux Agro-Bio Tech (Ulg-GxABT), Gembloux, Belgium, 5030
| | - F Dehareng
- Walloon Agricultural Research Center (CRA-W), Gembloux, Belgium, 5030.
| |
|
14
|
AlHarkan K, Sultana N, Al Mulhim N, AlAbdulKader AM, Alsafwani N, Barnawi M, Alasqah K, Bazuhair A, Alhalwah Z, Bokhamseen D, Aljameel SS, Alamri S, Alqurashi Y, Ghamdi KA. Artificial intelligence approaches for early detection of neurocognitive disorders among older adults. Front Comput Neurosci 2024; 18:1307305. [PMID: 38444404 PMCID: PMC10913197 DOI: 10.3389/fncom.2024.1307305] [Received: 10/04/2023] [Accepted: 01/29/2024] [Indexed: 03/07/2024]
Abstract
Introduction: Dementia is one of the major global health issues among the aging population, characterized clinically by a progressive decline in higher cognitive functions. This paper aims to apply various artificial intelligence (AI) approaches to accurately detect patients with mild cognitive impairment (MCI) or dementia. Methods: Quantitative research was conducted to address the objective of this study using 343 randomly selected Saudi patients. The Chi-square test was conducted to determine the association of the patient's cognitive function with various features, including demographic characteristics and medical history. Two widely used AI algorithms, logistic regression and support vector machine (SVM), were used for detecting cognitive decline. This study also assessed patients' cognitive function based on gender and developed the prediction models for males and females separately. Results: Fifty-four percent of patients had normal cognitive function, 34% had MCI, and 12% had dementia. The prediction accuracies for all the developed models were greater than 71%, indicating good prediction capability. However, the developed SVM models performed the best, with an accuracy of 93.3% for all patients, 94.4% for males only, and 95.5% for females only. The top 10 significant predictors based on the developed SVM model are education, bedtime, taking pills for chronic pain, diabetes, stroke, gender, chronic pains, coronary artery diseases, and wake-up time. Conclusion: The results of this study emphasize the higher accuracy and reliability of the proposed methods in cognitive decline prediction that health practitioners can use for the early detection of dementia. This research can also provide substantial direction and supporting insights for scholars to enhance their understanding of crucial research, emerging trends, and new developments in future cognitive decline studies.
Affiliation(s)
- Khalid AlHarkan
- Department of Family and Community Medicine, College of Medicine, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Nahid Sultana
- Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Noura Al Mulhim
- Department of Physiology, College of Medicine, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Assim M. AlAbdulKader
- Department of Family and Community Medicine, College of Medicine, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Noor Alsafwani
- Department of Pathology, College of Medicine, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Marwah Barnawi
- Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Khulud Alasqah
- Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Anhar Bazuhair
- Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Zainab Alhalwah
- Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Dina Bokhamseen
- Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Sumayh S. Aljameel
- Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Sultan Alamri
- Department of Family Medicine, College of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Yousef Alqurashi
- Respiratory Care Department, College of Applied Medical Sciences, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Kholoud Al Ghamdi
- Department of Physiology, College of Medicine, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| |
|
15
|
Zhang L, Ye L, Wang F, Gao W, Yu J, Zhang L. Prediction of Hydrogen Abstraction Rate Constants at the Allylic Site between Alkenes and OH with Multiple Machine Learning Models. J Phys Chem A 2024; 128:761-772. [PMID: 38237153 DOI: 10.1021/acs.jpca.3c06917] [Indexed: 02/02/2024]
Abstract
Hydrogen abstraction reactions between hydrocarbons and hydroxyl radicals are important propagation steps in radical chain reactions, playing a crucial role in atmospheric and combustion chemistry. This study focuses on predicting the rate constants of the prototype of the reaction class of hydrogen abstractions, i.e., the primary allylic hydrogen abstraction from alkenes by the OH radical, via utilizing machine learning (ML) methods. Specifically, three distinct models, namely, feedforward neural network (FNN), support vector regression (SVR), and Gaussian process regression (GPR), have been employed to construct robust ML models for prediction. We proposed a novel strategy that seamlessly integrates descriptor preprocessing, a pairwise linear correlation analysis, and a model-specific Wrapper method to enhance the effectiveness of the feature selection procedure. The selected feature subset was then evaluated using two cross-validation techniques, i.e., leave-one-group-out (LOGO) and K-fold cross-validations, for each of the three ML models (FNN, SVR, and GPR) to assess their predictive and stability performance. The results demonstrate that the FNN model, trained with seven representative descriptors, achieves superior performance compared to the other two methods. For the FNN model, the average percentage deviation is 39.06% on the test set by performing LOGO cross-validation, while the repeated 10-fold cross-validation achieves a percentage prediction deviation of 19.1%. Two larger alkenes with 10 carbons were selected to test the prediction performance of the trained FNN model on primary allylic hydrogen abstraction. Results show that the kinetic predictions follow well the modified three-parameter Arrhenius equation, indicating the reliable performance of FNN in predicting hydrogen abstraction rate constants, especially for the primary allylic site. 
Hopefully, this work can shed useful light on the application of ML in generating chemical kinetic parameters of hydrocarbon combustion chemistry.
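The modified three-parameter Arrhenius equation referred to in this abstract is commonly written as k(T) = A · T^n · exp(−Ea / (R·T)). The sketch below evaluates that form; the parameter values are made up purely for illustration and are not taken from the paper.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def modified_arrhenius(T, A, n, Ea):
    """Rate constant k(T) = A * T**n * exp(-Ea / (R * T)).

    T is in kelvin and Ea in J/mol; the units of A fix the units of k.
    """
    return A * T**n * math.exp(-Ea / (R * T))

# Illustrative (hypothetical) parameters, not fitted values from the study.
rates = {T: modified_arrhenius(T, A=1.0e6, n=2.0, Ea=5.0e3)
         for T in (500.0, 1000.0, 1500.0)}
```

Setting `n = 0` recovers the classical two-parameter Arrhenius expression, which is why the temperature exponent is usually described as the "modification".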
Affiliation(s)
- Lei Zhang
- School of Chemical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Lili Ye
- School of Chemical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Fan Wang
- School of Chemical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Wei Gao
- State Key Laboratory of Fire Science, University of Science and Technology of China, Hefei, Anhui 230026, China
| | - Jinhui Yu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, Hubei 430074, China
| | - Lidong Zhang
- National Synchrotron Radiation Laboratory, University of Science and Technology of China, Hefei, Anhui 230026, China
| |
|
16
|
Uddin S, Lu H. Dataset meta-level and statistical features affect machine learning performance. Sci Rep 2024; 14:1670. [PMID: 38238551 PMCID: PMC10796674 DOI: 10.1038/s41598-024-51825-x] [Received: 10/18/2023] [Accepted: 01/09/2024] [Indexed: 01/22/2024]
Abstract
What dataset features affect machine learning (ML) performance has primarily been unknown in the current literature. This study examines the impact of tabular datasets' different meta-level and statistical features on the performance of various ML algorithms. The three meta-level features this study considered are the dataset size, the number of attributes and the ratio between the positive (class 1) and negative (class 0) class instances. It considered four statistical features for each dataset: mean, standard deviation, skewness and kurtosis. After applying the required scaling, this study averaged (uniform and weighted) each dataset's different attributes to quantify its four statistical features. We analysed 200 open-access tabular datasets from the Kaggle (147) and UCI Machine Learning Repository (53) and developed ML classification models (through classification implementation and hyperparameter tuning) for each dataset. Then, this study developed multiple regression models to explore the impact of dataset features on ML performance. We found that kurtosis has a statistically significant negative effect on the accuracy of the three non-tree-based ML algorithms of the Support vector machine (SVM), Logistic regression (LR) and K-nearest neighbour (KNN) for their classical implementation with both uniform and weighted aggregations. This study observed similar findings in most cases for ML implementations through hyperparameter tuning, except for SVM with weighted aggregation. Meta-level and statistical features barely show any statistically significant impact on the accuracy of the two tree-based ML algorithms (Decision tree and Random forest), except for implementation through hyperparameter tuning for the weighted aggregation. 
When we excluded some datasets based on the imbalanced statistics and a significantly higher contribution of one attribute compared to others to the classification performance, we found a significant effect of the meta-level ratio feature and statistical mean and standard deviation features on SVM, LR and KNN accuracy in many cases. Our findings open a new research direction in understanding how dataset characteristics affect ML performance and will help researchers select appropriate ML algorithms for a possible optimal accuracy outcome.
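The four dataset-level statistical features named in this abstract (mean, standard deviation, skewness, kurtosis, averaged over attributes after scaling) can be sketched as follows. This is a rough illustration under stated assumptions: min-max scaling, population moments, non-excess kurtosis, and uniform averaging only. The abstract does not specify these choices, and the weighted aggregation it mentions is not shown.

```python
import math

def min_max_scale(column):
    """Scale one attribute to [0, 1]; constant columns map to 0.0 (assumption)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def moments(column):
    """Return (mean, std, skewness, kurtosis) using population moments."""
    n = len(column)
    mean = sum(column) / n
    m2 = sum((x - mean) ** 2 for x in column) / n
    m3 = sum((x - mean) ** 3 for x in column) / n
    m4 = sum((x - mean) ** 4 for x in column) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurt = m4 / m2 ** 2 if m2 > 0 else 0.0  # non-excess kurtosis
    return mean, math.sqrt(m2), skew, kurt

def dataset_features(columns):
    """Uniformly average the four statistics over all scaled attributes."""
    per_attribute = [moments(min_max_scale(c)) for c in columns]
    k = len(per_attribute)
    return tuple(sum(f[i] for f in per_attribute) / k for i in range(4))

# Toy dataset (made up) with two attributes.
features = dataset_features([[0, 1, 2], [10, 20, 30]])
```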
Affiliation(s)
- Shahadat Uddin
- School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, NSW, 2037, Australia.
| | - Haohui Lu
- School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, NSW, 2037, Australia
| |
|
17
|
Sun W, Mo Z, Li Y, Xiao J, Jia L, Huang S, Liao C, Du J, He S, Chen L, Zhang W, Yang X. Machine learning-based ensemble prediction model for the gamma passing rate of VMAT-SBRT plan. Phys Med 2024; 117:103204. [PMID: 38154373 DOI: 10.1016/j.ejmp.2023.103204] [Received: 08/09/2023] [Revised: 10/29/2023] [Accepted: 12/21/2023] [Indexed: 12/30/2023]
Abstract
PURPOSE The purpose of this study was to accurately predict or classify the beam gamma passing rate (GPR) with an ensemble model by using machine learning for SBRT (VMAT) plans. METHODS A total of 128 SBRT VMAT plans with 330 arc beams were retrospectively selected, and 216 radiomics and 34 plan complexity features were calculated for each arc beam. Three models for GPR prediction and classification were built using the support vector machine algorithm: (1) plan complexity feature-based model (plan model); (2) radiomics feature-based model (radiomics model); and (3) an ensemble model combining the two models (ensemble model). The prediction performance was evaluated by calculating the mean absolute error (MAE), root mean square error (RMSE), and Spearman's correlation coefficient (SC), and the classification performance was measured by calculating the area under the receiver operating characteristic curve (AUC), accuracy, specificity, and sensitivity. RESULTS The MAE, RMSE and SC at the 2 %/2 mm gamma criterion in the test dataset were 1.4 %, 2.57 %, and 0.563, respectively, for the plan model; 1.42 %, 2.51 %, and 0.508, respectively, for the radiomics model; and 1.33 %, 2.49 %, and 0.611, respectively, for the ensemble model. The accuracy, specificity, sensitivity, and AUC at the 2 %/2 mm gamma criterion in the test dataset were 0.807, 0.824, 0.681, and 0.854, respectively, for the plan model; 0.860, 0.893, 0.624, and 0.877, respectively, for the radiomics model; and 0.852, 0.871, 0.710, and 0.896, respectively, for the ensemble model. CONCLUSIONS The ensemble model can improve the prediction and classification performance for the GPR of SBRT (VMAT).
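The three regression metrics reported in this abstract (MAE, RMSE, and Spearman's correlation) can be computed as in the minimal sketch below. The toy passing-rate values are made up for illustration, and Spearman's coefficient is computed here as the Pearson correlation of average ranks, which handles ties.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def spearman(x, y):
    """Spearman's rank correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy measured vs. predicted passing rates in percent (made-up values).
measured = [95.0, 97.5, 92.0, 99.0]
predicted = [94.0, 98.0, 93.5, 97.0]
scores = (mae(measured, predicted), rmse(measured, predicted),
          spearman(measured, predicted))
```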
Affiliation(s)
- Wenzhao Sun
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China; Guangdong Esophageal Cancer Institute, Guangzhou, China.
| | - Zijie Mo
- Shenzhen United Imaging Research Institute of Innovative Medical Equipment, Shenzhen, China
| | - Yongbao Li
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Jifeng Xiao
- Shenzhen United Imaging Research Institute of Innovative Medical Equipment, Shenzhen, China
| | - Lecheng Jia
- Shenzhen United Imaging Research Institute of Innovative Medical Equipment, Shenzhen, China
| | - Sijuan Huang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Can Liao
- Shanghai United Imaging Healthcare Co., Ltd., Shanghai, China
| | - Jinlong Du
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Shumeng He
- United Imaging Research Institute of Intelligent Imaging, Beijing, China
| | - Li Chen
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Wei Zhang
- Shanghai United Imaging Healthcare Co., Ltd., Shanghai, China
| | - Xin Yang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
| |
|
18
|
Ghosh SK, Khandoker AH. A machine learning driven monogram for predicting chronic kidney disease stages 3-5. Sci Rep 2023; 13:21613. [PMID: 38062134 PMCID: PMC10703939 DOI: 10.1038/s41598-023-48815-w] [Received: 07/27/2023] [Accepted: 11/30/2023] [Indexed: 12/18/2023]
Abstract
Chronic kidney disease (CKD) remains one of the most prominent causes of mortality worldwide, necessitating accurate prediction models for early detection and prevention. In recent years, machine learning (ML) techniques have exhibited promising outcomes across various medical applications. This study introduces a novel ML-driven nomogram approach for early identification of individuals at risk for developing CKD stages 3-5. This retrospective study employed a comprehensive dataset comprised of clinical and laboratory variables from a large cohort of diagnosed CKD patients. Advanced ML algorithms, including feature selection and regression models, were applied to build a predictive model. Among 467 participants, 11.56% developed CKD stages 3-5 over a 9-year follow-up. Several factors, such as age, gender, medical history, and laboratory results, independently exhibited significant associations with CKD (p < 0.05) and were utilized to create a risk function. The linear regression (LR)-based model achieved an R-score (coefficient of determination) of 0.954079, while the support vector machine (SVM) achieved a slightly lower value. An LR-based nomogram was developed to facilitate the process of risk identification and management. The ML-driven nomogram demonstrated superior performance when compared to traditional prediction models, showcasing its potential as a valuable clinical tool for the early detection and prevention of CKD. Further studies should focus on refining the model and validating its performance in diverse populations.
Affiliation(s)
- Samit Kumar Ghosh
- Healthcare Engineering Innovation Center (HEIC), Department of Biomedical Engineering, Khalifa University, Abu Dhabi, United Arab Emirates.
| | - Ahsan H Khandoker
- Healthcare Engineering Innovation Center (HEIC), Department of Biomedical Engineering, Khalifa University, Abu Dhabi, United Arab Emirates
| |
|
19
|
Budiarto A, Tsang KCH, Wilson AM, Sheikh A, Shah SA. Machine Learning-Based Asthma Attack Prediction Models From Routinely Collected Electronic Health Records: Systematic Scoping Review. JMIR AI 2023; 2:e46717. [PMID: 38875586 PMCID: PMC11041490 DOI: 10.2196/46717] [Received: 02/23/2023] [Revised: 09/28/2023] [Accepted: 10/09/2023] [Indexed: 06/16/2024]
Abstract
BACKGROUND An early warning tool to predict attacks could enhance asthma management and reduce the likelihood of serious consequences. Electronic health records (EHRs) providing access to historical data about patients with asthma coupled with machine learning (ML) provide an opportunity to develop such a tool. Several studies have developed ML-based tools to predict asthma attacks. OBJECTIVE This study aims to critically evaluate ML-based models derived using EHRs for the prediction of asthma attacks. METHODS We systematically searched PubMed and Scopus (the search period was between January 1, 2012, and January 31, 2023) for papers meeting the following inclusion criteria: (1) used EHR data as the main data source, (2) used asthma attack as the outcome, and (3) compared ML-based prediction models' performance. We excluded non-English papers and nonresearch papers, such as commentary and systematic review papers. In addition, we also excluded papers that did not provide any details about the respective ML approach and its result, including protocol papers. The selected studies were then summarized across multiple dimensions including data preprocessing methods, ML algorithms, model validation, model explainability, and model implementation. RESULTS Overall, 17 papers were included at the end of the selection process. There was considerable heterogeneity in how asthma attacks were defined. Of the 17 studies, 8 (47%) studies used routinely collected data both from primary care and secondary care practices together. Extreme imbalanced data was a notable issue in most studies (13/17, 76%), but only 38% (5/13) of them explicitly dealt with it in their data preprocessing pipeline. The gradient boosting-based method was the best ML method in 59% (10/17) of the studies. Of the 17 studies, 14 (82%) studies used a model explanation method to identify the most important predictors. 
None of the studies followed the standard reporting guidelines, and none were prospectively validated. CONCLUSIONS Our review indicates that this research field is still underdeveloped, given the limited body of evidence, heterogeneity of methods, lack of external validation, and suboptimally reported models. We highlighted several technical challenges (class imbalance, external validation, model explanation, and adherence to reporting guidelines to aid reproducibility) that need to be addressed to make progress toward clinical adoption.
Affiliation(s)
- Arif Budiarto
- Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia
| | - Kevin C H Tsang
- Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
| | - Andrew M Wilson
- Norwich Medical School, University of East Anglia, Norwich, United Kingdom
- Norfolk and Norwich University Hospital NHS Foundation Trust, Norwich, United Kingdom
| | - Aziz Sheikh
- Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
| | - Syed Ahmar Shah
- Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
| |
|
20
|
Ali L, Sivaramakrishnan K, Kuttiyathil MS, Chandrasekaran V, Ahmed OH, Al-Harahsheh M, Altarawneh M. Prediction of Thermogravimetric Data in the Thermal Recycling of e-waste Using Machine Learning Techniques: A Data-driven Approach. ACS OMEGA 2023; 8:43254-43270. [PMID: 38024703 PMCID: PMC10652257 DOI: 10.1021/acsomega.3c07228] [Received: 09/20/2023] [Revised: 10/12/2023] [Accepted: 10/17/2023] [Indexed: 12/01/2023]
Abstract
The release of bromine-free hydrocarbons and gases is a major challenge faced in the thermal recycling of e-waste due to the corrosive effects of produced HBr. Metal oxides such as Fe2O3 (hematite) are excellent debrominating agents, and they are copyrolyzed along with tetrabromophenol (TBP), a lesser-used brominated flame retardant that is a constituent of printed circuit boards in electronic equipment. The pyrolytic (N2) and oxidative (O2) decomposition of TBP with Fe2O3 has been previously investigated with thermogravimetric analysis (TGA) at four different heating rates of 5, 10, 15, and 20 °C/min, and the mass loss data between room temperature and 800 °C were reported. The objective of our paper is to study the effectiveness of machine learning (ML) techniques in reproducing these TGA data so that the use of the instrument can be eliminated, enhancing the potential for online monitoring of copyrolysis in e-waste treatment. This will reduce experimental and human errors and significantly improve process time. TGA data are both nonlinear and multidimensional; hence, nonlinear regression techniques such as random forest (RF) and gradient boosting regression (GBR) showed the highest prediction accuracies of 0.999 and the lowest prediction errors among all the ML models employed in this work. The large data sets allowed us to explore three different scenarios of model training and validation, where the number of training samples was varied from 10,000 to 40,000 for both TBP and TBP + hematite samples under N2 (pyrolysis) and O2 (combustion) environments. The novelty of our study is that ML techniques have not previously been employed for the copyrolysis of these compounds, while its significance lies in the excellent potential for enhanced online monitoring of e-waste treatment and for extension to other characterization techniques such as spectroscopy and chromatography. Lastly, e-waste recycling could greatly benefit from ML applications, since they have the potential to reduce total and operational costs and improve overall process time and efficiency, thereby encouraging more treatment plants to adopt these techniques and reducing the growing environmental footprint of e-waste.
Affiliation(s)
- Labeeb Ali
- Department of Chemical and Petroleum Engineering, United Arab Emirates University, Sheikh Khalifa Bin Zayed Street, Al-Ain 15551, United Arab Emirates
| | - Kaushik Sivaramakrishnan
- Department of Chemical and Petroleum Engineering, United Arab Emirates University, Sheikh Khalifa Bin Zayed Street, Al-Ain 15551, United Arab Emirates
| | - Mohamed Shafi Kuttiyathil
- Department of Chemical and Petroleum Engineering, United Arab Emirates University, Sheikh Khalifa Bin Zayed Street, Al-Ain 15551, United Arab Emirates
| | - Vignesh Chandrasekaran
- Department of Computer Science, University of British Columbia, Vancouver V6T 1Z4, Canada
| | - Oday H. Ahmed
- Department of Physics, College of Education, Al-Iraqia University, Baghdad 10071, Iraq
| | - Mohammad Al-Harahsheh
- Chemical Engineering Department, Jordan University of Science and Technology, Irbid 22110, Jordan
| | - Mohammednoor Altarawneh
- Department of Chemical and Petroleum Engineering, United Arab Emirates University, Sheikh Khalifa Bin Zayed Street, Al-Ain 15551, United Arab Emirates
| |
|
21
|
He Q, Zhang H, Li T, Zhang X, Li X, Dong C. NIR Spectral Inversion of Soil Physicochemical Properties in Tea Plantations under Different Particle Size States. Sensors (Basel, Switzerland) 2023; 23:9107. [PMID: 38005495 PMCID: PMC10675699 DOI: 10.3390/s23229107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 11/05/2023] [Accepted: 11/09/2023] [Indexed: 11/26/2023]
Abstract
Soil fertility is vital for the growth of tea plants. The physicochemical properties of soil play a key role in the evaluation of soil fertility. Thus, the rapid and accurate detection of soil physicochemical properties is of great significance for promoting the development of precision agriculture in tea plantations. In recent years, spectral data have become an important tool for the non-destructive testing of soil physicochemical properties. In this study, a support vector regression (SVR) model was constructed to model the hydrolyzed nitrogen, available potassium, and effective phosphorus in tea plantation soils of different particle sizes. Then, the successive projections algorithm (SPA), least-angle regression (LAR), and bootstrapping soft shrinkage (BOSS) variable importance screening methods were used to select the variables used to predict the soil physicochemical properties. The findings demonstrated that soil particle sizes of 0.25-0.5 mm produced the best predictions for all three physicochemical properties. After dimensionality reduction was applied, the LAR algorithm (R2C = 0.979, R2P = 0.976, RPD = 6.613) performed best in the prediction model for hydrolyzed nitrogen at a soil particle size of 0.25-0.5 mm. Among the models using dimensionality reduction, those that used the BOSS method had the best accuracy for estimating available potassium (R2C = 0.977, R2P = 0.981, RPD = 7.222) and effective phosphorus (R2C = 0.969, R2P = 0.964, RPD = 5.163). To offer a reference for the accurate detection of soil physicochemical properties in tea plantations, this study investigated the modeling effect of each physicochemical property under various soil particle sizes and integrated the regression model with various dimensionality reduction strategies.
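The abstract above reports coefficients of determination (R2C for calibration, R2P for prediction) and RPD without giving formulas. The sketch below shows how such figures are commonly computed, assuming RPD is the ratio of the standard deviation of the reference values to the root-mean-square error of prediction, and assuming the sample standard deviation (n - 1 denominator). The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of reference values / RMSE.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    sd = math.sqrt(sum((t - mean_y) ** 2 for t in y_true) / (n - 1))
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return sd / rmse

# Hypothetical reference measurements and model predictions (illustration only).
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [10.5, 11.5, 14.0, 16.5, 17.5]

print(round(r_squared(reference, predicted), 3))
print(round(rpd(reference, predicted), 3))
```

A larger RPD means the prediction error is small relative to the natural spread of the reference values, which is why it is often reported alongside R2 for spectral calibration models.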
Affiliation(s)
- Qinghai He
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310008, China
- Shandong Academy of Agricultural Machinery Science, Jinan 250100, China
| | - Haowen Zhang
- Shandong Academy of Agricultural Machinery Science, Jinan 250100, China
- Tea Research Institute, Shandong Academy of Agricultural Sciences, Jinan 250100, China
- College of Mechanical and Electrical Engineering, Shandong Agricultural University, Tai’an 271000, China
| | - Tianhua Li
- College of Mechanical and Electrical Engineering, Shandong Agricultural University, Tai’an 271000, China
| | - Xiaojia Zhang
- Tea Research Institute, Shandong Academy of Agricultural Sciences, Jinan 250100, China
| | - Xiaoli Li
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310008, China
| | - Chunwang Dong
- Tea Research Institute, Shandong Academy of Agricultural Sciences, Jinan 250100, China
| |
|
22
|
Lin R, Peng B, Li L, He X, Yan H, Tian C, Luo H, Yin G. Application of serum Raman spectroscopy combined with classification model for rapid breast cancer screening. Front Oncol 2023; 13:1258436. [PMID: 37965448 PMCID: PMC10640987 DOI: 10.3389/fonc.2023.1258436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 10/13/2023] [Indexed: 11/16/2023] Open
Abstract
Introduction This study aimed to evaluate the feasibility of using general Raman spectroscopy as a method to screen for breast cancer. The objective was to develop a machine learning model that utilizes Raman spectroscopy to detect serum samples from breast cancer patients, benign cases, and healthy subjects, with puncture biopsy as the gold standard for comparison. The goal was to explore the value of Raman spectroscopy in the differential diagnosis of breast cancer, benign lesions, and healthy individuals. Methods In this study, blood serum samples were collected from a total of 333 participants. Among them, there were 129 cases of tumors (pathologically diagnosed as breast cancer and labeled as cancer), 91 cases of benign lesions (pathologically diagnosed as benign and labeled as benign), and 113 cases of healthy controls (labeled as normal). Raman spectra of the serum samples from each group were collected. To classify the normal, benign, and cancer sample groups, principal component analysis (PCA) combined with support vector machine (SVM) was used. The SVM model was evaluated using a cross-validation method. Results The results of the study revealed significant differences in the mean Raman spectra of the serum samples between the normal and tumor/benign groups. Although the mean Raman spectra showed slight variations between the cancer and benign groups, the SVM model achieved a remarkable prediction accuracy of up to 98% for classifying cancer, benign, and normal groups. Discussion In conclusion, this exploratory study has demonstrated the tremendous potential of general Raman spectroscopy as a clinical adjunctive diagnostic and rapid screening tool for breast cancer.
Affiliation(s)
- Runrui Lin
- School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
| | - Bowen Peng
- School of Electronic Science and Engineering, Nanjing University, Nanjing, China
| | - Lintao Li
- Radiation Oncology Key Laboratory of Sichuan Province, Sichuan Cancer Hospital & Institute, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
| | - Xiaoliang He
- School of Clinical Medicine, Southwest Medical University, Luzhou, China
| | - Huan Yan
- School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
| | - Chao Tian
- Radiation Oncology Key Laboratory of Sichuan Province, Sichuan Cancer Hospital & Institute, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
| | - Huaichao Luo
- Radiation Oncology Key Laboratory of Sichuan Province, Sichuan Cancer Hospital & Institute, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
| | - Gang Yin
- Radiation Oncology Key Laboratory of Sichuan Province, Sichuan Cancer Hospital & Institute, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
| |
|
23
|
Sancar N, Tabrizi SS. Machine learning approach for the detection of vitamin D level: a comparative study. BMC Med Inform Decis Mak 2023; 23:219. [PMID: 37845674 PMCID: PMC10580577 DOI: 10.1186/s12911-023-02323-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/15/2023] [Accepted: 10/03/2023] [Indexed: 10/18/2023] Open
Abstract
BACKGROUND After the World Health Organization declared the COVID-19 pandemic, the role of vitamin D has become even more critical for people worldwide. The most accurate way to determine vitamin D level is the 25-hydroxy vitamin D (25-OH-D) blood test. However, this blood test is not always feasible. Most data sets used in health science research contain highly correlated features, which is referred to as the multicollinearity problem. This problem can lead to misleading results and overfitting in the ML training process. Therefore, the proposed study aims to determine a clinically acceptable ML model for accurately detecting the vitamin D status of adult participants in North Cyprus without the need to determine the 25-OH-D level, taking into account the multicollinearity problem. METHOD The study was conducted with 481 observations from participants who applied voluntarily to the Internal Medicine Department at NEU Hospital. The classification performance of four conventional supervised ML models, namely ordinal logistic regression (OLR), elastic-net ordinal regression (ENOR), support vector machine (SVM), and random forest (RF), was compared. The comparative analysis considered the models' sensitivity to the participants' metabolic syndrome (MtS) positive status, hyper-parameter tuning, sensitivity to the size of the training data, and the classification performance of the models. RESULTS The findings showed that, due to the presence of multicollinearity, the performance of the SVM (RBF) was clearly negatively affected when the test results were examined. Moreover, RF was clearly more robust than the other models when the size of the training data was varied. The experiment showed that the selected RF and ENOR performed better than the other two models when the number of training samples was reduced. Since multicollinearity is more severe in small samples, it can be concluded that RF and ENOR are not affected by the presence of the multicollinearity problem. The comparative analysis revealed that the RF classifier performed better and was more robust than the other proposed models in terms of accuracy (0.94), specificity (0.96), sensitivity or recall (0.94), precision (0.95), F1-score (0.95), and Cohen's kappa (0.90). CONCLUSION The RF performed better than the SVM (RBF), ENOR, and OLR. These comparison findings will be applied to develop an intelligent vitamin D level detection system that uses routine clinical and biochemical tests and lifestyle characteristics of individuals to decrease the cost and time of vitamin D level detection.
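The abstract above reports accuracy and Cohen's kappa for a multi-class classifier. As a minimal sketch of how these two figures relate, the functions below compute both from a confusion matrix whose rows are true classes and whose columns are predicted classes. The function names and the example matrix are hypothetical, and the counts are not data from the study.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def cohen_kappa(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (the accuracy) and p_e is the agreement
    expected by chance from the row and column totals.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total
    row_tot = [sum(cm[i]) for i in range(n)]
    col_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(row_tot[k] * col_tot[k] for k in range(n)) / (total * total)
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical two-class confusion matrix (rows: true, columns: predicted).
example_cm = [[40, 10],
              [5, 45]]

print(accuracy(example_cm))
print(round(cohen_kappa(example_cm), 3))
```

Because kappa subtracts chance agreement, it is always at or below the raw accuracy for a classifier that beats chance, which is consistent with the abstract reporting a kappa (0.90) below the accuracy (0.94).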
Affiliation(s)
- Nuriye Sancar
- Department of Mathematics, Near East University, Nicosia, 99138, Turkey.
| | - Sahar S Tabrizi
- Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
| |
|
24
|
Wang G, Zeng M, Li J, Liu Y, Wei D, Long Z, Chen H, Zang X, Yang J. Neural Representation of Collective Self-esteem in Resting-state Functional Connectivity and its Validation in Task-dependent Modality. Neuroscience 2023; 530:66-78. [PMID: 37619767 DOI: 10.1016/j.neuroscience.2023.08.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 08/01/2023] [Accepted: 08/09/2023] [Indexed: 08/26/2023]
Abstract
INTRODUCTION Collective self-esteem (CSE) is an important personality variable, defined as self-worth derived from membership in social groups. A study explored the neural basis of CSE using a task-based functional magnetic resonance imaging (fMRI) paradigm; however, task-independent neural basis of CSE remains to be explored, and whether the CSE neural basis of resting-state fMRI is consistent with that of task-based fMRI is unclear. METHODS We built support vector regression (SVR) models to predict CSE scores using topological metrics measured in the resting-state functional connectivity network (RSFC) as features. Then, to test the reliability of the SVR analysis, the activation pattern of the identified brain regions from SVR analysis was used as features to distinguish collective self-worth from other conditions by multivariate pattern classification in task-based fMRI dataset. RESULTS SVR analysis results showed that leverage centrality successfully decoded the individual differences in CSE. The ventromedial prefrontal cortex, anterior cingulate cortex, posterior cingulate gyrus, precuneus, orbitofrontal cortex, posterior insula, postcentral gyrus, inferior parietal lobule, temporoparietal junction, and inferior frontal gyrus, which are involved in self-referential processing, affective processing, and social cognition networks, participated in this prediction. Multivariate pattern classification analysis found that the activation pattern of the identified regions from the SVR analysis successfully distinguished collective self-worth from relational self-worth, personal self-worth and semantic control. CONCLUSION Our findings revealed CSE neural basis in the whole-brain RSFC network, and established the concordance between leverage centrality and the activation pattern (evoked during collective self-worth task) of the identified regions in terms of representing CSE.
Affiliation(s)
- Guangtong Wang
- Faculty of Psychology, Southwest University, Chongqing 400715, China; Key Laboratory of Cognition and Personality, Ministry of Education, Southwest University, Chongqing 400715, China
| | - Mei Zeng
- Faculty of Psychology, Southwest University, Chongqing 400715, China; Key Laboratory of Cognition and Personality, Ministry of Education, Southwest University, Chongqing 400715, China
| | - Jiwen Li
- Faculty of Psychology, Southwest University, Chongqing 400715, China; Key Laboratory of Cognition and Personality, Ministry of Education, Southwest University, Chongqing 400715, China
| | - Yadong Liu
- Faculty of Psychology, Southwest University, Chongqing 400715, China; Key Laboratory of Cognition and Personality, Ministry of Education, Southwest University, Chongqing 400715, China
| | - Dongtao Wei
- Faculty of Psychology, Southwest University, Chongqing 400715, China; Key Laboratory of Cognition and Personality, Ministry of Education, Southwest University, Chongqing 400715, China
| | - Zhiliang Long
- Faculty of Psychology, Southwest University, Chongqing 400715, China; Key Laboratory of Cognition and Personality, Ministry of Education, Southwest University, Chongqing 400715, China
| | - Haopeng Chen
- Faculty of Psychology, Southwest University, Chongqing 400715, China; Key Laboratory of Cognition and Personality, Ministry of Education, Southwest University, Chongqing 400715, China
| | - Xinlei Zang
- Faculty of Psychology, Southwest University, Chongqing 400715, China; Key Laboratory of Cognition and Personality, Ministry of Education, Southwest University, Chongqing 400715, China
| | - Juan Yang
- Faculty of Psychology, Southwest University, Chongqing 400715, China; Key Laboratory of Cognition and Personality, Ministry of Education, Southwest University, Chongqing 400715, China.
| |
|
25
|
Bian X, Zhao Z, Liu J, Liu P, Shi H, Tan X. Discretized butterfly optimization algorithm for variable selection in the rapid determination of cholesterol by near-infrared spectroscopy. Analytical Methods: Advancing Methods and Applications 2023; 15:5190-5198. [PMID: 37779476 DOI: 10.1039/d3ay01636f] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 10/03/2023]
Abstract
The blood cholesterol level is strongly associated with cardiovascular disease. It is necessary to develop a rapid method to determine the cholesterol concentration of blood. In this study, a discretized butterfly optimization algorithm-partial least squares (BOA-PLS) method combined with near-infrared (NIR) spectroscopy is proposed for the first time for rapid determination of the cholesterol concentration in blood. In discretized BOA, each element of the butterfly vector is 1 or 0, which represents whether the corresponding variable is selected or not, respectively. In the optimization process, four transfer functions, i.e., arctangent, V-shaped, improved arctangent (I-atan) and improved V-shaped (I-V), are introduced and compared for discretization of the butterfly position. The partial least squares (PLS) model is established between the selected NIR variables and cholesterol concentrations. The iteration number, transfer functions and the performance of butterflies are investigated. The proposed method is compared with full-spectrum PLS, multiplicative scatter correction-PLS (MSC-PLS), max-min scaling-PLS (MMS-PLS), MSC-MMS-PLS, uninformative variable elimination-PLS (UVE-PLS), Monte Carlo uninformative variable elimination-PLS (MCUVE-PLS) and randomization test-PLS (RT-PLS). Results show that the I-V function is the best transfer function for discretization. Both preprocessing and variable selection can improve the prediction performance of PLS. Variable selection methods based on BOA are better than those based on statistics. Furthermore, I-V-BOA-PLS has the highest predictive accuracy among the seven variable selection methods. MSC-MMS can further improve the prediction ability of I-V-BOA-PLS. Therefore, BOA-PLS combined with NIR spectroscopy is promising for the rapid determination of cholesterol concentration in blood.
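The discretization step described above, in which a transfer function turns a continuous butterfly position into a 0/1 selection vector, can be sketched as follows. The abstract does not give the formulas of its four transfer functions or its binarization rule, so everything here is an assumption: the arctangent and |tanh| forms are common choices in the binary metaheuristics literature, the rule "select the variable when a uniform random draw falls below the transfer value" is one of several in use, and all names are hypothetical.

```python
import math
import random

def arctan_transfer(x):
    """Arctangent-based transfer function mapping a real position into [0, 1).

    Assumed form; the paper's exact definition is not stated in the abstract.
    """
    return abs((2.0 / math.pi) * math.atan((math.pi / 2.0) * x))

def v_shaped_transfer(x):
    """A common V-shaped transfer function, |tanh(x)| (assumed form)."""
    return abs(math.tanh(x))

def discretize(position, transfer, rng):
    """Turn a continuous position vector into a 0/1 variable-selection mask.

    Assumed rule: a variable is selected (1) when a uniform random number
    falls below the transfer-function value of its coordinate.
    """
    return [1 if rng.random() < transfer(x) else 0 for x in position]

rng = random.Random(0)                      # fixed seed for reproducibility
position = [-2.5, -0.1, 0.0, 0.4, 3.0]      # hypothetical butterfly position
mask = discretize(position, v_shaped_transfer, rng)
selected = [i for i, bit in enumerate(mask) if bit == 1]

print(mask)
print(selected)
```

In a full BOA-PLS loop, the indices in `selected` would pick the NIR wavelengths passed to the PLS model, and the model's prediction error would serve as the fitness that guides the next position update.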
Affiliation(s)
- Xihui Bian
- State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin, 300387, China.
- Shandong Provincial Key Laboratory of Olefin Catalysis and Polymerization, Shandong Chambroad Holding Group Co. Ltd., Binzhou 256500, China
- NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, Shandong University, Jinan, 250012, China
| | - Zizhen Zhao
- State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin, 300387, China.
| | - Jianwen Liu
- State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin, 300387, China.
| | - Peng Liu
- State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin, 300387, China.
| | - Huibing Shi
- Shandong Provincial Key Laboratory of Olefin Catalysis and Polymerization, Shandong Chambroad Holding Group Co. Ltd., Binzhou 256500, China
| | - Xiaoyao Tan
- State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin, 300387, China.
| |
|
26
|
Keshtehgar A, Dahmardeh M, Ghanbari A, Khammari I. Prediction models of macro-nutrient content in plant organs of Cucumis melo in response to soil elements using support vector regression. PeerJ 2023; 11:e15417. [PMID: 37810792 PMCID: PMC10552743 DOI: 10.7717/peerj.15417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 04/24/2023] [Indexed: 10/10/2023] Open
Abstract
Background Food security is widely recognized as one of the major present and future challenges. The development of methods for monitoring the nutrient content of crop products is essential for implementing sound management of soil properties. Modeling techniques can evaluate the soil properties of fields and relate crop yield to soil management. This study aims to predict fruit yield and macro-nutrient content in plant organs of Cucumis melo in response to soil elements using support vector regression (SVR). Methodology In the spring of 2020, this study was conducted as a factorial experiment in a randomized complete block design with three replications. The first factor was fertilizer use at six levels: no fertilizer (control), cow manure (30 t ha-1), sheep manure (30 t ha-1), nanobiomic foliar application (2 l ha-1), silicone foliar application (3 l ha-1), and chemical fertilizer from urea, triple superphosphate, and potassium sulfate sources (200, 100, and 150 kg ha-1). The second factor was vermicompost at four levels: no vermicompost (control), 5, 10, and 15 t ha-1. Input data sets such as fruit yield and nitrogen, phosphorus, and potassium levels in the seeds, fruits, leaves, and roots were used to calibrate the probabilistic model of SP using SVR. Results When the data sets of nitrogen, phosphorus, and potassium in the fruit were used as input, the accuracy of the models was higher than 80.0% (R2 = 0.807 for predicting fruit nitrogen; R2 = 0.999 for fruit phosphorus; R2 = 0.968 for fruit potassium). Also, the results of the prediction models in response to soil elements showed that the soil nitrogen content ranged from 0.05 to 1.1%, soil phosphorus from 10 to 59 mg kg-1, and soil potassium from 180 to 320 mg kg-1, ranges that offer a suitable macro-nutrient content in the soil. Likewise, the best fruit nitrogen content ranged from 1.27 to 4.33%, fruit phosphorus from 15.74 to 26.19%, fruit potassium from 15.19 to 19.67%, and fruit yield from 2.16 to 5.95 kg per plant, all obtained under NPK chemical fertilizers with 15 t ha-1 of vermicompost. Conclusions Because the fruit values contributed more to prediction than the other observed values, the fruit was identified as the best plant organ for assessing the response to soil elements. Based on our findings, fruit phosphorus was identified as a determinant that strongly influenced the melon prediction models. Higher values of soil elements did not increase fruit yield or macro-nutrient content in plant organs, and excessive application may not be economical. Therefore, our study provides an efficient approach with potentially high accuracy to estimate fruit yield and macro-nutrient content in the fruits of Cucumis melo in response to soil elements, and it can save fertilizer during the growing season.
Affiliation(s)
- Abbas Keshtehgar
- Department of Agronomy, University of Zabol, Zabol, Sistan and Baluchestan, Iran
| | - Mahdi Dahmardeh
- Department of Agronomy, University of Zabol, Zabol, Sistan and Baluchestan, Iran
| | - Ahmad Ghanbari
- Department of Agronomy, University of Zabol, Zabol, Sistan and Baluchestan, Iran
| | - Issa Khammari
- Department of Agronomy, University of Zabol, Zabol, Sistan and Baluchestan, Iran
| |
|
27
|
Chen X, He L, Shi K, Wu Y, Lin S, Fang Y. Interpretable Machine Learning for Fall Prediction Among Older Adults in China. Am J Prev Med 2023; 65:579-586. [PMID: 37087076 DOI: 10.1016/j.amepre.2023.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2022] [Revised: 04/14/2023] [Accepted: 04/14/2023] [Indexed: 04/24/2023]
Abstract
INTRODUCTION Falls in older adults are potentially devastating, whereas an accurate fall risk prediction model for community-dwelling older Chinese is still lacking. The objective of this study was to build prediction models for falls and fall-related injuries among community-dwelling older adults in China. METHODS This study used data (Waves 2015 and 2018) from 5,818 participants from the China Health and Retirement Longitudinal Study. A total of 107 input variables at the baseline level were regarded as candidate features. Five machine learning algorithms were used to build the 3-year fall and fall-related injury risk prediction models. SHapley Additive exPlanations was used for the prediction model explanation. Analyses were conducted in 2022. RESULTS The logistic regression model achieved the best performance among fall and fall-related injury prediction models with an area under the receiver operating characteristic curve of 0.739 and 0.757, respectively. Experience of falling was the most important feature in both models. Other important features included basic activity of daily living, instrumental activity of daily living, depressive symptoms, house tidiness, grip strength, and sleep duration. The important features unique to the fall model were house temperature, sex, and flush toilets, whereas lung function, smoking, and Internet access were exclusively related to the fall-related injury model. CONCLUSIONS This study suggests that the optimal models hold promise for screening out older adults at high risk for falls in facilitated targeted interventions. Fall prevention strategies should specifically focus on fall history, physical functions, psychological factors, and home environment.
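The area under the receiver operating characteristic curve quoted above can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition, counting ties as one half. The function name, labels, and scores are hypothetical illustrations rather than data from the study.

```python
def auc_roc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = fell within follow-up) and predicted risks.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]

print(round(auc_roc(labels, scores), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported values of 0.739 and 0.757 indicate moderate discrimination.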
Affiliation(s)
- Xiaodong Chen
- Center for Aging and Health Research, School of Public Health, Xiamen University, Xiamen, China
| | - Lingxiao He
- Center for Aging and Health Research, School of Public Health, Xiamen University, Xiamen, China
| | - Kewei Shi
- Center for Aging and Health Research, School of Public Health, Xiamen University, Xiamen, China
| | - Yafei Wu
- Center for Aging and Health Research, School of Public Health, Xiamen University, Xiamen, China
| | - Shaowu Lin
- Center for Aging and Health Research, School of Public Health, Xiamen University, Xiamen, China
| | - Ya Fang
- Center for Aging and Health Research, School of Public Health, Xiamen University, Xiamen, China.
| |
|
28
|
Chhoa H, Chabriat H, Anato AJ, Bamba M, Zittoun F, Chevret S, Biard L. Improvement of an External Predictive Model Based on New Information Using a Synthetic Data Approach: Application to CADASIL. Neurol Genet 2023; 9:e200091. [PMID: 38235365 PMCID: PMC10691224 DOI: 10.1212/nxg.0000000000200091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 06/07/2023] [Indexed: 01/19/2024]
Abstract
Background and Objectives Cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) is the most frequent hereditary cerebral small vessel disease. It is caused by mutations of the NOTCH3 gene. The disease evolves progressively over decades leading to stroke, disability, cognitive decline, and functional dependency. The course and clinical severity of CADASIL seem heterogeneous. Predictive models are thus needed to improve prognostic evaluation and inform future clinical trials. A predictive model of the 3-year variation in the Mattis Dementia Rating Scale (MDRS), which reflects the global cognitive performance of patients with CADASIL, was previously proposed. This model made predictions based on demographic, clinical, and MRI data. We aimed to improve this existing predictive model by integrating a new potential factor, the location of the genetic mutation in the different epidermal growth factor (EGFr) domains of the NOTCH3 gene, dichotomized into EGFr domains 1 to 6 or 7 to 34. Methods We used a new synthetic data approach to improve the initial predictive model by incorporating additional genetic information. This method combined the predicted outcomes from the previous model and 5 "synthetic" data sets with the observed outcome in a new data set. We then applied a multiple imputation method for missing data on the mutation location. Results The new data set included 367 patients who were followed up for 30 to 42 months. In the multivariable model with synthetic data, patients with NOTCH3 mutations in EGFr domains 7 to 34 had an additional average decrease of -1.4 points (standard error 0.67, p = 0.035) in their MDRS score variation over 3 years compared with patients with mutations located in EGFr domains 1 to 6. Cross-validation results highlighted the improved predictive performance of the enhanced model. Moreover, the model estimation was found to be more robust than fitting a model without synthetic data. Discussion The use of synthetic data improved the predictive model of MDRS change over 3 years in CADASIL. The predictive performance and estimation robustness of the predictive model were enhanced using this approach, whether or not genetic information was used. A statistically significant association between the location of the mutation in the NOTCH3 gene and the 3-year MDRS score variation was detected.
Affiliation(s)
- Henri Chhoa
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
| | - Hugues Chabriat
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
| | - Adelina Joanita Anato
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
| | - Mamadou Bamba
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
| | - Florent Zittoun
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
| | - Sylvie Chevret
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
| | - Lucie Biard
- From the ECSTRRA Team (H. Chhoa, S.C., L.B.), Université Paris-Cité, UMR1153, INSERM; Translational Neurovascular Centre (H. Chabriat), GH Saint-Louis-Lariboisière, Assistance Publique des Hôpitaux de Paris APHP, Université Paris-Cité and DHU NeuroVasc Sorbonne Paris-Cité; UMR 1161 (H. Chabriat), INSERM; and ENSAI (A.J.A., M.B., F.Z.), Ecole d'ingénieur statistique, data science et big data, Bruz, France
| |
|
29
|
Rácz A, Vincze A, Volk B, Balogh GT. Extending the limitations in the prediction of PAMPA permeability with machine learning algorithms. Eur J Pharm Sci 2023; 188:106514. [PMID: 37402429 DOI: 10.1016/j.ejps.2023.106514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 06/21/2023] [Accepted: 07/01/2023] [Indexed: 07/06/2023]
Abstract
Gastrointestinal absorption is a key factor amongst the ADME-related (absorption, distribution, metabolism and excretion) pharmacokinetic properties; therefore, it has a major role in drug discovery and drug safety determinations. The Parallel Artificial Membrane Permeability Assay (PAMPA) can be considered the most popular and well-known screening assay for the measurement of gastrointestinal absorption. Our study provides quantitative structure-property relationship (QSPR) models based on experimental PAMPA permeability data for almost four hundred diverse molecules, which greatly extends the applicability of the models in chemical space. Two- and three-dimensional molecular descriptors were applied for the model building in every case. We have compared the performance of a classical partial least squares regression (PLS) model with two major machine learning algorithms: artificial neural networks (ANN) and support vector machine (SVM). Due to the applied gradient pH in the experiments, we have calculated the descriptors for the model building at pH values of 7.4 and 6.5, and compared the effect of pH on the performance of the models. After a complex validation protocol, the best model had R² = 0.91 for the training set and R² = 0.84 for the external test set. The developed models are capable of robust and fast prediction for new compounds, with excellent accuracy compared to the previous QSPR models.
Affiliation(s)
- Anita Rácz
- Plasma Chemistry Research Group, Institute of Materials and Environmental Chemistry, Research Centre for Natural Sciences, Magyar tudósok krt. 2., Budapest H-1117, Hungary.
| | - Anna Vincze
- Department of Chemical and Environmental Process Engineering, Budapest University of Technology and Economics, Műegyetem rakpart 3., Budapest H-1111, Hungary
| | - Balázs Volk
- Directorate of Drug Substance Development, Egis Pharmaceuticals Plc., P.O. Box 100, Budapest H-1475, Hungary
| | - György T Balogh
- Department of Chemical and Environmental Process Engineering, Budapest University of Technology and Economics, Műegyetem rakpart 3., Budapest H-1111, Hungary; Department of Pharmaceutical Chemistry, Semmelweis University, Hőgyes Endre út 9., Budapest H-1092, Hungary.
| |
|
30
|
Li W, Shao C, Li C, Zhou H, Yu L, Yang J, Wan H, He Y. Metabolomics: A useful tool for ischemic stroke research. J Pharm Anal 2023; 13:968-983. [PMID: 37842657 PMCID: PMC10568109 DOI: 10.1016/j.jpha.2023.05.015] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/17/2023] [Revised: 05/14/2023] [Accepted: 05/29/2023] [Indexed: 10/17/2023] Open
Abstract
Ischemic stroke (IS) is a multifactorial and heterogeneous disease. Despite years of studies, effective strategies for the diagnosis, management and treatment of stroke are still lacking in clinical practice. Metabolomics is a growing field in systems biology. It is starting to show promise in the identification of biomarkers and in the use of pharmacometabolomics to help patients with certain disorders choose their course of treatment. The development of metabolomics has enabled a broader range of biological applications. In particular, metabolomics is increasingly being used to diagnose diseases, discover new drug targets, elucidate mechanisms, and monitor therapeutic outcomes, with potential effects on precision medicine. In this review, we discuss recent advances in metabolomics and how metabolomics might be used to identify novel biomarkers and understand the mechanisms of IS. We then summarize the use of metabolomics approaches to investigate the molecular processes and active ingredients of Chinese herbal formulations with anti-IS capabilities. Finally, we summarize recent developments in single cell metabolomics for exploring the metabolic profiles of single cells. Although the field is relatively young, the development of single cell metabolomics promises to provide a powerful tool for unraveling the pathogenesis of IS.
Affiliation(s)
- Wentao Li
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou, 310053, China
| | - Chongyu Shao
- School of Basic Medicine Sciences, Zhejiang Chinese Medical University, Hangzhou, 310053, China
| | - Chang Li
- School of Basic Medicine Sciences, Zhejiang Chinese Medical University, Hangzhou, 310053, China
| | - Huifen Zhou
- School of Basic Medicine Sciences, Zhejiang Chinese Medical University, Hangzhou, 310053, China
| | - Li Yu
- School of Basic Medicine Sciences, Zhejiang Chinese Medical University, Hangzhou, 310053, China
| | - Jiehong Yang
- School of Basic Medicine Sciences, Zhejiang Chinese Medical University, Hangzhou, 310053, China
| | - Haitong Wan
- School of Basic Medicine Sciences, Zhejiang Chinese Medical University, Hangzhou, 310053, China
| | - Yu He
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou, 310053, China
| |
|
31
|
Budiarto A, Sheikh A, Wilson A, Price DB, Shah SA. Handling Class Imbalance in Machine Learning-based Prediction Models: A Case Study in Asthma Management. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2023; 2023:1-5. [PMID: 38083129 DOI: 10.1109/embc40787.2023.10340751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
A data-driven prediction tool has the potential to provide early warning of an asthma attack and improve asthma management and outcomes. Most previous machine learning (ML)-based studies for asthma attack prediction have reported a severe class imbalance, with major implications for model performance. We aimed to undertake a systematic comparison of several class imbalance handling techniques in the context of risk prediction models for asthma prognosis. We used data from 9,835 asthma patients extracted from the Medical Information Mart for Intensive Care (MIMIC) IV database and deployed five class imbalance handling methods based on synthetic minority oversampling technique (SMOTE) and cost function customisation. We then compared their performances in improving two-class classifier models developed using logistic regression (LR) and extreme gradient boosting (XGBoost) for three different prediction tasks with varying severity of class imbalance (proportion of majority class ranging from 90.86% to 98.98%). The cost function customisation technique substantially outperformed the SMOTE-based methods in all tasks. XGBoost combined with cost function customisation achieved the highest prediction performance for the outcome with the most extreme class imbalance ratio (AUC = 0.72). Our findings suggest that the cost function customisation-based approach to tackle class imbalance provides substantially better performance compared to oversampling in the context of asthma management. Clinical Relevance: This study underscores the challenge of class imbalance in the context of prediction tools to improve asthma management and outcomes and provides a methodological solution that addresses the challenge. Accurate asthma prediction tools can provide early warning and potentially prevent deterioration, thereby improving the quality of life of patients with asthma.
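The abstract does not say how the cost function was customised. As an illustration only, the sketch below uses inverse-frequency class weights in a binary log loss, which is one common way to penalise errors on the minority class more heavily. The function names, the 95/5 class split, and the constant prediction of 0.01 are all hypothetical and are not taken from the study.

```python
import math

def class_weights(labels):
    """Inverse-frequency weights so both classes carry equal total weight.

    Assumption: this weighting scheme is illustrative, not the study's.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}

def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Binary cross-entropy where each sample is scaled by its class weight."""
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum

# Hypothetical data: 95 negatives, 5 positives, and a model that
# predicts a 1% risk for everyone (i.e. effectively "always negative").
labels = [0] * 95 + [1] * 5
probs = [0.01] * 100

weights = class_weights(labels)
plain = weighted_log_loss(labels, probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, probs, weights)
print(f"unweighted loss: {plain:.3f}, class-weighted loss: {weighted:.3f}")
```

With the weighted loss, the "always negative" model is penalised far more than under the plain loss, which is the effect that cost-sensitive training relies on.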
|
32
|
Pchitskaya E, Vasiliev P, Smirnova D, Chukanov V, Bezprozvanny I. SpineTool is an open-source software for analysis of morphology of dendritic spines. Sci Rep 2023; 13:10561. [PMID: 37386071 PMCID: PMC10310755 DOI: 10.1038/s41598-023-37406-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 06/21/2023] [Indexed: 07/01/2023] Open
Abstract
Dendritic spines form most excitatory synaptic inputs in neurons and these spines are altered in many neurodevelopmental and neurodegenerative disorders. Reliable methods to assess and quantify dendritic spine morphology are needed, but most existing methods are subjective and labor intensive. To solve this problem, we developed open-source software that allows segmentation of dendritic spines from 3D images, extraction of their key morphological features, and their classification and clustering. Instead of commonly used spine descriptors based on numerical metrics, we used a chord length distribution histogram (CLDH) approach. The CLDH method depends on the distribution of lengths of chords randomly generated within the dendritic spine volume. To achieve less biased analysis, we developed a classification procedure that uses a machine-learning algorithm based on experts' consensus and a machine-guided clustering tool. These approaches to unbiased and automated measurement, classification and clustering of synaptic spines should provide a useful resource for a variety of neuroscience and neurodegenerative research applications.
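The idea of a chord length distribution histogram can be sketched in a few lines. The example below is an assumption-laden illustration, not SpineTool's implementation: it treats a shape as a set of voxel centres, draws random pairs of voxels as "chords", and bins their lengths into a normalised histogram. How SpineTool actually generates chords is not stated in the abstract; the function name, the bin count, and the ball-shaped test volume are hypothetical.

```python
import math
import random

def chord_length_histogram(points, n_chords=2000, n_bins=10, seed=0):
    """Normalised histogram of distances between random pairs of points.

    Assumption: a chord is approximated by a pair of distinct voxel centres.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_chords):
        a, b = rng.sample(points, 2)  # two distinct points from the shape
        lengths.append(math.dist(a, b))
    max_len = max(lengths)
    counts = [0] * n_bins
    for length in lengths:
        # Scale to [0, n_bins) and clamp the longest chord into the last bin.
        idx = min(int(length / max_len * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / n_chords for c in counts]

# Hypothetical test volume: voxel centres inside a ball of radius 5.
ball = [
    (x, y, z)
    for x in range(-5, 6)
    for y in range(-5, 6)
    for z in range(-5, 6)
    if x * x + y * y + z * z <= 25
]

hist = chord_length_histogram(ball)
print([round(h, 3) for h in hist])
```

Because the histogram is normalised, shapes of different sizes can be compared by the form of the distribution rather than by absolute chord lengths.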
Affiliation(s)
- Ekaterina Pchitskaya
- Laboratory of Molecular Neurodegeneration, Peter the Great St. Petersburg Polytechnic University, Khlopina St. 11, St. Petersburg, Russia, 194021.
| | - Peter Vasiliev
- Laboratory of Molecular Neurodegeneration, Peter the Great St. Petersburg Polytechnic University, Khlopina St. 11, St. Petersburg, Russia, 194021
- Department of Applied Mathematics, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya St. 29, St. Petersburg, Russia, 195251
| | - Daria Smirnova
- Department of Applied Mathematics, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya St. 29, St. Petersburg, Russia, 195251
| | - Vyacheslav Chukanov
- Laboratory of Molecular Neurodegeneration, Peter the Great St. Petersburg Polytechnic University, Khlopina St. 11, St. Petersburg, Russia, 194021
- Department of Applied Mathematics, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya St. 29, St. Petersburg, Russia, 195251
| | - Ilya Bezprozvanny
- Laboratory of Molecular Neurodegeneration, Peter the Great St. Petersburg Polytechnic University, Khlopina St. 11, St. Petersburg, Russia, 194021.
- Department of Physiology, UT Southwestern Medical Center at Dallas, Dallas, TX, 75390, USA.
| |
|
33
|
Ashraf WM, Uddin GM, Tariq R, Ahmed A, Farhan M, Nazeer MA, Hassan RU, Naeem A, Jamil H, Krzywanski J, Sosnowski M, Dua V. Artificial Intelligence Modeling-Based Optimization of an Industrial-Scale Steam Turbine for Moving toward Net-Zero in the Energy Sector. ACS OMEGA 2023; 8:21709-21725. [PMID: 37360426 PMCID: PMC10285957 DOI: 10.1021/acsomega.3c01227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 05/16/2023] [Indexed: 06/28/2023]
Abstract
Augmentation of energy efficiency in the power generation systems can aid in decarbonizing the energy sector, which is also recognized by the International Energy Agency (IEA) as a solution to attain net-zero from the energy sector. With this reference, this article presents a framework incorporating artificial intelligence (AI) for improving the isentropic efficiency of a high-pressure (HP) steam turbine installed at a supercritical power plant. The data of the operating parameters taken from a supercritical 660 MW coal-fired power plant is well-distributed in the input and output spaces of the operating parameters. Based on hyperparameter tuning, two advanced AI modeling algorithms, i.e., artificial neural network (ANN) and support vector machine (SVM), are trained and, subsequently, validated. ANN, which turned out to be the better-performing model, is utilized to conduct the Monte Carlo technique-based sensitivity analysis toward the HP turbine efficiency. Subsequently, the ANN model is deployed for evaluating the impact of individual operating parameters, or combinations of them, on the HP turbine efficiency under three real-power generation capacities of the power plant. The parametric study and nonlinear programming-based optimization techniques are applied to optimize the HP turbine efficiency. It is estimated that the HP turbine efficiency can be improved by 1.43, 5.09, and 3.40% as compared to that at the average values of the input parameters for half-load, mid-load, and full-load power generation modes, respectively. The annual reduction in CO2 measuring 58.3, 123.5, and 70.8 kilotons per year (kt/y) corresponds to half-load, mid-load, and full-load, respectively, and noticeable mitigation of SO2, CH4, N2O, and Hg emissions is estimated for the three power generation modes of the power plant. The AI-based modeling and optimization analysis is conducted to enhance the operation excellence of the industrial-scale steam turbine that promotes higher energy efficiency and contributes to the net-zero target from the energy sector.
Affiliation(s)
- Waqar Muhammad Ashraf
- The Sargent Centre for Process Systems Engineering, Department of Chemical Engineering, University College London, Torrington Place, London WC1E 7JE, U.K.
| | - Ghulam Moeen Uddin
- Department of Mechanical Engineering, University of Engineering & Technology, Lahore, Punjab 54890, Pakistan
| | - Rasikh Tariq
- Facultad de Ingeniería, Universidad Autónoma de Yucatán, Av. Industrias No Contaminantes por Anillo Periférico Norte, Apdo. Postal 150, Cordemex, Mérida, 97203 Yucatán, Mexico
- Tecnológico Nacional de México/IT de Mérida, Departamento de Sistemas y Computación, Mérida, Mexico
| | - Afaq Ahmed
- Department of Mechanical Engineering, University of Engineering & Technology, Lahore, Punjab 54890, Pakistan
| | - Muhammad Farhan
- Department of Mechanical Engineering, University of Engineering & Technology, Lahore, Punjab 54890, Pakistan
| | - Muhammad Aarif Nazeer
- Department of Mechanical Engineering, University of Engineering & Technology, Lahore, Punjab 54890, Pakistan
| | - Rauf Ul Hassan
- Department of Mechanical Engineering, University of Engineering & Technology, Lahore, Punjab 54890, Pakistan
| | - Ahmad Naeem
- Department of Automotive Engineering Technology, Punjab Tianjin University of Technology, Lahore 54000, Pakistan
| | - Hanan Jamil
- Department of Mechanical Engineering, University of Engineering & Technology, Lahore, Punjab 54890, Pakistan
| | - Jaroslaw Krzywanski
- Faculty of Science and Technology, Jan Dlugosz University in Czestochowa, 13/15 Armii Krajowej Av., 42-200 Czestochowa, Poland
| | - Marcin Sosnowski
- Faculty of Science and Technology, Jan Dlugosz University in Czestochowa, 13/15 Armii Krajowej Av., 42-200 Czestochowa, Poland
| | - Vivek Dua
- The Sargent Centre for Process Systems Engineering, Department of Chemical Engineering, University College London, Torrington Place, London WC1E 7JE, U.K.
| |
|
34
|
Bu Y, Jiang X, Tian J, Hu X, Han L, Huang D, Luo H. Rapid nondestructive detecting of sorghum varieties based on hyperspectral imaging and convolutional neural network. JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE 2023; 103:3970-3983. [PMID: 36397181 DOI: 10.1002/jsfa.12344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Revised: 10/24/2022] [Accepted: 11/18/2022] [Indexed: 05/03/2023]
Abstract
BACKGROUND The purity of sorghum varieties is an important indicator of the quality of raw materials used in the distillation of liquors. Different varieties of sorghum may be mixed during the acquisition process, which will affect the flavor and quality of liquor. To facilitate the rapid identification of sorghum varieties, this study proposes a sorghum variety identification model using hyperspectral imaging (HSI) technology combined with a convolutional neural network (AlexNet). RESULTS First, the watershed algorithm, modified with the extended-maxima transform, was used to segment the hyperspectral images into single sorghum grains. The isolation forest algorithm was used to eliminate abnormal spectral data from the complete spectral data. Second, the AlexNet model of sorghum variety identification was established based on the two-dimensional gray image data of sorghum grain in group 1. The effects of different preprocessing methods and different convolution kernel sizes on the performance of the AlexNet model were discussed. The features of the last layer of the AlexNet model were visualized using t-distributed stochastic neighbor embedding (t-SNE) to evaluate the separability of the features extracted by the AlexNet model. The optimal AlexNet model was compared with traditional machine learning models for sorghum variety identification. Finally, the varieties of sorghum grains in groups 2 and 3 were identified based on the optimal AlexNet model, and the average accuracy values of the test set reached 95.62% and 95.91%, respectively. CONCLUSION The results in this study demonstrated that HSI combined with the AlexNet model could provide a feasible technical approach for the detection of sorghum varieties. © 2022 Society of Chemical Industry.
Affiliation(s)
- Youhua Bu
- College of Mechanical Engineering, Sichuan University of Science and Engineering, Yibin, China
| | - Xinna Jiang
- College of Mechanical Engineering, Sichuan University of Science and Engineering, Yibin, China
| | - Jianping Tian
- College of Mechanical Engineering, Sichuan University of Science and Engineering, Yibin, China
| | - Xinjun Hu
- College of Mechanical Engineering, Sichuan University of Science and Engineering, Yibin, China
- Key Laboratory of Brewing Biotechnology and Application of Sichuan Province, Yibin, China
| | - Lipeng Han
- College of Mechanical Engineering, Sichuan University of Science and Engineering, Yibin, China
| | - Dan Huang
- College of Bioengineering, Sichuan University of Science and Engineering, Yibin, China
- Sichuan Engineering Technology Research Center for Liquor-Making Grains, Yibin, China
| | - Huibo Luo
- College of Bioengineering, Sichuan University of Science and Engineering, Yibin, China
- Sichuan Engineering Technology Research Center for Liquor-Making Grains, Yibin, China
| |
|
35
|
Guo D, He W, Wei L, Song Y, Qi J, Yao Y, Chen X, Huang J, Lu Y, Zhu X. The Zhu-Lu formula: a machine learning-based intraocular lens power calculation formula for highly myopic eyes. EYE AND VISION (LONDON, ENGLAND) 2023; 10:26. [PMID: 37259154 DOI: 10.1186/s40662-023-00342-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/14/2022] [Accepted: 04/12/2023] [Indexed: 06/02/2023]
Abstract
BACKGROUND To develop a novel machine learning-based intraocular lens (IOL) power calculation formula for highly myopic eyes. METHODS A total of 1828 eyes (from 1828 highly myopic patients) undergoing cataract surgery in our hospital were used as the internal dataset, and 151 eyes from 151 highly myopic patients from two other hospitals were used as external test dataset. The Zhu-Lu formula was developed based on the eXtreme Gradient Boosting and the support vector regression algorithms. Its accuracy was compared in the internal and external test datasets with the Barrett Universal II (BUII), Emmetropia Verifying Optical (EVO) 2.0, Kane, Pearl-DGS and Radial Basis Function (RBF) 3.0 formulas. RESULTS In the internal test dataset, the Zhu-Lu, RBF 3.0 and BUII ranked top three from low to high taking into account standard deviations (SDs) of prediction errors (PEs). The Zhu-Lu and RBF 3.0 showed significantly lower median absolute errors (MedAEs) than the other formulas (all P < 0.05). In the external test dataset, the Zhu-Lu, Kane and EVO 2.0 ranked top three from low to high considering SDs of PEs. The Zhu-Lu formula showed a comparable MedAE with BUII and EVO 2.0 but significantly lower than Kane, Pearl-DGS and RBF 3.0 (all P < 0.05). The Zhu-Lu formula ranked first regarding the percentages of eyes within ± 0.50 D of the PE in both test datasets (internal: 80.61%; external: 72.85%). In the axial length subgroup analysis, the PE of the Zhu-Lu stayed stably close to zero in all subgroups. CONCLUSIONS The novel IOL power calculation formula for highly myopic eyes demonstrated improved and stable predictive accuracy compared with other artificial intelligence-based formulas.
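The three accuracy measures named in this abstract (standard deviation of the prediction error, median absolute error, and percentage of eyes within ± 0.50 D) are simple to compute. The sketch below is illustrative only: the refraction values are made up, the function name is hypothetical, and the prediction error is taken as actual minus predicted refraction, a sign convention that is assumed here and does not affect any of the three measures.

```python
import statistics

def prediction_error_summary(predicted, actual, tolerance=0.50):
    """Summarise prediction errors (in dioptres) for one formula.

    Assumption: PE = actual - predicted postoperative refraction.
    """
    errors = [a - p for p, a in zip(predicted, actual)]
    abs_errors = [abs(e) for e in errors]
    within = sum(1 for e in abs_errors if e <= tolerance)
    return {
        "sd_pe": statistics.stdev(errors),        # sample SD of the PE
        "medae": statistics.median(abs_errors),   # median absolute error
        "pct_within": 100.0 * within / len(errors),
    }

# Hypothetical refractions (D) for five eyes; not data from the study.
predicted = [-0.25, 0.10, -0.60, 0.40, 0.05]
actual = [0.00, 0.00, 0.00, 0.00, 0.00]

summary = prediction_error_summary(predicted, actual)
print(summary)
```

Running the same function on each formula's predictions for a shared test set gives the kind of side-by-side ranking the abstract reports.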
Affiliation(s)
- Dongling Guo
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- Key Laboratory of Myopia, Chinese Academy of Medical Science, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| | - Wenwen He
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- Key Laboratory of Myopia, Chinese Academy of Medical Science, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| | - Ling Wei
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- Key Laboratory of Myopia, Chinese Academy of Medical Science, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| | - Yunxiao Song
- University of Illinois at Urbana-Champaign, Illinois, USA
| | - Jiao Qi
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- Key Laboratory of Myopia, Chinese Academy of Medical Science, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| | - Yunqian Yao
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- Key Laboratory of Myopia, Chinese Academy of Medical Science, Shanghai, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China
| | - Xu Chen
- Shanghai Aier Eye Hospital, Shanghai, China
| | - Jinhai Huang
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China
- Eye Hospital of Wenzhou Medical University, Wenzhou, China
| | - Yi Lu
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China.
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China.
- Key Laboratory of Myopia, Chinese Academy of Medical Science, Shanghai, China.
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China.
| | - Xiangjia Zhu
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, 200031, China.
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China.
- Key Laboratory of Myopia, Chinese Academy of Medical Science, Shanghai, China.
- Shanghai Key Laboratory of Visual Impairment and Restoration, Shanghai, China.
| |
|
36
|
Kavaliauskas A, Žydelis R, Castaldi F, Auškalnienė O, Povilaitis V. Predicting Maize Theoretical Methane Yield in Combination with Ground and UAV Remote Data Using Machine Learning. PLANTS (BASEL, SWITZERLAND) 2023; 12:1823. [PMID: 37176880 PMCID: PMC10181051 DOI: 10.3390/plants12091823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Revised: 04/26/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
The accurate, timely, and non-destructive estimation of maize total above-ground biomass (TAB) and theoretical biochemical methane potential (TBMP) under different phenological stages is a substantial part of agricultural remote sensing. The combination of UAV data and machine learning (ML) may be successfully applied in predicting maize TAB and TBMP; however, in the Nordic-Baltic region, these technologies are not fully exploited. Therefore, in this study, during the maize growing period, we tracked unmanned aerial vehicle (UAV) based multispectral bands (blue, red, green, red edge, and infrared) at the main phenological stages. In the next step, we calculated UAV-based vegetation indices, which were combined with field measurements and different ML models, including generalized linear models, random forests, and support vector machines. The results showed that the best ML predictions were obtained during the maize blister (R2) to dough (R4) growth period, when the prediction models managed to explain 88-95% of TAB and 88-97% of TBMP variation. However, for practical use by farmers, the earliest suitable timing for adequate TAB and TBMP prediction in the Nordic-Baltic area is stage V7-V10. We conclude that UAV techniques in combination with ML models were successfully applied for maize TAB and TBMP estimation, but similar research should be continued for further improvements.
Affiliation(s)
- Ardas Kavaliauskas
- Institute of Agriculture, Lithuanian Research Centre for Agriculture and Forestry, Instituto Ave. 1, 58344 Akademija, Lithuania
| | - Renaldas Žydelis
- Institute of Agriculture, Lithuanian Research Centre for Agriculture and Forestry, Instituto Ave. 1, 58344 Akademija, Lithuania
| | - Fabio Castaldi
- Institute of BioEconomy, National Research Council of Italy (CNR), Via Giovanni Caproni 8, 50145 Firenze, Italy
| | - Ona Auškalnienė
- Institute of Agriculture, Lithuanian Research Centre for Agriculture and Forestry, Instituto Ave. 1, 58344 Akademija, Lithuania
| | - Virmantas Povilaitis
- Institute of Agriculture, Lithuanian Research Centre for Agriculture and Forestry, Instituto Ave. 1, 58344 Akademija, Lithuania
| |
|
37
|
Lu CH, Li BQ, Jing Q, Pei D, Huang XY. A classification and identification model of extra virgin olive oil adulterated with other edible oils based on pigment compositions and support vector machine. Food Chem 2023; 420:136161. [PMID: 37080110 DOI: 10.1016/j.foodchem.2023.136161] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/28/2023] [Revised: 04/04/2023] [Accepted: 04/11/2023] [Indexed: 04/22/2023]
Abstract
Adulteration identification of extra virgin olive oil (EVOO) is a vital issue in the olive oil industry. In this study, chromatographic fingerprint data of pigments combined with machine learning methodologies successfully identified and classified EVOO, refined-pomace olive oil (R-POO), rapeseed oil (RO), soybean oil (SO), peanut oil (PO), sunflower oil (SFO), flaxseed oil (FO), corn oil (CO), extra virgin olive oil adulterated with rapeseed oil (EVOO-RO) and extra virgin olive oil adulterated with corn oil (EVOO-CO). Support vector machine (SVM) classification of EVOO, other edible oils, and adulterated EVOO achieved 100% accuracy for the training set samples and 94.44% accuracy for the test set samples. As a result, this SVM model could effectively identify EVOO adulteration down to a limit of 1% RO and 1% CO. Therefore, the excellent classification and predictive power of this model indicated that pigments could be used as potential markers for identifying EVOO adulteration.
Affiliation(s)
- Cong-Hui Lu
- CAS Key Laboratory of Chemistry of Northwestern Plant Resources and Key Laboratory of Natural Medicine of Gansu Province, Lanzhou Institute of Chemical Physics, Chinese Academy of Sciences, Lanzhou 730000, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Bao-Qiong Li
- School of Biotechnology and Health Sciences, Wuyi University, Jiangmen 529020, China.
| | - Quan Jing
- CAS Key Laboratory of Chemistry of Northwestern Plant Resources and Key Laboratory of Natural Medicine of Gansu Province, Lanzhou Institute of Chemical Physics, Chinese Academy of Sciences, Lanzhou 730000, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Dong Pei
- CAS Key Laboratory of Chemistry of Northwestern Plant Resources and Key Laboratory of Natural Medicine of Gansu Province, Lanzhou Institute of Chemical Physics, Chinese Academy of Sciences, Lanzhou 730000, China; Yunnan Olive Health Industry Innovation Research and Development Co., Ltd, Lijiang 674100, China.
| | - Xin-Yi Huang
- CAS Key Laboratory of Chemistry of Northwestern Plant Resources and Key Laboratory of Natural Medicine of Gansu Province, Lanzhou Institute of Chemical Physics, Chinese Academy of Sciences, Lanzhou 730000, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| |
|
38
|
Miao J, Chen Z, Zhang Z, Wang Z, Wang Q, Zhang Z, Pan Y. A web tool for the global identification of pig breeds. Genet Sel Evol 2023; 55:18. [PMID: 36944938 PMCID: PMC10029154 DOI: 10.1186/s12711-023-00788-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/27/2022] [Accepted: 02/14/2023] [Indexed: 03/23/2023] Open
Abstract
BACKGROUND Natural and artificial selection for more than 9000 years have led to a variety of domestic pig breeds. Accurate identification of pig breeds is important for breed conservation, sustainable breeding, pork traceability, and local resource registration. RESULTS We evaluated the performance of four selectors and six classifiers for breed identification using a wide range of pig breeds (N = 91). The internal cross-validation and external independent testing showed that partial least squares regression (PLSR) was the most effective selector and partial least squares-discriminant analysis (PLS-DA) was the most powerful classifier for breed identification among many breeds. Five-fold cross-validation indicated that using PLSR as the selector and PLS-DA as the classifier to discriminate 91 pig breeds yielded 98.4% accuracy with only 3K single nucleotide polymorphisms (SNPs). We also constructed a reference dataset with 124 pig breeds and used it to develop the web tool iDIGs ( http://alphaindex.zju.edu.cn/iDIGs_en/ ) as a comprehensive application for global pig breed identification. iDIGs allows users to (1) identify pig breeds without a reference population and (2) design small panels to discriminate several specific pig breeds. CONCLUSIONS In this study, we proved that breed identification among a wide range of pig breeds is feasible and we developed a web tool for such pig breed identification.
Affiliation(s)
- Jian Miao
- College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Zitao Chen
- College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Zhenyang Zhang
- College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Zhen Wang
- College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Qishan Wang
- College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Hainan Institute of Zhejiang University, Building 11, Yongyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya, 572025, Hainan, China
| | - Zhe Zhang
- College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
| | - Yuchun Pan
- College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
- Hainan Institute of Zhejiang University, Building 11, Yongyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya, 572025, Hainan, China.
39
Lakhouit A, Shaban M, Alatawi A, Abbas SYH, Asiri E, Al Juhni T, Elsawy M. Machine-learning approaches in geo-environmental engineering: Exploring smart solid waste management. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 330:117174. [PMID: 36586367 DOI: 10.1016/j.jenvman.2022.117174] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/07/2022] [Revised: 12/19/2022] [Accepted: 12/28/2022] [Indexed: 06/17/2023]
Abstract
Over the past few decades, increased attention has been paid to domestic waste (DW) generation. DW comprises a large percentage of municipal solid waste (MSW), and its handling and processing involve serious technical issues while also consuming a major portion of municipal budgets. The accurate estimation, prediction, and characterization of DW is an ongoing challenge for many cities, municipalities, and local governments as they strive to implement sustainable strategies for MSW. The main objective of the present study is to estimate and correctly predict DW quantities using machine-learning (ML) algorithms. Several different ML algorithms are used in the research, including linear regression, regression trees, Gaussian process regression, support vector machine, and autoregressive integrated moving average methods for time series analysis. Two case studies are presented in this paper. In the first, domestic waste data covering the period from 2010 to 2021 were collected from the Saudi and Bahrain authorities, and in the second, the domestic waste-generating behavior of a family of eleven members was followed for one month. The results show that the biodegradable and non-biodegradable wastes generated by the family were in the range of 1.7-7.9 kg and 0.0-2.0 kg, respectively, and promising outcomes were obtained using an appropriate selection of input predictors in conjunction with time series analysis. The trained models are validated and tested using several types of evaluation metrics, including calculated residuals, mean square error, root mean square error, and the coefficient of determination (R² score). The R² values are in the range of 0.67-0.85 for the training and testing datasets for many of the predicted waste quantities. The results obtained from the study show that these algorithms can be used to reduce the environmental, economic, and societal impacts of waste by designing a smart waste management engineering system.
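The evaluation metrics named in this abstract (residuals, mean square error, root mean square error, and the coefficient of determination) follow standard definitions, sketched below. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return residuals, MSE, RMSE, and R^2 for paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are identical")
    mse = ss_res / len(residuals)
    return {
        "residuals": residuals,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical daily waste masses in kg (illustrative values only).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(observed, predicted)
```

An R² near 1 means the model explains most of the variance in the observations; the 0.67-0.85 range reported above corresponds to roughly two thirds to five sixths of the variance.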
Affiliation(s)
- Abderrahim Lakhouit
- Department of Civil Engineering, Faculty of Engineering, University of Tabuk, Tabuk 71421, Saudi Arabia.
| | - Mahmoud Shaban
- Department of Electrical Engineering, Faculty of Engineering, Aswan University, Aswan 81542, Egypt; Department of Electrical Engineering, College of Engineering, Qassim University, Unaizah 56452, Saudi Arabia
| | - Aishah Alatawi
- Department of Biology, Faculty of Science, University of Tabuk, Tabuk 71421, Saudi Arabia
| | - Sumaya Y H Abbas
- Department of Natural Resources and Environment College of Graduate Studies Arabian Gulf University, Bahrain
| | - Emad Asiri
- Department of Civil Engineering, Faculty of Engineering, University of Tabuk, Tabuk 71421, Saudi Arabia
| | - Tareq Al Juhni
- Department of Civil Engineering, Faculty of Engineering, University of Tabuk, Tabuk 71421, Saudi Arabia
| | - Mohamed Elsawy
- Department of Civil Engineering, Faculty of Engineering, University of Tabuk, Tabuk 71421, Saudi Arabia; Geotechnical and Foundations Engineering, Department of Civil Engineering, Faculty of Engineering, Aswan University, 81542, Egypt
40
Jafari SM, Nikoo MR, Sadegh M, Chen M, Gandomi AH. Non-parametric severity-duration-frequency analysis of drought based on satellite-based product and model fusion techniques. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:42087-42107. [PMID: 36645590 DOI: 10.1007/s11356-023-25235-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/11/2022] [Accepted: 01/06/2023] [Indexed: 06/17/2023]
Abstract
Climate change has increased the severity and frequency of droughts over the last decades. To alleviate the adverse impacts of droughts, an effective planning and management framework requires high-resolution spatiotemporal data. The TRMM multi-satellite precipitation analysis (TMPA) dataset provides sufficient accuracy with fine spatiotemporal resolution. However, it only covers a short temporal span, which limits its applicability for drought studies. This paper presents a methodology for efficient and accurate temporal extension of TMPA using four artificial intelligence (AI)-based models. To improve AI-based model precipitation estimations, fusion techniques including Orness, Orlike, and genetic algorithm (GA)-based weighting methods were employed. Results show that fusion approaches provide more accurate estimates of precipitation. Different timescales of n-SPI time series and drought spatial maps were prepared to visually evaluate the performance of long-term TMPA (LT-TMPA) alongside statistical error indices. The results confirm that this dataset is effective for meteorological drought monitoring over southern Iran. Finally, drought risk assessment was carried out to determine the spatiotemporal characteristics of droughts through severity-duration-frequency (SDF) contour maps. In contrast to the traditional SDF curves, SDF contour maps provide a superior understanding of drought for policymakers since they preserve spatial information.
Affiliation(s)
- Seyedeh Mahboobeh Jafari
- School of Environment, Department of Natural Resources and Environmental Studies, University of Northern British Columbia, Prince George, British Columbia, Canada
| | - Mohammad Reza Nikoo
- Department of Civil and Architectural Engineering, Sultan Qaboos University, Muscat, Oman.
| | - Mojtaba Sadegh
- Department of Civil Engineering, Boise State University, Boise, ID, USA
| | - Mingjie Chen
- Water Research Center, Sultan Qaboos University, Muscat, Oman
| | - Amir H Gandomi
- Department of Engineering and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia
- University Research and Innovation Center (EKIK), Óbuda University, 1034, Budapest, Hungary
41
Chen X, Lin S, Zheng Y, He L, Fang Y. Long-term trajectories of depressive symptoms and machine learning techniques for fall prediction in older adults: Evidence from the China Health and Retirement Longitudinal Study (CHARLS). Arch Gerontol Geriatr 2023; 111:105012. [PMID: 37030148 DOI: 10.1016/j.archger.2023.105012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2022] [Revised: 03/27/2023] [Accepted: 03/29/2023] [Indexed: 04/01/2023]
Abstract
BACKGROUND Falls are the most common adverse outcome of depression in older adults, yet an accurate risk prediction model for falls stratified by distinct long-term trajectories of depressive symptoms is still lacking. METHODS We collected the data of 1617 participants from the China Health and Retirement Longitudinal Study register, spanning between 2011 and 2018. The 36 input variables included in the baseline survey were regarded as candidate features. The trajectories of depressive symptoms were classified by the latent class growth model and growth mixture model. Three data balancing technologies and four machine learning algorithms were utilized to develop predictive models for fall classification of depressive prognosis. RESULTS Depressive symptom trajectories were divided into four categories, i.e., non-symptoms, new-onset increasing symptoms, slowly decreasing symptoms, and persistent high symptoms. The random forest-TomekLinks model achieved the best performance among the case and incident models with an AUC-ROC of 0.844 and 0.731, respectively. In the chronic model, the gradient boosting decision tree-synthetic minority oversampling technique obtained an AUC-ROC of 0.783. In the three models, the depressive symptom score was the most crucial component. The lung function was a common and significant feature in both the case and the chronic models. CONCLUSIONS This study suggests that the ideal model has a good chance of identifying older persons with a high risk of falling stratified by long-term trajectories of depressive symptoms. Baseline depressive symptom score, lung function, income, and injury experience are influential factors associated with falls of depression evolution.
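The TomekLinks balancing step named above can be sketched from its usual definition: a Tomek link is a pair of samples from different classes that are each other's nearest neighbor, and the majority-class member of each link is removed. The sketch below assumes Euclidean distance and toy two-dimensional data; the function name `tomek_links` is hypothetical, and the abstract does not give the authors' actual implementation or settings.

```python
import math


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links."""
    def nearest(i):
        # Index of the closest other point by Euclidean distance.
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )

    links = []
    for i in range(len(points)):
        j = nearest(i)
        # A link needs opposite labels and mutual nearest neighbors.
        if i < j and labels[i] != labels[j] and nearest(j) == i:
            links.append((i, j))
    return links


# Toy data (hypothetical): label 0 is the majority class, label 1 the minority.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.1, 0.1), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 0, 1, 1, 1]
links = tomek_links(points, labels)
# Undersampling drops the majority-class member of every link.
majority_to_drop = sorted({i if labels[i] == 0 else j for i, j in links})
```

Removing only the linked majority samples cleans the class boundary without discarding majority samples that lie far from the minority class.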
42
Olatunji SO, Alsheikh N, Alnajrani L, Alanazy A, Almusairii M, Alshammasi S, Alansari A, Zaghdoud R, Alahmadi A, Basheer Ahmed MI, Ahmed MS, Alhiyafi J. Comprehensible Machine-Learning-Based Models for the Pre-Emptive Diagnosis of Multiple Sclerosis Using Clinical Data: A Retrospective Study in the Eastern Province of Saudi Arabia. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:4261. [PMID: 36901273 PMCID: PMC10002108 DOI: 10.3390/ijerph20054261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/27/2023] [Revised: 02/22/2023] [Accepted: 02/24/2023] [Indexed: 06/18/2023]
Abstract
Multiple Sclerosis (MS) is characterized by chronic deterioration of the nervous system, mainly the brain and the spinal cord. An individual with MS develops the condition when the immune system begins attacking nerve fibers and the myelin sheathing that covers them, affecting the communication between the brain and the rest of the body and eventually causing permanent damage to the nerve. Patients with MS (pwMS) might experience different symptoms depending on which nerve was damaged and how much damage it has sustained. Currently, there is no cure for MS; however, there are clinical guidelines that help control the disease and its accompanying symptoms. Additionally, no specific laboratory biomarker can precisely identify the presence of MS, leaving specialists with a differential diagnosis that relies on ruling out other possible diseases with similar symptoms. Since the emergence of Machine Learning (ML) in the healthcare industry, it has become an effective tool for uncovering hidden patterns that aid in diagnosing several ailments. Several studies have been conducted to diagnose MS using ML and Deep Learning (DL) models trained using MRI images, achieving promising results. However, complex and expensive diagnostic tools are needed to collect and examine imaging data. Thus, the intention of this study is to implement a cost-effective, clinical data-driven model that is capable of diagnosing pwMS. The dataset was obtained from King Fahad Specialty Hospital (KFSH) in Dammam, Saudi Arabia. Several ML algorithms were compared, namely Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and Extra Trees (ET). The results indicated that the ET model outpaced the rest with an accuracy of 94.74%, recall of 97.26%, and precision of 94.67%.
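The accuracy, recall, and precision figures reported above are standard confusion-matrix ratios. A minimal sketch follows; the function name `binary_metrics` and the label vectors are hypothetical and are not the study's data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, recall, and precision for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Recall: share of true positives among all actual positives.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Precision: share of true positives among all predicted positives.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical labels: 1 = MS, 0 = not MS.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
scores = binary_metrics(y_true, y_pred)
```

In a diagnostic setting, recall measures how many true cases the model catches, which is why a recall above precision (97.26% versus 94.67% in the abstract) indicates a model that misses few patients at the cost of slightly more false alarms.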
Affiliation(s)
- Sunday O. Olatunji
- College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
| | - Nawal Alsheikh
- College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
| | - Lujain Alnajrani
- College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
| | - Alhatoon Alanazy
- College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
| | - Meshael Almusairii
- College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
| | - Salam Alshammasi
- College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
| | - Aisha Alansari
- College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
| | - Rim Zaghdoud
- College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
| | - Alaa Alahmadi
- College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
| | - Mohammed Imran Basheer Ahmed
- College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
| | - Mohammed Salih Ahmed
- College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
| | - Jamal Alhiyafi
- Department of Computer Science, Kettering University, Flint, MI 48504, USA
43
Keshtegar B, Piri J, Asnida Abdullah R, Hasanipanah M, Muayad Sabri Sabri M, Nguyen Le B. Intelligent ground vibration prediction in surface mines using an efficient soft computing method based on field data. Front Public Health 2023; 10:1094771. [PMID: 36817184 PMCID: PMC9929182 DOI: 10.3389/fpubh.2022.1094771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2022] [Accepted: 11/29/2022] [Indexed: 02/04/2023] Open
Abstract
Ground vibration induced by blasting operations is considered one of the most common environmental effects of mining projects. A strong ground vibration can destroy buildings and structures, hence its prediction and minimization are of high importance. The aim of this study is to estimate the ground vibration through a hybrid soft computing (SC) method, called RSM-SVR, which comprises two main regression techniques: the response surface model (RSM) and support vector regression (SVR). The RSM-SVR model applies an RSM in the first calibrating process and an SVR in the second calibrating process to improve the accuracy of the ground vibration predictions. The predicted results of an RSM, which are obtained using the input data of problems, are used as the input dataset for the regression process of an SVR. The effectiveness and agreement of the RSM-SVR model were compared to those of an SVR optimized with the particle swarm optimization (PSO) and genetic algorithm (GA), RSM, and multivariate linear regression (MLR) based on several statistical factors. The findings confirmed that the RSM-SVR model was considerably superior to other models in terms of accuracy. The coefficients of determination (R²) obtained from the RSM-SVR, PSO-SVR, GA-SVR, MLR, SVR, and RSM models were 0.896, 0.807, 0.782, 0.752, 0.711, and 0.664, respectively.
Affiliation(s)
- Behrooz Keshtegar
- Department of Civil Engineering, Faculty of Engineering, University of Zabol, Zabol, Iran
| | - Jamshid Piri
- Department of Water Engineering, Faculty of Water and Soil, University of Zabol, Zabol, Iran
| | - Rini Asnida Abdullah
- Department of Geotechnics and Transportation, Faculty of Civil Engineering, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
| | - Mahdi Hasanipanah
- Department of Geotechnics and Transportation, Faculty of Civil Engineering, Universiti Teknologi Malaysia, Johor Bahru, Malaysia; Institute of Research and Development, Duy Tan University, Da Nang, Vietnam. Correspondence: Mahdi Hasanipanah
| | | | - Binh Nguyen Le
- Institute of Research and Development, Duy Tan University, Da Nang, Vietnam; School of Engineering and Technology, Duy Tan University, Da Nang, Vietnam
44
Guo H, Zhou X, Dong Y, Wang Y, Li S. On the use of machine learning methods to improve the estimation of gross primary productivity of maize field with drip irrigation. Ecol Modell 2023. [DOI: 10.1016/j.ecolmodel.2022.110250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
45
Early Prediction in Classification of Cardiovascular Diseases with Machine Learning, Neuro-Fuzzy and Statistical Methods. BIOLOGY 2023; 12:biology12010117. [PMID: 36671809 PMCID: PMC9855428 DOI: 10.3390/biology12010117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/12/2022] [Revised: 01/06/2023] [Accepted: 01/08/2023] [Indexed: 01/15/2023]
Abstract
Timely and accurate detection of cardiovascular diseases (CVDs) is critically important to minimize the risk of a myocardial infarction. Relations between factors of CVDs are complex, ill-defined and nonlinear, justifying the use of artificial intelligence tools. These tools aid in predicting and classifying CVDs. In this article, we propose a methodology using machine learning (ML) approaches to predict, classify and improve the diagnostic accuracy of CVDs, including support vector regression (SVR), multivariate adaptive regression splines, the M5Tree model and neural networks for the training process. Moreover, adaptive neuro-fuzzy and statistical approaches, nearest neighbor/naive Bayes classifiers and adaptive neuro-fuzzy inference system (ANFIS) are used to predict seventeen CVD risk factors. Mixed-data transformation and classification methods are employed for categorical and continuous variables predicting CVD risk. We compare our hybrid models and existing ML techniques on a real CVD dataset collected from a hospital. A sensitivity analysis is performed to determine the influence of, and identify, the essential variables with regard to CVDs, such as the patient's age, cholesterol level and glucose level. Our results show that the proposed methodology outperformed well-known statistical and ML approaches, demonstrating its versatility and utility in CVD classification. Our investigation indicates that the prediction accuracy of ANFIS for the training process is 96.56%, followed by SVR with 91.95% prediction accuracy. Our study includes a comprehensive comparison of results obtained for the mentioned methods.
46
Trinklein TJ, Cain CN, Ochoa GS, Schöneich S, Mikaliunaite L, Synovec RE. Recent Advances in GC×GC and Chemometrics to Address Emerging Challenges in Nontargeted Analysis. Anal Chem 2023; 95:264-286. [PMID: 36625122 DOI: 10.1021/acs.analchem.2c04235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
Affiliation(s)
- Timothy J Trinklein
- Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
| | - Caitlin N Cain
- Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
| | - Grant S Ochoa
- Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
| | - Sonia Schöneich
- Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
| | - Lina Mikaliunaite
- Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
| | - Robert E Synovec
- Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
47
Li D, Ren X, Su Y. Predicting COVID-19 using lioness optimization algorithm and graph convolution network. Soft comput 2023; 27:5437-5501. [PMID: 36686544 PMCID: PMC9838306 DOI: 10.1007/s00500-022-07778-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/21/2022] [Indexed: 01/11/2023]
Abstract
In this paper, a graph convolution network prediction model based on the lioness optimization algorithm (LsOA-GCN) is proposed to predict the cumulative number of confirmed COVID-19 cases in 17 regions of Hubei Province from March 23 to March 29, 2020, according to the transmission characteristics of COVID-19. On the one hand, Spearman correlation analysis with delay days and LsOA are used to capture the dynamic changes of feature information to obtain the temporal features. On the other hand, the graph convolutional network is used to capture the topological structure of the city network, so as to obtain spatial information and finally realize the prediction task. Then, we evaluate this model through performance evaluation indicators and statistical test methods and compare the results of LsOA-GCN with 10 representative prediction methods in the current epidemic prediction study. The experimental results show that the LsOA-GCN prediction model is significantly better than other prediction methods in all indicators and can successfully capture spatio-temporal information from feature data, thereby achieving accurate prediction of epidemic trends in different regions of Hubei Province.
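The "Spearman correlation analysis with delay days" mentioned above can be illustrated as a rank correlation between a feature at day t and the target at day t + lag. The sketch below is an assumption about the general idea, not the authors' procedure: the function names, the candidate lags, and the two series are all hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


def lagged_spearman(feature, target, lag):
    """Correlate feature[t] with target[t + lag] for lag >= 0."""
    if lag == 0:
        return spearman(feature, target)
    return spearman(feature[:-lag], target[lag:])


# Hypothetical series: the target follows the feature with a two-day delay.
feature = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]
target = [0.0, 0.0, 9.0, 1.0, 16.0, 2.25, 25.0, 81.0]
best_lag = max(range(4), key=lambda k: abs(lagged_spearman(feature, target, k)))
```

Because Spearman's rho depends only on ranks, the delay is recovered even though the hypothetical target is a nonlinear (squared) function of the lagged feature.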
Affiliation(s)
- Dong Li
- College of Economics and Management, Xi’an University of Posts and Telecommunications, Xi’an, 710061 Shaanxi People’s Republic of China
| | - Xiaofei Ren
- College of Economics and Management, Xi’an University of Posts and Telecommunications, Xi’an, 710061 Shaanxi People’s Republic of China
| | - Yunze Su
- College of Economics and Management, Xi’an University of Posts and Telecommunications, Xi’an, 710061 Shaanxi People’s Republic of China
48
Li Y, Huang X, Zhao C, Ding P. A novel remaining useful life prediction method based on multi-support vector regression fusion and adaptive weight updating. ISA TRANSACTIONS 2022; 131:444-459. [PMID: 35581022 DOI: 10.1016/j.isatra.2022.04.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/12/2021] [Revised: 04/23/2022] [Accepted: 04/23/2022] [Indexed: 06/15/2023]
Abstract
Remaining useful life prediction is of huge significance in preventing equipment malfunctions and reducing maintenance costs. Currently, machine learning algorithms have become a hotspot in remaining useful life prediction due to their high flexibility and convenience. However, machine learning methods require large amounts of data, and their prediction performance depends heavily on the selection of hyper-parameters. To overcome these shortcomings, a novel remaining useful life prediction method for small sample cases is proposed based on multi-support vector regression fusion. In the offline training phase, the fusion model is established, consisting of multiple support vector regression sub-models. To obtain the optimal sub-model parameters, the Bayesian optimization algorithm is applied and an improved optimization target is formulated with various metrics describing regression and prediction performance. In the online prediction phase, an adaptive weight updating algorithm based on dynamic time warping is developed to measure the fitness of each sub-model and determine the corresponding weight value. The C-MAPSS engine dataset is used to test the performance of the proposed method, along with some existing machine learning methods for comparison. The proposed method only requires 30% of the training data sample to achieve high accuracy, with a root mean square error of 14.98, which is superior to other state-of-the-art methods. The results demonstrate the superiority of the proposed method.
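The idea of weighting sub-models by a dynamic time warping (DTW) fitness can be sketched as follows. The DTW recurrence is the standard one; the weighting rule (normalized inverse distance) is an assumption made for illustration, since the abstract does not state the paper's update formula. The names `dtw_distance` and `fusion_weights` and all series values are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def fusion_weights(observed, sub_model_outputs, eps=1e-9):
    """Assumed rule: weight each sub-model by its normalized inverse DTW distance."""
    inverse = [1.0 / (dtw_distance(observed, out) + eps) for out in sub_model_outputs]
    total = sum(inverse)
    return [v / total for v in inverse]


# Hypothetical recent degradation signal and three sub-model reconstructions.
observed = [1.0, 2.0, 3.0, 4.0]
outputs = [
    [1.0, 2.0, 3.0, 4.5],
    [2.0, 3.0, 4.0, 5.0],
    [0.0, 0.0, 0.0, 0.0],
]
weights = fusion_weights(observed, outputs)
# Fused prediction: weighted sum of each sub-model's next-step forecast.
next_step_forecasts = [5.2, 6.0, 0.0]
fused = sum(w * f for w, f in zip(weights, next_step_forecasts))
```

Recomputing the weights as new observations arrive lets the sub-model that currently tracks the signal best dominate the fused prediction.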
Affiliation(s)
- Yuxiong Li
- School of Mechanical Engineering and Automation, Northeastern University, Shenyang, 110819, PR China
| | - Xianzhen Huang
- School of Mechanical Engineering and Automation, Northeastern University, Shenyang, 110819, PR China; Key Laboratory of Vibration and Control of Aero-Propulsion Systems Ministry of Education of China, Northeastern University, Shenyang, 110819, PR China.
| | - Chengying Zhao
- School of Mechanical Engineering and Automation, Northeastern University, Shenyang, 110819, PR China
| | - Pengfei Ding
- School of Mechanical Engineering and Automation, Northeastern University, Shenyang, 110819, PR China
49
Gan N, Sun M, Lu C, Li M, Wang Y, Song Y, Ning JM, Zhang ZZ. High-speed identification system for fresh tea leaves based on phenotypic characteristics utilizing an improved genetic algorithm. JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE 2022; 102:6858-6867. [PMID: 35654754 DOI: 10.1002/jsfa.12047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/06/2022] [Revised: 05/27/2022] [Accepted: 06/06/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND High-quality tea requires leaves of similar size and tenderness. The grade of the fresh leaves determines the quality of the tea. The automated classification of fresh tea leaves improves resource utilization and reduces manual picking costs. The present study proposes a method based on an improved genetic algorithm for identifying fresh tea leaves in high-speed parabolic motion using the phenotypic characteristics of the leaves. During parabolic flight, light is transmitted through the tea leaves, and six types of fresh tea leaves can be quickly identified by a camera. RESULTS The influence of combinations of morphology, color, and custom corner-point morphological features on the classification results was investigated, and the necessary dimensionality of the model was tested. After feature selection and combination, the classification performance of the Naive Bayes, k-nearest neighbor, and support vector machine algorithms was compared. The recognition time of Naive Bayes was the shortest, whereas the support vector machine had the best classification accuracy at approximately 97%. The support vector machine algorithm with only three feature dimensions (equivalent diameter, circularity, and skeleton endpoints) can meet production requirements with an accuracy rate reaching 92.5%. The proposed algorithm was tested by using the Swedish leaf and Flavia data sets, on which it achieved accuracies of 99.57% and 99.44%, respectively, demonstrating the flexibility and efficiency of the recognition scheme detailed in the present study. CONCLUSION This research provides an efficient tea leaf recognition system that can be applied to production lines to reduce manual picking costs. © 2022 Society of Chemical Industry.
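Two of the three features named above, equivalent diameter and circularity, have common image-analysis definitions that can be computed from a region's area and perimeter. The sketch below assumes those textbook definitions; the abstract does not confirm the exact formulas the authors used, and the third feature (skeleton endpoints) requires a skeletonized binary image and is not covered here. Function names and measurements are hypothetical.

```python
import math


def equivalent_diameter(area):
    """Diameter of the circle whose area equals the region's area."""
    return math.sqrt(4.0 * area / math.pi)


def circularity(area, perimeter):
    """4*pi*A / P^2: equals 1 for a perfect circle and is smaller for elongated shapes."""
    return 4.0 * math.pi * area / perimeter ** 2


# Hypothetical measurements (pixel units) for a round and a square region.
radius = 3.0
circle_area = math.pi * radius ** 2
circle_perimeter = 2.0 * math.pi * radius
circle_features = (
    equivalent_diameter(circle_area),
    circularity(circle_area, circle_perimeter),
)
square_features = (equivalent_diameter(4.0), circularity(4.0, 8.0))
```

Both quantities are cheap to compute from a segmented silhouette, which fits a high-speed sorting setting where recognition time matters.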
Affiliation(s)
- Ning Gan
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Mufang Sun
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Chengye Lu
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Menghui Li
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Yujie Wang
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Yan Song
- College of Engineering, Anhui Agricultural University, Hefei, China
| | - Jing-Ming Ning
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Zheng-Zhu Zhang
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
50
Artificial intelligence-based analytics for impacts of COVID-19 and online learning on college students’ mental health. PLoS One 2022; 17:e0276767. [DOI: 10.1371/journal.pone.0276767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2022] [Accepted: 10/13/2022] [Indexed: 11/19/2022] Open
Abstract
COVID-19, the disease caused by the novel coronavirus (SARS-CoV-2), first emerged in Wuhan, China late in December 2019. Not long after, the virus spread worldwide and was declared a pandemic by the World Health Organization in March 2020. This caused many changes around the world and in the United States, including an educational shift towards online learning. In this paper, we seek to understand how the COVID-19 pandemic and the increase in online learning impact college students’ emotional wellbeing. We use several machine learning and statistical models to analyze data collected by the Faculty of Public Administration at the University of Ljubljana, Slovenia in conjunction with an international consortium of universities, other higher education institutions, and students’ associations. Our results indicate that features related to students’ academic life have the largest impact on their emotional wellbeing. Other important factors include students’ satisfaction with their university’s and government’s handling of the pandemic as well as students’ financial security.