1
Zhong G, Xiao Y, Liu B, Zhao L, Kong X. Ordinal Regression With Pinball Loss. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:11246-11260. [PMID: 37030787 DOI: 10.1109/tnnls.2023.3258464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Ordinal regression (OR) aims to solve multiclass classification problems with ordinal classes. Support vector OR (SVOR) is a typical OR algorithm and has been used extensively in OR problems. In this article, based on the characteristics of OR problems, we propose a novel pinball loss function and present an SVOR method with pinball loss (pin-SVOR). Pin-SVOR is fundamentally different from traditional SVOR with hinge loss. Traditional SVOR employs the hinge loss function, so the classifier is determined by only a few data points near the class boundary, called support vectors, which may lead to a classifier that is sensitive to noise and unstable under re-sampling. In contrast, pin-SVOR employs the pinball loss function, which attaches an extra penalty to correctly classified data lying inside a class, so that all the training data are involved in determining the classifier. Data near the middle of each class receive a small penalty, and data near the class boundary receive a large penalty. The training data therefore tend to lie near the middle of each class rather than on the class boundary, which leads to scatter minimization in the middle of each class and to noise insensitivity. The experimental results show that pin-SVOR has better classification performance than state-of-the-art OR methods.
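The contrast between hinge and pinball losses can be sketched on the functional margin y*f(x). This is a minimal illustration, assuming the standard binary pinball loss used in pinball-loss SVM classifiers; it is not the ordinal pin-SVOR loss proposed in the article, whose exact form the abstract does not give. The function names and the parameter `tau` are hypothetical.

```python
def hinge_loss(margin: float) -> float:
    """Hinge loss on the functional margin y * f(x); zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def pinball_loss(margin: float, tau: float) -> float:
    """Generic (binary) pinball loss on u = 1 - y * f(x), with tau >= 0.

    For u >= 0 it equals the hinge loss. For u < 0 (a point classified
    correctly beyond the margin) it still charges tau * |u| instead of zero,
    so every training point keeps influencing the fitted classifier.
    """
    u = 1.0 - margin
    return max(u, -tau * u)


if __name__ == "__main__":
    for m in (-1.0, 0.5, 1.0, 3.0):
        print(f"margin={m:+.1f}  hinge={hinge_loss(m):.2f}  pinball={pinball_loss(m, tau=0.5):.2f}")
```

With `tau=0` the pinball loss collapses to the hinge loss; a larger `tau` makes points far on the correct side contribute more, which is how all training data come to influence the solution.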
2
Bian Z, Zhang J, Chung FL, Wang S. Residual Sketch Learning for a Feature-Importance-Based and Linguistically Interpretable Ensemble Classifier. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:10461-10474. [PMID: 37022881 DOI: 10.1109/tnnls.2023.3242049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Motivated by both the commonly used "from wholly coarse to locally fine" cognitive behavior and the recent finding that a simple yet interpretable linear regression model should be a basic component of a classifier, a novel hybrid ensemble classifier called hybrid Takagi-Sugeno-Kang fuzzy classifier (H-TSK-FC) and its residual sketch learning (RSL) method are proposed. H-TSK-FC shares the virtues of both deep and wide interpretable fuzzy classifiers and simultaneously offers both feature-importance-based and linguistic-based interpretability. The RSL method works as follows: 1) a global linear regression subclassifier over all original features of all training samples is quickly generated by a sparse-representation-based training procedure, which identifies the importance of each feature and partitions the output residuals of the incorrectly classified training samples into several residual sketches; 2) using the enhanced soft subspace clustering method (ESSC) for the linguistically interpretable antecedents of the fuzzy rules and the least learning machine (LLM) for their consequents, several interpretable Takagi-Sugeno-Kang (TSK) fuzzy subclassifiers are generated on the residual sketches and stacked in parallel to achieve local refinements; and 3) final predictions are made using a minimal-distance-based priority over all the constructed subclassifiers, which further enhances H-TSK-FC's generalization capability and decides which interpretable prediction route is used. Compared with existing deep or wide interpretable TSK fuzzy classifiers, and benefiting from its feature-importance-based interpretability, H-TSK-FC has been shown experimentally to run faster and to have better linguistic interpretability (i.e., fewer rules and/or TSK fuzzy subclassifiers and smaller model complexity) while keeping at least comparable generalization capability.
3
Atto AM. Altruistic Collaborative Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:1954-1964. [PMID: 35771785 DOI: 10.1109/tnnls.2022.3185961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
This article proposes a new learning paradigm for ensemble learning strategies, based on the concept of concordant gradients. In this paradigm, learners update their weights if and only if the gradients of their cost functions are mutually concordant, in a sense defined in the article. The objective of the proposed concordant optimization framework is robustness against uncertainties, achieved by postponing to a later epoch the consideration of examples associated with discordant directions during training. Concordance-constrained collaboration is shown to be relevant especially in intricate classification problems where exclusive class labeling involves information bias due to correlated disturbances affecting almost all training examples. The first learning paradigm applies to a gradient descent strategy based on allied agents, which are subjected to concordance checking before moving forward in training epochs. The second learning paradigm is related to multivariate dense neural matrix fusion, where the fusion operator is itself a learnable neural operator. In addition to these paradigms, this article proposes a new categorical probability transform to enrich the existing collection and proposes an alternative scenario for integrating penalized SoftMax information. Finally, this article assesses the relevance of these contributions with respect to several deep learning frameworks and a collaborative classification problem involving dependent classes.
4
Bhadra S, Kumar CJ. Enhancing the efficacy of depression detection system using optimal feature selection from EHR. Comput Methods Biomech Biomed Engin 2024; 27:222-236. [PMID: 36820618 DOI: 10.1080/10255842.2023.2181660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 02/13/2023] [Indexed: 02/24/2023]
Abstract
Diagnosing depression at an early stage is crucial and depends largely on the clinician's skill. The present work aims to develop an automated tool to assist the diagnosis of depression using multiple machine learning techniques. The dataset used in this study, with a sample size of 4184, contains biometric and demographic information on individuals with or without depression and was accessed from the University of Nice Sophia-Antipolis. Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) models are used to distinguish the depressed group from the control group. To improve computational efficiency, several feature selection algorithms have been incorporated: Recursive Feature Elimination (RFE), Mutual Information (MI), and three bio-inspired techniques, viz. Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and Firefly Algorithm (FA). To further enhance feature selection, majority voting is carried out over all possible combinations of three, four, and five feature selection techniques. These techniques reduce the feature set significantly, from the original 61 features to a mean of 33, a reduction of 45.90%. The classification accuracy of the enhanced models ranges from 84.18% to 88.46%, a significant improvement over the pre-existing models (83.76-85.89%). The proposed predictive models outperform the pre-existing classification models without feature selection, thereby improving both the performance and the efficiency of the diagnostic process.
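Majority voting over combinations of feature selectors can be sketched as follows. This is a minimal illustration under stated assumptions: each selector is represented only by the set of features it keeps, a feature survives a vote when strictly more than half of the selectors in a combination chose it (the abstract does not state the tie rule), and the feature names are hypothetical. The last helper reproduces the reported 45.90% reduction, (61 - 33) / 61.

```python
from itertools import combinations


def majority_vote(selections: list[set[str]]) -> set[str]:
    """Keep each feature chosen by strictly more than half of the selectors."""
    counts: dict[str, int] = {}
    for chosen in selections:
        for feature in chosen:
            counts[feature] = counts.get(feature, 0) + 1
    return {f for f, c in counts.items() if c > len(selections) / 2}


def vote_over_combinations(selectors: dict[str, set[str]], sizes=(3, 4, 5)) -> dict[tuple[str, ...], set[str]]:
    """Apply majority voting to every combination of `size` selectors."""
    results = {}
    for size in sizes:
        for names in combinations(sorted(selectors), size):
            results[names] = majority_vote([selectors[n] for n in names])
    return results


def reduction_percent(original: int, reduced: float) -> float:
    """Relative size reduction of the feature set, in percent."""
    return 100.0 * (original - reduced) / original


# Hypothetical outputs of the five selectors on a toy feature set.
selectors = {
    "RFE": {"age", "sleep", "bmi"},
    "MI": {"age", "sleep", "income"},
    "PSO": {"age", "bmi"},
    "GA": {"sleep", "bmi", "income"},
    "FA": {"age", "sleep"},
}
subsets = vote_over_combinations(selectors)
print(len(subsets))                         # 10 + 5 + 1 = 16 combinations
print(subsets[("FA", "GA", "MI", "PSO", "RFE")])
print(round(reduction_percent(61, 33), 2))  # 45.9
```

With four selectors the strict-majority rule requires three votes; a looser rule (half or more) would keep more features, so the threshold is a design choice worth checking against the article.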
Affiliation(s)
- Sweta Bhadra
- Department of Computer Science and Information Technology, Cotton University, Guwahati, India
- Chandan Jyoti Kumar
- Department of Computer Science and Information Technology, Cotton University, Guwahati, India
5
Alnashwan R, O'Riordan A, Sorensen H. Multiple-Perspective Data-Driven Analysis of Online Health Communities. Healthcare (Basel) 2023; 11:2723. [PMID: 37893797 PMCID: PMC10606133 DOI: 10.3390/healthcare11202723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 10/09/2023] [Indexed: 10/29/2023]
Abstract
The growth of online health communities and socially generated health-related content has the potential to provide considerable value for patients and healthcare providers alike. For example, members of the public can acquire medical knowledge and interact with others online. However, the volume of information, and the consequent 'noise' associated with large data volumes, can create difficulties for users. In this paper, we present a data-driven approach to better understand these data from multiple stakeholder perspectives. We utilise three techniques (sentiment analysis, content analysis, and topic analysis) to analyse user-generated medical content related to Lyme disease. We use a supervised feature-based model to identify sentiments, content analysis to identify the predominant concepts, and latent Dirichlet allocation as an unsupervised generative model to identify the topics represented in the discourse. Based on the goals of different stakeholders, expert opinion, and comparison with patient information leaflets, we validate that applying three different analytic methods highlights differing aspects of the information in which different stakeholders will be interested.
Affiliation(s)
- Rana Alnashwan
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Adrian O'Riordan
- School of Computer Science and Information Technology, University College Cork, T12 K8AF Cork, Ireland
- Humphrey Sorensen
- School of Computer Science and Information Technology, University College Cork, T12 K8AF Cork, Ireland
6
Ghaheri P, Nasiri H, Shateri A, Homafar A. Diagnosis of Parkinson's disease based on voice signals using SHAP and hard voting ensemble method. Comput Methods Biomech Biomed Engin 2023:1-17. [PMID: 37771234 DOI: 10.1080/10255842.2023.2263125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 09/17/2023] [Indexed: 09/30/2023]
Abstract
Parkinson's disease (PD) is the second most common progressive neurological condition after Alzheimer's disease. The large number of individuals afflicted with this illness makes it essential to develop a method for diagnosing the condition in its early phases. PD is typically identified from motor symptoms or via neuroimaging techniques, which are expensive, time-consuming, unavailable to the general public, and not very accurate. Another issue to be addressed is the black-box nature of machine learning methods, which need interpretation. These issues encourage us to develop a novel technique using Shapley additive explanations (SHAP) and a Hard Voting Ensemble Method based on voice signals to diagnose PD more accurately. Another purpose of this study is to interpret the output of the model and determine the most important features for diagnosing PD. The present article uses Pearson correlation coefficients to understand the relationship between the input features and the output. Input features with high correlation are selected and then classified by Extreme Gradient Boosting, Light Gradient Boosting Machine, Gradient Boosting, and Bagging. Moreover, the weights in the Hard Voting Ensemble Method are determined based on the performance of these classifiers. At the final stage, SHAP is used to determine the most important features in PD diagnosis. The effectiveness of the proposed method is validated using the 'Parkinson Dataset with Replicated Acoustic Features' from the UCI machine learning repository, on which it achieves an accuracy of 85.42%. The findings demonstrate that the proposed method outperforms state-of-the-art approaches and can assist physicians in diagnosing Parkinson's cases.
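Weighted hard voting can be sketched as summing, for each class label, the weights of the classifiers that predicted it. This is a minimal sketch under assumptions: the abstract says the weights come from classifier performance but not how, so the weights below are hypothetical numbers, three voters stand in for the four classifiers, and `weighted_hard_vote` is an illustrative name.

```python
from collections import defaultdict


def weighted_hard_vote(predictions: list[int], weights: list[float]) -> int:
    """Return the class label whose voters carry the largest total weight.

    Ties are broken in favour of the label seen first in `predictions`.
    """
    if len(predictions) != len(weights):
        raise ValueError("expected one weight per classifier")
    totals: dict[int, float] = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical weights, e.g. validation accuracies of three base classifiers.
print(weighted_hard_vote([1, 0, 0], [0.9, 0.4, 0.4]))  # 1: one strong voter outweighs two weak ones
print(weighted_hard_vote([1, 0, 0], [1.0, 1.0, 1.0]))  # 0: equal weights give a plain majority vote
```

Each classifier still casts a single hard label; only the weight attached to that label varies, which is what distinguishes this scheme from soft voting over predicted probabilities.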
Affiliation(s)
- Paria Ghaheri
- Electrical and Computer Engineering Department, Semnan University, Semnan, Iran
- Hamid Nasiri
- Department of Computer Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
- Ahmadreza Shateri
- Electrical and Computer Engineering Department, Semnan University, Semnan, Iran
- Arman Homafar
- Electrical and Computer Engineering Department, Semnan University, Semnan, Iran
7
Atandoh PH, Lee KH. Statistical clustering of documents via stochastic blockmodels. J Appl Stat 2023; 51:1878-1893. [PMID: 39071253 PMCID: PMC11271127 DOI: 10.1080/02664763.2023.2247617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 07/23/2023] [Indexed: 07/30/2024]
Abstract
As the online market grows rapidly, people rely more on product reviews when purchasing products. Hence, many companies and researchers are interested in analyzing product reviews, which are essentially text data. In the current literature, it is common to use only text analysis tools to analyze text datasets. In our work, by contrast, we propose a method that combines a text analysis method such as topic modeling with a statistical network model to build a network among individuals and find interesting communities. We introduce a promising framework that uses topic modeling to define the edges among individuals, thereby forming a network, and uses stochastic blockmodels (SBM) to find the communities. The power of the proposed method is demonstrated in a real-world application to an Amazon product review dataset.
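The abstract does not say exactly how topic modeling defines the edges. A minimal sketch, assuming each reviewer is summarized by a topic-proportion vector and two reviewers are linked when the cosine similarity of their vectors reaches a threshold, could build the network like this; the threshold, the reviewer names, and the function names are all hypothetical, and fitting the stochastic blockmodel to the resulting graph is not shown.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two topic-proportion vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_similarity_edges(topics: dict[str, list[float]], threshold: float) -> set[tuple[str, str]]:
    """Undirected edges between reviewers whose topic mixtures are similar enough."""
    names = sorted(topics)
    edges = set()
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if cosine(topics[u], topics[v]) >= threshold:
                edges.add((u, v))
    return edges


# Hypothetical per-reviewer topic proportions (e.g. averaged from a topic model).
topics = {
    "r1": [0.8, 0.1, 0.1],
    "r2": [0.7, 0.2, 0.1],
    "r3": [0.1, 0.1, 0.8],
}
print(topic_similarity_edges(topics, threshold=0.9))  # {('r1', 'r2')}
```

The resulting edge set is an ordinary undirected graph, so any SBM implementation that accepts an adjacency matrix or edge list could take it as input.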
Affiliation(s)
- Paul H. Atandoh
- Department of Mathematics, Mercer University, Macon, GA, USA
- Kevin H. Lee
- Department of Statistics, Western Michigan University, Kalamazoo, MI, USA
8
Asif S, Zhao M, Chen X, Zhu Y. BMRI-NET: A Deep Stacked Ensemble Model for Multi-class Brain Tumor Classification from MRI Images. Interdiscip Sci 2023:10.1007/s12539-023-00571-1. [PMID: 37171681 DOI: 10.1007/s12539-023-00571-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/26/2022] [Revised: 04/26/2023] [Accepted: 04/27/2023] [Indexed: 05/13/2023]
Abstract
Brain tumors are one of the most dangerous health problems for adults and children in many countries. Any failure in the diagnosis of brain tumors may shorten a patient's life, whereas accurate and timely diagnosis enables appropriate treatment that increases the patient's chances of survival. Because tumors have different characteristics, classifying three types of brain tumors is a challenging problem. With the advent of deep learning (DL) models, three-class brain tumor classification has been addressed; however, the accuracy of these methods still requires significant improvement. The main goal of this article is to design a new method for classifying the three types of brain tumors with very high accuracy. In this paper, we propose a novel deep stacked ensemble model called "BMRI-NET" that can detect brain tumors from MR images with high accuracy and recall. The proposed stacked ensemble adapts three pre-trained models, namely DenseNet201, ResNet152V2, and InceptionResNetV2, to improve generalization capability. We combine the decisions of the three models using the stacking technique to obtain final results that are much more accurate than those of the individual models. The efficacy of the proposed model is evaluated on the Figshare brain MRI dataset of three types of brain tumors, consisting of 3064 images. The experimental results highlight the robustness of the proposed BMRI-NET model, which achieves an overall classification accuracy of 98.69% and an average recall, F1-score, and MCC of 98.33%, 98.40%, and 97.95%, respectively. The results indicate that the proposed BMRI-NET model is superior to existing methods and can assist healthcare professionals in the diagnosis of brain tumors.
Affiliation(s)
- Sohaib Asif
- School of Computer Science and Engineering, Central South University, Changsha, China
- Ming Zhao
- School of Computer Science and Engineering, Central South University, Changsha, China
- Xuehan Chen
- School of Computer Science and Engineering, Central South University, Changsha, China
- Yusen Zhu
- School of Mathematics, Hunan University, Changsha, China
9
Glory Precious J, Keren Evangeline I, Kirubha SPA. Brain tumour segmentation and survival prognostication using 3D radiomics features and machine learning algorithms. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2023. [DOI: 10.1080/21681163.2023.2189487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
10
Patel RK, Kashyap M. Machine learning-based lung disease diagnosis from CT images using Gabor features in Littlewood Paley empirical wavelet transform (LPEWT) and LLE. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2023. [DOI: 10.1080/21681163.2023.2187244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
Affiliation(s)
- Rajneesh Kumar Patel
- Department of Electronics & Communication, Maulana Azad National Institute of Technology, Bhopal (M.P.), India
- Manish Kashyap
- Department of Electronics & Communication, Maulana Azad National Institute of Technology, Bhopal (M.P.), India
11
Han F, Liao S, Bai S, Wu R, Zhang Y, Hao Y. Integrating model explanations and hybrid priors into deep stacked networks for the "safe zone" prediction of acetabular cup. Acta Radiol 2023; 64:1130-1138. [PMID: 35989615 DOI: 10.1177/02841851221119108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
BACKGROUND Existing state-of-the-art "safe zone" prediction methods include statistics-based methods, image-matching techniques, and machine learning methods, yet these methods face a trade-off between accuracy and interpretability. PURPOSE To explore model explanations and estimator consensus for "safe zone" prediction. MATERIAL AND METHODS We collected pelvic datasets from Orthopaedic Hospital and propose a novel acetabular cup detection method for automatic ROI segmentation. Hybrid priors comprising both specific priors from data and general priors from experts are constructed: the specific priors are based on a fine-tuned ResNet-101 convolutional neural network (CNN) model, and the general priors on expert knowledge. Our method incorporates model explanations and dynamic consensus by appending a SHapley Additive exPlanations (SHAP) module and dynamic estimator stacking. RESULTS The proposed method achieves an accuracy of 99.40% and an area under the curve of 0.9998. Experimental results show that our model outperforms state-of-the-art conventional ensemble classifiers and deep CNN models. CONCLUSION This new screening model provides a new option for "safe zone" prediction of the acetabular cup.
Affiliation(s)
- Fuchang Han
- School of Computer Science and Engineering, Central South University, Changsha, PR China
- Shenghui Liao
- School of Computer Science and Engineering, Central South University, Changsha, PR China
- Sifan Bai
- School of Computer Science and Engineering, Central South University, Changsha, PR China
- Renzhong Wu
- School of Computer Science and Engineering, Central South University, Changsha, PR China
- Yingqi Zhang
- Tongji Hospital, School of Medicine, Tongji University, Shanghai, PR China
- Yongqiang Hao
- Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, PR China
12
Bhowate VG, Reddy TH. Spark-based deep classifier framework for imbalanced data classification. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2023. [DOI: 10.1080/21681163.2023.2177821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Affiliation(s)
- Vikas Gajananrao Bhowate
- Information Technology, St. Vincent Pallotti College of Engineering & Technology, Gavsi Manapur, Nagpur, India
- T. Hanumantha Reddy
- Computer Science & Engineering, Rao Bahadur Y Mahabaleswarappa College of Engineering (RYMEC), Ballari, India
13
Anish TP, Joe Prathap PM. An efficient and low complex model for optimal RBM features with weighted score-based ensemble multi-disease prediction. Comput Methods Biomech Biomed Engin 2023; 26:350-372. [PMID: 36218238 DOI: 10.1080/10255842.2022.2129969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Multi-disease prediction is the capacity to simultaneously identify the various diseases that are expected to affect an individual within a certain period. These diseases may be at different stages of progression and need to be detected at the time of clinical visits. Many studies in the literature have built predictive models for particular diseases, yet such models cannot identify people with multiple diseases, even though people often suffer from several diseases rather than just one. Hence, this article aims to implement a novel multi-disease prediction model using an ensemble learning approach with deep features. The data required for multi-disease prediction are collected from standard datasets. The collected data are then fed into a Deep Belief Network (DBN), and features are obtained from its restricted Boltzmann machine (RBM) layers. These RBM features are tuned with Deviation-based Hybrid Grasshopper Barnacles Mating Optimization (D-HGBMO) to improve prediction performance. The optimized RBM features are used in an ensemble learning model, in which multi-disease prediction is performed with a Deep Neural Network (DNN), an Extreme Learning Machine (ELM), and Long Short-Term Memory (LSTM). The predicted scores from the three classifiers are combined through an optimized weighted score and a thresholding-based final prediction, again using D-HGBMO, to obtain accurate multi-disease prediction results. The experimental results demonstrate the effective performance of the proposed model in comparison with existing classifiers on different quantitative measures.
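The weighted-score and thresholding step can be sketched as a weighted average of each classifier's per-disease scores followed by a per-disease threshold, so that several diseases can be flagged for the same patient. This is a minimal sketch under assumptions: the weights and threshold are fixed by hand here, whereas the article optimizes them with D-HGBMO, and the scores and function name are hypothetical.

```python
def weighted_score_predict(scores: list[list[float]], weights: list[float], threshold: float) -> list[int]:
    """Fuse per-disease scores from several classifiers, then threshold them.

    scores[m][d] is classifier m's score for disease d; the result holds
    one 0/1 flag per disease, so several diseases can be flagged at once.
    """
    total = sum(weights)
    n_diseases = len(scores[0])
    fused = [sum(w * s[d] for w, s in zip(weights, scores)) / total for d in range(n_diseases)]
    return [1 if value >= threshold else 0 for value in fused]


# Hypothetical scores for three diseases from three classifiers (e.g. DNN, ELM, LSTM).
scores = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.4],
    [0.7, 0.1, 0.5],
]
weights = [0.5, 0.2, 0.3]  # hand-picked here; the article tunes weights and threshold with D-HGBMO
print(weighted_score_predict(scores, weights, threshold=0.5))  # [1, 0, 1]
```

Dividing by the weight sum keeps the fused score on the same scale as the inputs, so a single threshold remains meaningful whatever weights an optimizer proposes.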
Affiliation(s)
- T P Anish
- Assistant Professor, Department of Computer Science and Engineering, R.M.K. College of Engineering and Technology, Puduvoyal, India
- P M Joe Prathap
- Professor, Department of Computer Science and Engineering, R.M.D. Engineering College, Kavaraipettai, India
14
Gautam AK, Bansal A. Email-Based Cyberstalking Detection On Textual Data Using Multi-Model Soft Voting Technique Of Machine Learning Approach. JOURNAL OF COMPUTER INFORMATION SYSTEMS 2023. [DOI: 10.1080/08874417.2022.2155267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
15
Ke SW, Tsai CF, Pan YY, Lin WC. Majority re-sampling via sub-class clustering for imbalanced datasets. J EXP THEOR ARTIF IN 2023. [DOI: 10.1080/0952813x.2023.2165715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]
Affiliation(s)
- Shih-Wen Ke
- Department of Information Management, National Central University, Taoyuan, Taiwan
- Chih-Fong Tsai
- Department of Information Management, National Central University, Taoyuan, Taiwan
- Yi-Ying Pan
- Department of Information Management, National Central University, Taoyuan, Taiwan
- Wei-Chao Lin
- Department of Information Management, Chang Gung University, Taoyuan, Taiwan
- Department of Thoracic Surgery, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan
16
Huang HN, Chen HM, Lin WW, Huang CJ, Chen YC, Wang YH, Yang CT. Employing feature engineering strategies to improve the performance of machine learning algorithms on echocardiogram dataset. Digit Health 2023; 9:20552076231207589. [PMID: 37915794 PMCID: PMC10617266 DOI: 10.1177/20552076231207589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Accepted: 09/28/2023] [Indexed: 11/03/2023]
Abstract
Objectives This study uses machine learning (ML) to make predictions from input features during training and inference. Feature selection is an important factor affecting the accuracy of ML models, and the process includes data extraction, i.e., collecting all the data required for ML. The study also draws on the concept of feature engineering: the raw data of the cardiac ultrasound dataset must be labeled with one or more meaningful and informative labels so that the ML model can learn from them and predict target values more accurately. Therefore, this study aims to improve strategies for selecting features from the raw dataset and to address data scrubbing. Methods The ultrasound dataset was cleaned and critical features were selected through the feature engineering steps of data standardization, normalization, and imputation of missing features. The aim of data scrubbing was to retain and select the critical features of the echocardiogram dataset while making the predictions of the ML algorithm more accurate. Results This paper mainly uses common feature engineering methods and ultimately selects four important features. With ML algorithms available on the Azure platform, namely Random Forest and CatBoost, a Voting Ensemble method is used as the training algorithm; visual tools are also used to gain a clearer understanding of the raw data and to improve the accuracy of the predictive model. Conclusion This paper emphasizes feature engineering, specifically the cleaning and analysis of missing values in the raw echocardiography dataset and the identification of important critical features. The Azure platform is used to predict patients with a history of heart disease (individuals who have been under surveillance in the past three years and those who have not). Through data scrubbing and preprocessing methods in feature engineering, the model can more accurately predict the future occurrence of heart disease in patients.
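The preprocessing steps named in the abstract (missing-value imputation, standardization, and normalization) can be sketched in pure Python. This is a minimal sketch under assumptions: the abstract does not give the exact formulas, so mean imputation, z-score standardization, and min-max normalization are assumed as common choices, and the feature name `ejection_fraction` and its values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Optional


def impute_mean(column: list[Optional[float]]) -> list[float]:
    """Replace missing values (None) with the mean of the observed values."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def standardize(column: list[float]) -> list[float]:
    """Z-score standardization using the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma if sigma else 0.0 for v in column]


def min_max_normalize(column: list[float]) -> list[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi != lo else 0.0 for v in column]


# Hypothetical echocardiogram feature with one missing reading.
ejection_fraction = [55.0, None, 65.0, 60.0]
filled = impute_mean(ejection_fraction)
print(filled)                     # [55.0, 60.0, 65.0, 60.0]
print(min_max_normalize(filled))  # [0.0, 0.5, 1.0, 0.5]
print(standardize(filled))
```

In a real pipeline the fill value, mean, standard deviation, minimum, and maximum would be computed on the training split only and then reused on the test split, to avoid leaking test information into training.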
Affiliation(s)
- Huang-Nan Huang
- Department of Applied Mathematics, Tunghai University, Taichung City
- Hong-Ming Chen
- Department of Applied Mathematics, Tunghai University, Taichung City
- Wei-Wen Lin
- Cardiovascular Center, Taichung Veterans General Hospital, Taichung City
- Department of Post-Baccalaureate Medicine, National Chung Hsing University, Taichung
- Department of Life Science, Tunghai University, Taichung City
- Chau-Jian Huang
- Department of Information Management, Shu-Zen Junior College of Medicine and Management, Kaohsiung City
- Yung-Cheng Chen
- Department of Computer Science, Tunghai University, Taichung City
- Yu-Huei Wang
- Cardiovascular Center, Taichung Veterans General Hospital, Taichung City
- Chao-Tung Yang
- Department of Computer Science, Tunghai University, Taichung City
- Research Center for Smart Sustainable Circular Economy, Tunghai University, Taichung City
17
Liu D, Liu Z, Zhang J, Yin Y, Xi J, Wang L, Xiong J, Zhang M, Zhao T, Jin J, Hu F, Sun J, Shen J, Shen B. Classification and Prediction of Skyrmion Material Based on Machine Learning. RESEARCH (WASHINGTON, D.C.) 2023; 6:0082. [PMID: 36939441 PMCID: PMC10019916 DOI: 10.34133/research.0082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 02/08/2023] [Indexed: 02/12/2023]
Abstract
The discovery and study of skyrmion materials play an important role in frontier basic physics research and future information technology. A database of 196 materials, including 64 skyrmion materials, was established, and predictions were made based on machine learning. A variety of intrinsic features are classified to optimize the model, and more than a dozen methods, such as support vector machines, k-nearest neighbors, and ensembles of trees, have been used to estimate the existence of skyrmions in magnetic materials. It is found that magnetic materials can be divided more accurately into skyrmion and non-skyrmion classes by using the classification of electronic layers. Notably, rare earths are the key elements affecting the formation of skyrmions. The accuracy and reliability of random undersampling bagged trees were 87.5% and 0.89, respectively, showing the potential to build a reliable machine learning model from small data. The existence of skyrmions in LaBaMnO is predicted by the trained model and verified by micromagnetic theory and experiments.
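The resampling step behind "random undersampling bagged trees" can be sketched as drawing, for each bag, every minority-class sample plus an equally sized random subset of the majority class. This is a minimal sketch of that resampling only, under the assumption of a binary label; training one decision tree per bag and voting over them is not shown, and the function names are hypothetical. The class sizes come from the abstract: 64 skyrmion materials and 196 - 64 = 132 non-skyrmion materials.

```python
import random
from collections import Counter


def undersample_indices(labels: list[int], rng: random.Random) -> list[int]:
    """Indices of a class-balanced subset: every minority-class sample plus an
    equally sized random draw (without replacement) from each other class."""
    counts = Counter(labels)
    minority, n_min = min(counts.items(), key=lambda item: item[1])
    chosen = [i for i, y in enumerate(labels) if y == minority]
    for label in counts:
        if label != minority:
            pool = [i for i, y in enumerate(labels) if y == label]
            chosen += rng.sample(pool, n_min)
    return sorted(chosen)


def make_balanced_bags(labels: list[int], n_bags: int, seed: int = 0) -> list[list[int]]:
    """One independently undersampled training subset per bagged tree."""
    rng = random.Random(seed)
    return [undersample_indices(labels, rng) for _ in range(n_bags)]


# Class sizes from the abstract: 64 skyrmion (1) and 132 non-skyrmion (0) materials.
labels = [1] * 64 + [0] * 132
bags = make_balanced_bags(labels, n_bags=10)
print(len(bags), len(bags[0]))  # 10 128
print(Counter(labels[i] for i in bags[0]))
```

Because each bag discards a different random part of the majority class, the ensemble as a whole still sees most of the non-skyrmion materials, while every individual tree trains on balanced data.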
Affiliation(s)
- Dan Liu
- Department of Physics, School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China
- Zhixin Liu
- Department of Physics, School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China
- JinE Zhang
- School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China
- Yinong Yin
- Department of Physics, School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China
- Jianfeng Xi
- Department of Physics, School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China
- Lichen Wang
- Ningbo Institute of Materials, Technology & Engineering, Chinese Academy of Sciences, Zhejiang 315201, P. R. China
- JieFu Xiong
- Ningbo Institute of Materials, Technology & Engineering, Chinese Academy of Sciences, Zhejiang 315201, P. R. China
- Ming Zhang
- School of Physics, Inner Mongolia University of Science and Technology, Baotou 014010, P. R. China
- Tongyun Zhao
- State Key Laboratory of Magnetism, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, P. R. China
- Jiaying Jin
- School of Materials Science and Engineering, Zhejiang University, Hangzhou 310027, P. R. China
- Fengxia Hu
- State Key Laboratory of Magnetism, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, P. R. China
- Jirong Sun
- State Key Laboratory of Magnetism, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, P. R. China
- Jun Shen
- Key Laboratory of Cryogenics, Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Beijing 100190, P. R. China
- Baogen Shen
- Ningbo Institute of Materials, Technology & Engineering, Chinese Academy of Sciences, Zhejiang 315201, P. R. China
- State Key Laboratory of Magnetism, Institute of Physics, Chinese Academy of Sciences, Beijing 100190, P. R. China
18
Zhou H, Zhang PY, Zou X, Liu J, Wang WJ. Chronic disease diagnosis model based on convolutional neural network and ensemble learning method. Digit Health 2023; 9:20552076231198643. [PMID: 37667686 PMCID: PMC10475259 DOI: 10.1177/20552076231198643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 08/15/2023] [Indexed: 09/06/2023]
Abstract
Introduction: Chronic diseases have become one of the main causes of premature death worldwide in recent years. Because diagnosing chronic diseases is time-consuming and costly, timely diagnosis and prediction are essential. Methods: In this paper, a new method for chronic disease diagnosis is proposed that combines a convolutional neural network (CNN) with ensemble learning. The method uses random forest (RF) as the base classifier to improve classification performance and diagnostic accuracy, and then uses AdaBoost to replace the Softmax layer of the CNN, generating multiple accurate base classifiers while determining their optimal attributes, to achieve high-quality classification and prediction of chronic diseases. Results: To verify the effectiveness of the proposed method, a real-world Electronic Medical Records dataset (C-EMRs) was used for experimental analysis. The results show that, compared with other machine learning methods such as CNN, K-Nearest Neighbor, and RF, the proposed method effectively improves diagnostic accuracy and reduces missed diagnoses and misdiagnoses. Conclusions: This study provides effective information for the diagnosis of chronic diseases, which can assist doctors in making clinical decisions, support the development of targeted interventions, and reduce the probability of misdiagnosis.
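The boosting stage can be illustrated with discrete AdaBoost. The sketch below is an assumption-laden simplification, not the authors' CNN-RF-AdaBoost pipeline. It uses one-level decision stumps instead of random forests as weak learners, expects labels in {-1, +1}, and uses hypothetical function names. Each round fits the stump with the lowest weighted error ε, gives it the vote weight α = 0.5 ln((1 - ε) / ε), and up-weights the samples it misclassified so that the next round concentrates on them.

```python
import math

def fit_weighted_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump.

    The stump predicts `sign` when row[feature] <= threshold and -sign otherwise.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] <= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost with decision stumps; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n                              # uniform initial sample weights
    ensemble = []
    for _ in range(rounds):
        err, j, t, sign = fit_weighted_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # vote weight of this stump
        ensemble.append((alpha, j, t, sign))
        preds = [sign if row[j] <= t else -sign for row in X]
        # Misclassified samples (y * pred = -1) are multiplied by exp(+alpha).
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(row):
        score = sum(a * (s if row[f] <= thr else -s) for a, f, thr, s in ensemble)
        return 1 if score >= 0 else -1

    return predict
```

No single stump can separate labels `[1, 1, -1, -1, 1, 1]` on inputs 1 to 6. Three boosting rounds are enough to combine stumps that classify all six points correctly.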
Affiliation(s)
- Huan Zhou
- School of Business, Hunan University of Technology, Zhuzhou, Hunan, China
- Pei-Ying Zhang
- School of Business, Hunan University of Technology, Zhuzhou, Hunan, China
- Xiao Zou
- School of Business, Hunan University of Technology, Zhuzhou, Hunan, China
- Jia Liu
- School of Business, Hunan University of Technology, Zhuzhou, Hunan, China
- Wen-Jie Wang
- School of Business, Hunan University of Technology, Zhuzhou, Hunan, China
19
Wong JJN, Fadzly N. Development of species recognition models using Google teachable machine on shorebirds and waterbirds. JOURNAL OF TAIBAH UNIVERSITY FOR SCIENCE 2022. [DOI: 10.1080/16583655.2022.2143627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Affiliation(s)
- Jenny Jenn Ney Wong
- School of Biological Sciences, Universiti Sains Malaysia, Penang, Malaysia
- Department of Evolution, Ecology and Organismal Biology, University of California, Riverside, CA, USA
- Nik Fadzly
- School of Biological Sciences, Universiti Sains Malaysia, Penang, Malaysia
20
Subramanian AAV, Venugopal JP. A deep ensemble network model for classifying and predicting breast cancer. Comput Intell 2022. [DOI: 10.1111/coin.12563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
21
Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery. PLoS One 2022; 17:e0276607. [DOI: 10.1371/journal.pone.0276607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Accepted: 10/11/2022] [Indexed: 11/11/2022]
Abstract
High-throughput technologies in genomics enable the analysis of small alterations in gene expression levels. Patterns of such deviations are an important starting point for discovering and verifying new biomarker candidates. Identifying such patterns is a challenging task that requires sophisticated machine learning approaches. Many classification models are currently available, and a common approach is to compare their performance and select the best one for a given classification problem. The association between the features of a dataset and the performance of a particular classification method is still not fully understood. The main contribution of this work is therefore a new methodology for predicting the prediction results of different classifiers in the field of biomarker discovery. We propose a three-step computational workflow that includes an analysis of the dataset characteristics, the calculation of the classification accuracy and, finally, the prediction of the resulting classification error. The experiments were carried out on synthetic and microarray datasets. Using this method, we showed that predictability strongly depends on the discriminatory ability of the features (e.g., sets of genes) in two-class or multiclass datasets. If a dataset has a certain discriminatory ability, this method enables the classification performance to be predicted before a learning model is applied. Thus, our results contribute to a better understanding of the relationship between dataset characteristics and the corresponding performance of a machine learning method. They also suggest the optimal classification method for a given dataset based on its discriminatory ability.
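The abstract does not say how discriminatory ability is measured, so the following is only an illustrative assumption. Fisher's discriminant ratio is one common way to quantify how well a single feature, for example one gene's expression level, separates two classes. The sketch scores each feature as (μ1 - μ2)² / (σ1² + σ2²) and ranks features by that score. The function names are hypothetical.

```python
from statistics import mean, pvariance

def fisher_ratio(values, labels):
    """Fisher discriminant ratio (m1 - m2)^2 / (v1 + v2) of one feature for two classes."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = [v for v, c in zip(values, labels) if c == classes[0]]
    b = [v for v, c in zip(values, labels) if c == classes[1]]
    denom = pvariance(a) + pvariance(b)
    # A zero within-class spread with distinct means is treated as perfect separation.
    return float("inf") if denom == 0 else (mean(a) - mean(b)) ** 2 / denom

def rank_features(X, y):
    """Return (feature indices sorted from most to least discriminative, per-feature scores)."""
    scores = [fisher_ratio([row[j] for row in X], y) for j in range(len(X[0]))]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores
```

A dataset whose top features score highly is, under this assumption, one where a classifier's performance should be easier to predict.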
22
Automatic diagnosis of arrhythmia with electrocardiogram using multiple instance learning: From rhythm annotation to heartbeat prediction. Artif Intell Med 2022; 132:102379. [DOI: 10.1016/j.artmed.2022.102379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 06/30/2022] [Accepted: 08/18/2022] [Indexed: 11/22/2022]
23
Zheng Z, Wang Q, Deng D, Wang Q, Huang W. CG-Recognizer: A biosignal-based continuous gesture recognition system. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
24
Abdullayeva FJ. Internet of Things‐based healthcare system on patient demographic data in Health 4.0. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2022. [DOI: 10.1049/cit2.12128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Affiliation(s)
- Fargana J. Abdullayeva
- Institute of Information Technology, Azerbaijan National Academy of Sciences, Baku, Azerbaijan
25
Apinaya Prethi K, Sangeetha M. A multi-objective optimization of resource management and minimum batch VM migration for prioritized task allocation in fog-edge-cloud computing. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-213520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Network resources and traffic priorities can be used to distribute requested tasks across edge nodes at the edge layer. However, because of the variety of tasks, the edge nodes affect data accessibility. Resource management approaches based on Virtual Machine (VM) migration, job prioritization, and other methods have been used to overcome this problem. A Minimized Upgrading Batch VM Scheduling (MSBP) method was recently developed that reduces the number of batches required to complete a system-scale upgrade and assigns bandwidth to VM migration matrices. However, poor resource sharing caused by suboptimal VM utilization prevents MSBP from effectively ensuring globally optimal solutions. To allocate resources and schedule tasks optimally during VM migration, this paper proposes MSBP with Multi-objective Optimization of Resource Allocation (MORA). The major goal of the proposed methodology is to take different objectives into account and solve the Pareto-front problem to extend the lifetime of the fog-edge network. First, it formulates an NP-hard problem for MSBP that accounts for factors such as network sustainability, path contention, network delay, and cost-efficiency. The Multi-objective Krill Herd optimization (MoKH) algorithm is then applied with the Pareto optimality rule to solve this NP-hard problem and produce the best solution. This increases network lifetime and improves the cost efficiency of resource allocation.
Finally, the simulation results show that MSBP-MORA distributes resources more efficiently, and hence increases network lifetime, compared with other traditional algorithms.
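The Pareto optimality rule mentioned above can be made concrete with a nondominated filter. This is a generic sketch, not the MoKH algorithm. It assumes every objective is minimized, for example network delay and cost, and returns the candidate solutions that no other candidate dominates. That set is what a multi-objective optimizer tries to approximate. The candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a dominates b: no worse in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the solutions that are not dominated by any other solution."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Hypothetical candidate allocations scored as (delay, cost).
candidates = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 3), (5, 5), (7, 3)]
print(pareto_front(candidates))  # [(1, 9), (2, 7), (4, 4), (6, 3)]
```

Here (3, 8) is dropped because (2, 7) is better on both objectives. Points on the front trade delay against cost, and none is better on both.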
26
Wang Y, Su J, Zhao X. Interpretability of SurvivalBoost upon Shapley Additive Explanation value on medical data. COMMUN STAT-SIMUL C 2022. [DOI: 10.1080/03610918.2022.2094962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Yating Wang
- School of Mathematics and Statistics, Center for Data Science, Lanzhou University, Lanzhou, P.R. China
- Jinxia Su
- School of Mathematics and Statistics, Center for Data Science, Lanzhou University, Lanzhou, P.R. China
- Xuejing Zhao
- School of Mathematics and Statistics, Center for Data Science, Lanzhou University, Lanzhou, P.R. China
27
Famitha S, Moorthi M. Intelligent and novel multi-type cancer prediction model using optimized ensemble learning. Comput Methods Biomech Biomed Engin 2022; 25:1879-1903. [PMID: 35695463 DOI: 10.1080/10255842.2022.2081504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Cancer is known to be a highly severe disease that can become incurable even when treatment starts at the time of diagnosis, owing to the occurrence of cancer cells. Diverse machine learning approaches have been implemented for predicting cancer recurrence, and they need to be evaluated to identify the appropriate approach for cancer prediction. This paper proposes intelligent optimized ensemble learning for predicting multiple types of cancer. First, data on different types of cancer are collected and cleansed. Features are then extracted using statistical features, Linear Discriminant Analysis (LDA), and Principal Component Analysis (PCA). From these features, a new Adaptive Condition Searched-Harris hawks Whale Optimization (ACS-HWO) algorithm selects the optimal features, which are transformed into weighted features through a meta-heuristic update. Prediction is carried out by Optimized Ensemble-based Multi-disease Detection (OEMD), which combines a Support Vector Machine (SVM), an Autoencoder, AdaBoost, a Deep Neural Network (DNN), and a Recurrent Neural Network (RNN) using a high-ranking strategy. The same ACS-HWO is used to improve both the weighted feature selection and the optimized ensemble learning. Comparative analysis against existing models shows that the suggested method is highly applicable to healthcare systems for consistent prediction of multiple types of cancer.
Affiliation(s)
- S Famitha
- Associate Professor, Computer Science and Engineering, Prathyusha Engineering College, Anna University, Tiruvallur, India
- M Moorthi
- Professor & HOD, BME & Medical Electronics, Saveetha Engineering College, Anna University, Chennai, India
28
Deepika D, Balaji N. Effective heart disease prediction with Grey-wolf with Firefly algorithm-differential evolution (GF-DE) for feature selection and weighted ANN classification. Comput Methods Biomech Biomed Engin 2022; 25:1409-1427. [PMID: 35652537 DOI: 10.1080/10255842.2022.2078966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
In recent times, heart disease has become common and has caused the deaths of many individuals. Early and accurate prediction of this disease is therefore vital to reduce the death rate and improve people's lives. At the same time, Artificial Intelligence has gained attention because it permits a deeper understanding of healthcare data, thereby providing accurate prediction results. Efficient prediction can answer complicated queries regarding heart disease and hence help clinical practitioners make smart medical decisions. This study therefore aims to predict heart disease with high accuracy by proposing an improved feature selection and enhanced classification approach. The paper employs the Grey-wolf with Firefly algorithm for effective feature selection and the Differential Evolution algorithm for tuning the hyperparameters of an Artificial Neural Network (ANN). The approach is therefore named the Grey Wolf Firefly algorithm with Differential Evolution (GF-DE), and it aims at better classification of the selected features. The proposed classification model trains the neural network to obtain optimal weights and efficiently tunes a large number of hyperparameters. To demonstrate this, the proposed system is compared with existing methods on the Cleveland and Statlog datasets in terms of performance metrics such as accuracy, precision, recall, and F1 score. In addition, statistical analysis is undertaken to assess the significance of the proposed system. The outcomes reveal the efficiency of the proposed method, which makes it well suited to heart disease prediction.
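The hyperparameter-tuning step relies on differential evolution. The sketch below shows the classic DE/rand/1/bin scheme on a toy objective. It is an assumption-based illustration, not the GF-DE method. In the paper's setting, `f` would instead score an ANN trained with a candidate hyperparameter vector. The names and default settings (population 20, F = 0.8, CR = 0.9) are illustrative choices.

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, generations=200, seed=0):
    """Minimize f over a box with the DE/rand/1/bin scheme.

    bounds is a list of (low, high) pairs, one per dimension. Accepted trial vectors
    replace their targets immediately (an in-place, asynchronous variant of DE).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover; index j_rand always takes the mutant's value.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            # Clip the trial vector back into the search box.
            trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
            # Greedy selection: keep the trial if it is no worse than the target.
            f_trial = f(trial)
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial
    best = min(range(pop_size), key=lambda k: fitness[k])
    return pop[best], fitness[best]
```

For example, minimizing the squared distance to a hypothetical target `[1.0, 2.0, -1.0]` inside `[(-5, 5)] * 3` drives the best candidate close to that target.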
Affiliation(s)
- D Deepika
- Research Scholar, Anna University, Chennai, India
- N Balaji
- Professor, Computer Science and Engineering, Velammal Institute of Technology, Chennai, India
29
A Personalized Travel Route Recommendation Model Using Deep Learning in Scenic Spots Intelligent Service Robots. JOURNAL OF ROBOTICS 2022. [DOI: 10.1155/2022/3851506] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
This paper proposes a personalized tourist-interest recommendation model based on a deep neural network (DNN). First, basic information and comment text for tourism service items are obtained by crawling relevant websites. Next, word segmentation and word-vector transformation are carried out with the Jieba word segmentation tool and the Skip-gram model. This deeply characterizes the semantic information between different data and solves the problem of very high vector sparsity. The corresponding features are then obtained using the feature extraction ability of the DNN. On this basis, the model predicts users' scores for tourism service items, and a personalized recommendation list is generated. Finally, simulation experiments compare the recommendation accuracy and average reciprocal ranking of the proposed model with those of two other algorithms on three different databases. The results show that the overall performance of the proposed algorithm is better than that of the two comparison algorithms.
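The average reciprocal ranking used above is usually computed as mean reciprocal rank (MRR). For each user or query, take 1/r, where r is the position of the first relevant item in the recommendation list, or 0 if none appears, and then average over all lists. The sketch below is minimal and assumes that each list comes with a set of items the user actually found relevant. The paper's exact evaluation protocol is not given.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant item in each ranked list (0 if none)."""
    if not ranked_lists:
        raise ValueError("need at least one ranked list")
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for position, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / position
                break  # only the first relevant item counts
    return total / len(ranked_lists)
```

Consider three recommendation lists whose first relevant items appear at ranks 2, 1, and never. They score (1/2 + 1 + 0) / 3 = 0.5.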
30
Xu W, Yu K, Ye J, Li H, Chen J, Yin F, Xu J, Zhu J, Li D, Shu Q. Automatic pediatric congenital heart disease classification based on heart sound signal. Artif Intell Med 2022; 126:102257. [DOI: 10.1016/j.artmed.2022.102257] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/21/2021] [Revised: 12/31/2021] [Accepted: 02/15/2022] [Indexed: 11/02/2022]
31
Ren Z, Zhang Y, Wang S. LCDAE: Data Augmented Ensemble Framework for Lung Cancer Classification. Technol Cancer Res Treat 2022; 21:15330338221124372. [PMID: 36148908 PMCID: PMC9511553 DOI: 10.1177/15330338221124372] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/27/2022] [Revised: 07/15/2022] [Accepted: 08/02/2022] [Indexed: 11/15/2022]
Abstract
Objective: Early-stage detection is the only possible way to reduce the fatality rate of lung cancer patients. Recently, deep learning techniques have become the most promising methods in medical image analysis compared with numerous other computer-aided diagnostic techniques. However, deep learning models perform worse when they overfit. Methods: We present a Lung Cancer Data Augmented Ensemble (LCDAE) framework to address overfitting and low performance in lung cancer classification tasks. The LCDAE has three parts. The first is a Lung Cancer Deep Convolutional GAN, which synthesizes lung cancer images. The second is a Data Augmented Ensemble model (DA-ENM), which ensembles six fine-tuned transfer learning models for training, testing, and validation on a lung cancer dataset. The third is a Hybrid Data Augmentation (HDA) that combines all the data augmentation techniques in the LCDAE. Results: Compared with existing state-of-the-art methods, the LCDAE obtains the best accuracy of 99.99%, precision of 99.99%, and F1-score of 99.99%. Conclusion: The proposed LCDAE overcomes the overfitting issue in lung cancer classification tasks by applying different data augmentation techniques, and it achieves the best performance compared with state-of-the-art approaches.
Affiliation(s)
- Zeyu Ren
- School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
- Yudong Zhang
- School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
- Shuihua Wang
- School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
32
Padmakala S, Subasini CA, Karuppiah SP, Sheeba A. ESVM-SWRF: Ensemble SVM-based sample weighted random forests for liver disease classification. INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING 2021; 37:e3525. [PMID: 34431606 DOI: 10.1002/cnm.3525] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/17/2021] [Accepted: 08/23/2021] [Indexed: 06/13/2023]
Abstract
Recently, medical data mining models have become a significant way to diagnose disease. The most challenging task in the healthcare field is handling the large amount of data involved in disease analysis and prediction. Once data mining models transform the data into valuable information, actual prediction and decision making become easier. Existing studies have shortcomings such as long execution time, high computational complexity, poor scalability, slow convergence, and failure to provide a solution. In this article, we propose an ensemble SVM-based sample weighted random forests (eSVM-swRF) method with a novel improved colliding body optimization (NICBO) algorithm to predict liver diseases. Extraction, loading, transformation, and analysis (ELTA) is used to pre-process the patient data. Significant features and a suitable model are generated with a filter-based method. For eSVM-swRF, parameters such as the penalty parameter (P), threshold (T), and mTry are optimized with the NICBO algorithm. A UCI dataset provides the liver disease data for this study. The performance of eSVM-swRF with the NICBO method is validated on the RapidMiner Studio version 7.6 platform using different evaluation measures. The proposed method outperforms existing methods such as Particle Swarm Optimization-based Support Vector Machine (PSO-SVM), fuzzy adaptive and neighbor weighted k-NN (FuzzyANWKNN), Naïve Bayes-based Support Vector Machine (NB-SVM), and neural networks.
Affiliation(s)
- S Padmakala
- Department of CSE, St. Joseph's Institute of Technology, Chennai, Tamil Nadu, India
- C A Subasini
- Department of CSE, St. Joseph's Institute of Technology, Chennai, Tamil Nadu, India
- S P Karuppiah
- Department of MBA, St. Joseph's College of Engineering, Chennai, India
- Adlin Sheeba
- Department of CSE, St. Joseph's Institute of Technology, Chennai, Tamil Nadu, India
33
Rajan R, Mohan BSS. Distance Metric Learnt Kernel-Based Music Classification Using Timbral Descriptors. INT J PATTERN RECOGN 2021. [DOI: 10.1142/s0218001421510149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
This paper proposes automatic music genre classification based on distance metric learning (DML). Three types of timbral descriptors are combined at the feature level: mel-frequency cepstral coefficient (MFCC) features, modified group delay features (MODGDF), and low-level timbral feature sets. We experimented with k-nearest neighbor (kNN) and support vector machine (SVM)-based classifiers with standard and DML kernels (DMLK) on the GTZAN and Folk music datasets. With the best-performing RBF kernel, the standard kernel-based kNN and SVM classifiers achieve classification accuracies of 79.03% and 90.16%, respectively, on the GTZAN dataset. On the Folk music dataset they achieve 86.60% and 92.26%, respectively. A further improvement was observed when DML kernels replaced the standard kernels in the kernel kNN and SVM-based classifiers. DMLK-kNN and DMLK-SVM achieve accuracies of 84.46% and 92.74%, respectively, on GTZAN, and 90.00% and 96.23%, respectively, on the Folk music dataset. The results demonstrate the potential of DML kernels for music genre classification.
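Kernel kNN replaces Euclidean distance with the distance between points in the kernel's feature space, ‖φ(x) - φ(z)‖² = k(x, x) - 2k(x, z) + k(z, z). The sketch below uses the standard RBF kernel and does not reproduce the learned DML kernel. `gamma` and the function names are illustrative assumptions. For a plain RBF kernel this distance is a monotone function of Euclidean distance, so the neighbors do not change. The benefit appears only when a learned kernel reshapes the space.

```python
import math
from collections import Counter

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_distance_sq(x, z, kernel):
    """Squared distance between x and z in the kernel's implicit feature space."""
    return kernel(x, x) - 2 * kernel(x, z) + kernel(z, z)

def kernel_knn_predict(train_X, train_y, query, k=3, kernel=rbf_kernel):
    """Majority label among the k training points closest to query in feature space."""
    order = sorted(range(len(train_X)),
                   key=lambda i: kernel_distance_sq(query, train_X[i], kernel))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Swapping in a different `kernel` callable, such as one built from a learned metric, changes the neighborhoods without touching the kNN logic. This is the design choice the DMLK classifiers exploit.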
Affiliation(s)
- Rajeev Rajan
- College of Engineering, Trivandrum, Kerala, India
- APJ Abdul Kalam Technological University, Thiruvananthapuram, Kerala, India
- B. S. Shajee Mohan
- Government Engineering College, Kozhikode, Kerala, India
- APJ Abdul Kalam Technological University, Thiruvananthapuram, Kerala, India
34
Machine Learning Classification Techniques for Detecting the Impact of Human Resources Outcomes on Commercial Banks Performance. APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING 2021. [DOI: 10.1155/2021/7747907] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
The banking industry is a highly competitive and dynamic market in which organizational performance is paramount. Different indicators can be used to measure organizational performance and sustain competitive advantage in a global marketplace. Performance indicators are usually achieved through human resources, which are the core element in sustaining an organization in a highly competitive marketplace. It is therefore essential to manage human resources strategically and align their strategies with organizational strategies. We adopted a survey research design with a quantitative approach, distributing a structured questionnaire to 305 respondents selected with efficient sampling techniques. Predicting bank performance is crucial, since poor performance can cause serious problems for the bank and society, such as bankruptcy and a negative influence on the country's economy. Most past researchers adopted traditional statistics to build prediction models. Because machine learning algorithms are efficient, however, many researchers now apply them to various fields, including performance prediction systems. In this study, eight different machine learning algorithms were used to build models that predict the prospective performance of commercial banks in Nigeria from human resources outcomes (employee skills, attitude, and behavior). The models were built in Python with machine learning libraries and packages. The results clearly show that human resources outcomes are crucial in achieving organizational performance. The models built from the eight machine learning classifiers predict bank performance as superior, with accuracies of 74–81%.
Feature importance was computed with Scikit-learn to show the comparative importance, or contribution, of each feature to the prediction, and employee attitude was rated far more important than the other features. Nigeria's banking industry should focus more on employee attitude so that performance can be improved from the current superior class to the outstanding class.
35
Abstract
Ad hoc information retrieval (ad hoc IR) is a challenging task that consists of ranking text documents for bag-of-words (BOW) queries. Classic approaches based on query and document text vectors use term-weighting functions to rank the documents. One limitation of these methods is their inability to handle polysemous concepts. In addition, these methods introduce spurious orthogonalities between semantically related words. To address these limitations, model-based IR approaches based on topics have been explored. Specifically, topic models based on Latent Dirichlet Allocation (LDA) represent text documents in the latent space of topics, which models polysemy better and avoids generating orthogonal representations for related terms. We extend LDA-based IR strategies using different ensemble strategies. Model selection follows the ensemble learning paradigm, for which we test two successful approaches widely used in supervised learning. We study Boosting and Bagging techniques for topic models, using each model as a weak IR expert. We then merge the ranking lists obtained from each model using a simple but effective top-k list fusion approach. We show that our proposal improves precision and recall, outperforming classic IR models and strong baselines based on topic models.
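The abstract does not specify its top-k list fusion rule. As a stand-in, the sketch below uses reciprocal rank fusion (RRF), a different, well-known method in which each document scores the sum of 1/(k + rank) across the input rankings. k = 60 is the constant commonly used with RRF, and `top_k` truncates the fused list. Treat this as one plausible way to merge the per-model rankings, not as the authors' fusion approach.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60, top_k=None):
    """Fuse several ranked lists of document ids with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    fused = sorted(scores, key=lambda doc: (-scores[doc], doc))
    return fused if top_k is None else fused[:top_k]
```

Suppose each topic model, acting as a weak IR expert, contributes one ranking. A document ranked high by several experts rises to the top even if no single expert ranked it first.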
36
Zhao S, Meng J, Luan Y. LncRNA-Encoded Short Peptides Identification Using Feature Subset Recombination and Ensemble Learning. Interdiscip Sci 2021; 14:101-112. [PMID: 34304369 DOI: 10.1007/s12539-021-00464-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2021] [Revised: 07/14/2021] [Accepted: 07/16/2021] [Indexed: 11/28/2022]
Abstract
Long non-coding RNA (lncRNA), a type of non-coding RNA, has been reported to contain short open reading frames (sORFs). sORF-encoded short peptides (SEPs) have been shown to play a crucial role in regulating biological processes such as growth, development, and resistance responses. Identifying SEPs is vital to further understanding their function. However, methods for identifying SEPs effectively and rapidly are still lacking. In this study, we develop lncPepid, a novel method for identifying lncRNA-encoded short peptides based on feature subset recombination and ensemble learning. lncPepid transforms the data of Zea mays and Arabidopsis thaliana into hybrid features from two separate aspects: sequence composition and physicochemical properties. It optimizes the hybrid features with a novel weighted iteration-based feature selection method that recombines a stable subset characterizing SEPs effectively. Different classification models with different optimized features are constructed and tested separately. The outputs of the optimal models are integrated for ensemble classification to improve efficiency. Experimental results show that the geometric mean of sensitivity and specificity of lncPepid is about 70% for identifying functional SEPs derived from multiple species. It is an effective and rapid method for identifying lncRNA-encoded short peptides. This study can be extended to SEPs from other species and has crucial implications for further findings and studies in functional genomics.
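The geometric mean (G-mean) reported above combines sensitivity, TP/(TP + FN), and specificity, TN/(TN + FP), as √(sensitivity × specificity). A classifier therefore cannot score well by favoring only one class. The sketch below is a minimal version for binary labels. The function name is hypothetical.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    return math.sqrt(sensitivity * specificity)
```

Consider a classifier that finds 3 of 4 positives (sensitivity 0.75) but only 2 of 4 negatives (specificity 0.5). Its G-mean is √0.375 ≈ 0.61.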
Affiliation(s)
- Siyuan Zhao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
37
A Comprehensive Analysis of Supervised Learning Techniques for Electricity Theft Detection. JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING 2021. [DOI: 10.1155/2021/9136206] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Many methods and algorithms can be applied to detect electricity theft. However, comparative studies of supervised learning methods for electricity theft detection are still insufficient. In this paper, several supervised learning methods, such as decision tree (DT), artificial neural network (ANN), deep artificial neural network (DANN), and AdaBoost, are compared on predictive accuracy, recall, precision, AUC, and F1-score, and their performance is analyzed. A public dataset from the State Grid Corporation of China (SGCC), consisting of power consumption in kWh, was used for this study. The analysis shows that the DANN outperforms the other supervised learning classifiers (ANN, AdaBoost, and DT) in recall, F1-score, and AUC. Future research could perform experiments with other supervised learning algorithms on different types of datasets and apply suitable preprocessing methods to improve performance.
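The threshold metrics compared above all derive from the confusion matrix. The sketch below is a minimal version for binary labels. AUC is omitted because it needs predicted scores rather than hard labels. The function name is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

On an imbalanced theft dataset, accuracy can stay high while recall on the rare theft class is poor. This is why recall, F1-score, and AUC are the more telling comparisons.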
38
Radović N, Prelević V, Erceg M, Antunović T. Machine learning approach in mortality rate prediction for hemodialysis patients. Comput Methods Biomech Biomed Engin 2021; 25:111-122. [PMID: 34124977 DOI: 10.1080/10255842.2021.1937611] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 02/01/2023]
Abstract
A kernel support vector machine (SVM) and K-means clustering are used to estimate the expected mortality rate of hemodialysis patients. The research was conducted on the national nephrology database of Montenegro. Mortality prediction reaches an accuracy of up to 94.12% on the complete database and up to 96.77% on a reduced database containing data for the three most common underlying diseases. Additionally, only a few parameters, most of which are collected during the patient examination itself, are shown to be sufficient for satisfactory results.
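As an illustration of the clustering half of this pipeline, the sketch below implements standard K-means (Lloyd's algorithm) in plain Python. How the authors combined K-means with the kernel SVM, which features they clustered, and their choice of k are not stated here, so the toy two-dimensional points and k = 2 are assumptions; the kernel SVM itself is not shown.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid re-estimation."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initialise with k data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Hypothetical, already-scaled patient features forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

K-means is sensitive to feature scale and to initialisation, so in practice features are standardised first and the algorithm is usually restarted from several seeds.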
Affiliation(s)
- Nevena Radović
- Electrical Engineering Department, University of Montenegro, Podgorica, Montenegro
- Vladimir Prelević
- Clinic for Nephrology, Clinical Center of Montenegro, Podgorica, Montenegro
- Milena Erceg
- Electrical Engineering Department, University of Montenegro, Podgorica, Montenegro
- Tanja Antunović
- Center for Laboratory Diagnostics, Clinical Center of Montenegro, Podgorica, Montenegro
39
Cauwenberghs N, Sabovčik F, Magnus A, Haddad F, Kuznetsova T. Proteomic profiling for detection of early-stage heart failure in the community. ESC Heart Fail 2021; 8:2928-2939. [PMID: 34050710 PMCID: PMC8318505 DOI: 10.1002/ehf2.13375] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/19/2021] [Revised: 03/15/2021] [Accepted: 04/08/2021] [Indexed: 12/14/2022]
Abstract
Aims Biomarkers may provide insights into molecular mechanisms underlying heart remodelling and dysfunction. Using a targeted proteomic approach, we aimed to identify circulating biomarkers associated with early stages of heart failure. Methods and results A total of 575 community‐based participants (mean age, 57 years; 51.7% women) underwent echocardiography and proteomic profiling (CVD II panel, Olink Proteomics). We applied partial least squares‐discriminant analysis (PLS‐DA) and a machine learning algorithm [eXtreme Gradient Boosting (XGBoost)] to identify key proteins associated with echocardiographic abnormalities. We used Gaussian mixture modelling for unbiased clustering to construct phenogroups based on influential proteins in PLS‐DA and XGBoost. Of 87 proteins, 13 were important in PLS‐DA and XGBoost modelling for detecting left ventricular remodelling, left ventricular diastolic dysfunction, and/or left atrial reservoir dysfunction: placental growth factor, kidney injury molecule‐1, prostasin, angiotensin‐converting enzyme‐2, galectin‐9, cathepsin L1, matrix metalloproteinase‐7, tumour necrosis factor receptor superfamily members 10A, 10B, and 11A, interleukins 6 and 16, and α1‐microglobulin/bikunin precursor. Based on these proteins, the clustering algorithm divided the cohort into two distinct phenogroups, each grouping individuals with a similar protein profile. Participants belonging to the second cluster (n = 118) were characterized by an unfavourable cardiovascular risk profile and adverse cardiac structure and function. The adjusted risk of echocardiographic abnormalities was higher in this phenogroup than in the other (P < 0.0001). Conclusions We found proteins related to renal function, extracellular matrix remodelling, angiogenesis, and inflammation to be associated with echocardiographic signs of early‐stage heart failure. Proteomic phenomapping discriminated individuals at high risk for cardiac remodelling and dysfunction.
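Gaussian mixture modelling assigns each participant a probability of belonging to each phenogroup and then clusters by the most probable component. The sketch below is a deliberately simplified, one-dimensional, two-component version fitted by expectation-maximisation (EM) in plain Python; the study clustered on multiple influential proteins, and its software, initialisation, and model selection are not described here. The input values are hypothetical, not Olink measurements.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_1d(data, init_means, iterations=200, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with expectation-maximisation (EM)."""
    n, k = len(data), len(init_means)
    means = list(init_means)
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility resp[i][j] = P(component j | x_i).
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # avoid 0/0 if every density underflows
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # component lost all support; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var
            )
    # Hard cluster label = component with the highest responsibility.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


values = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]  # hypothetical protein scores
weights, means, variances, labels = gmm_1d(values, init_means=[0.0, 6.0])
print(weights, means, variances, labels)
```

Unlike K-means, the mixture keeps a separate variance and weight per cluster and yields soft memberships, which is useful when phenogroups differ in size and spread.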
Affiliation(s)
- Nicholas Cauwenberghs
- Research Unit Hypertension and Cardiovascular Epidemiology, KU Leuven Department of Cardiovascular Sciences, University of Leuven, Campus Sint Rafaël, Kapucijnenvoer 7, Box 7001, Leuven, B-3000, Belgium
- František Sabovčik
- Research Unit Hypertension and Cardiovascular Epidemiology, KU Leuven Department of Cardiovascular Sciences, University of Leuven, Campus Sint Rafaël, Kapucijnenvoer 7, Box 7001, Leuven, B-3000, Belgium
- Alessio Magnus
- Research Unit Hypertension and Cardiovascular Epidemiology, KU Leuven Department of Cardiovascular Sciences, University of Leuven, Campus Sint Rafaël, Kapucijnenvoer 7, Box 7001, Leuven, B-3000, Belgium
- Francois Haddad
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Stanford, CA, USA
- Tatiana Kuznetsova
- Research Unit Hypertension and Cardiovascular Epidemiology, KU Leuven Department of Cardiovascular Sciences, University of Leuven, Campus Sint Rafaël, Kapucijnenvoer 7, Box 7001, Leuven, B-3000, Belgium
40
Research on Ultrasonic Image Recognition Based on Optimization Immune Algorithm. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5868949. [PMID: 34055040 PMCID: PMC8149231 DOI: 10.1155/2021/5868949] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/23/2020] [Accepted: 04/29/2021] [Indexed: 11/30/2022]
Abstract
With the rapid development of science and technology, ultrasound has attracted increasing attention and is widely used in engineering, diagnosis, and detection. In this paper, an ultrasonic image recognition method based on the immune algorithm is proposed and applied to the recognition of medical ultrasound liver images. First, the ultrasound liver image is converted to grayscale and a region of interest is selected. Second, features are extracted based on the spatial gray-level dependence matrix, spatial frequency decomposition, and fractal analysis. Then, the immune algorithm is used to classify normal liver, liver cirrhosis, and liver cancer ultrasound images. Finally, to address the shortcomings of the immune algorithm, it is combined with a support vector machine (SVM) to form an optimized immune algorithm, which improves the performance of ultrasonic liver image classification and recognition. Simulation results show that the proposed method can effectively classify normal liver, liver cirrhosis, and liver cancer ultrasound images, and that, compared with the traditional immune algorithm, the optimized immune algorithm combining the immune algorithm with the SVM effectively improves classification and recognition performance.
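Assuming the spatial gray-level features named above refer to the widely used gray-level co-occurrence matrix (also called the spatial gray-level dependence matrix), the sketch below shows how such a matrix and three classic texture statistics (contrast, energy, homogeneity) could be computed from a quantised region of interest. The 4×4 region, the number of gray levels, and the single horizontal offset are illustrative assumptions; the spatial frequency and fractal features and the immune-algorithm/SVM classifier are not shown.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=False):
    """Normalised gray-level co-occurrence matrix for the pixel offset (dx, dy).

    image: 2-D list of integer gray levels in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:  # also count the reverse pair (j, i)
                    counts[j][i] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Co-occurrences weighted by squared gray-level difference: high when neighbours differ."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def energy(p):
    """Angular second moment: high for uniform, repetitive texture."""
    return sum(v * v for row in p for v in row)


def homogeneity(p):
    """Inverse-difference weighting: high when co-occurrences sit near the diagonal."""
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(len(p)) for j in range(len(p)))


# Hypothetical 4x4 region of interest already quantised to 4 gray levels.
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(roi, levels=4)  # horizontal neighbour, offset (dx=1, dy=0)
print([contrast(p), energy(p), homogeneity(p)])
```

In practice several offsets (for example 0°, 45°, 90°, 135°) are computed and their statistics averaged or concatenated into the feature vector passed to the classifier.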
41
Lei Y, Li Y. A novel scheme of domain transfer in document-level cross-domain sentiment classification. J Inf Sci 2021. [DOI: 10.1177/01655515211012329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Sentiment classification aims to learn sentiment features from an annotated corpus and automatically predict the sentiment polarity of new text. However, people express feelings differently in different domains, so the distribution of sentiment features differs considerably across domains. At the same time, in certain domains no annotated corpus is available for sentiment classification because corpus collection is costly. It is therefore necessary to leverage or reuse existing annotated corpora for training. In this article, we propose a new algorithm for extracting central sentiment sentences from product reviews and improve the pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) to achieve domain transfer for cross-domain sentiment classification. We used various pre-trained language models to demonstrate the effectiveness of the newly proposed joint algorithm for text ranking and emotional word extraction, and used the Amazon product reviews dataset to demonstrate the effectiveness of our proposed domain-transfer framework. Experimental results on 12 different cross-domain pairs showed that the new cross-domain classification method significantly outperformed several popular cross-domain sentiment classification methods.
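One common way to rank the central sentences of a review is TextRank, which runs weighted PageRank over a sentence-similarity graph. The sketch below is a generic TextRank, offered as an assumption about the kind of text ranking involved; it is not the authors' joint text-ranking and emotional-word extraction algorithm, and the BERT-based domain transfer is not shown. The review sentences are made up.

```python
import math
import re


def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def similarity(a, b):
    """TextRank-style overlap: shared words divided by log(len a) + log(len b)."""
    ta, tb = tokenize(a), tokenize(b)
    overlap = len(set(ta) & set(tb))
    denom = math.log(len(ta)) + math.log(len(tb)) if ta and tb else 0.0
    return overlap / denom if overlap and denom > 0 else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(w[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def central_sentences(sentences, top_k=1):
    """Return the top_k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


review = [
    "The battery life of this phone is great.",
    "Battery life lasts two days.",
    "The phone screen is bright and the battery charges fast.",
    "Shipping was slow.",
]
print(central_sentences(review, top_k=1))
```

A sentence sharing no words with the others, such as the shipping remark, keeps only the teleport score 1 - d (0.15 with d = 0.85), so it is never selected as central.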
Affiliation(s)
- Yueting Lei
- China Institute of Quality Research, China; Department of Industrial Engineering, Shanghai Jiao Tong University, China
- Yanting Li
- China Institute of Quality Research, China; Department of Industrial Engineering, Shanghai Jiao Tong University, China
42
Luo L, Kou R, Feng Y, Xiang J, Zhu W. Cost-Effective Machine Learning Based Clinical Pre-Test Probability Strategy for DVT Diagnosis in Neurological Intensive Care Unit. Clin Appl Thromb Hemost 2021; 27:10760296211008650. [PMID: 33928796 PMCID: PMC8114755 DOI: 10.1177/10760296211008650] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/05/2023]
Abstract
To overcome the shortcomings of current costly deep vein thrombosis (DVT) diagnosis and reduce the waste of valuable healthcare resources, we proposed a new diagnostic approach based on machine learning pre-test prediction models using electronic health records (EHRs). We examined sociodemographic and clinical factors for predicting DVT in 518 patients admitted to the neurological intensive care unit (NICU), 189 of whom eventually developed DVT. Cross-validation on the training data was used to determine the optimal parameters, and ROC analysis was then used to evaluate the predictive strength of each model. The two models with the strongest ROC performance, a generalized linear model (GLM) and a support vector machine (SVM), were selected for DVT prediction; based on them, we optimized the current intervention and diagnostic process for DVT and examined the performance of the proposed approach through simulations. Machine learning based pre-test prediction models can simplify and improve the intervention and diagnostic process for NICU patients with suspected DVT and reduce both the use of valuable healthcare resources and medical costs.
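Cross-validation for parameter selection can be sketched as follows: an L2-regularised logistic regression (one kind of GLM) is fitted by gradient descent, and its regularisation strength is chosen by k-fold cross-validated log-loss. The feature, parameter grid, fold count, and toy data are all hypothetical; the study's actual models, parameters, and ROC-based comparison are not reproduced.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lam, lr=0.5, epochs=300):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2 (intercept not penalised)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def log_loss(X, y, w, b, eps=1e-12):
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(w, b, xi), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def kfold_indices(n, k, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def select_lambda(X, y, lambdas, k=5, seed=0):
    """Return (best_lambda, mean validation log-loss) chosen by k-fold cross-validation."""
    folds = kfold_indices(len(X), k, seed)
    best = None
    for lam in lambdas:
        losses = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            w, b = fit_logistic([X[i] for i in train], [y[i] for i in train], lam)
            losses.append(log_loss([X[i] for i in fold], [y[i] for i in fold], w, b))
        mean_loss = sum(losses) / len(losses)
        if best is None or mean_loss < best[1]:
            best = (lam, mean_loss)
    return best


# Hypothetical toy data: one standardised risk score per patient, label 1 = DVT.
X = [[i / 5] for i in range(-10, 10)]
y = [1 if i >= 0 else 0 for i in range(-10, 10)]
best_lam, cv_loss = select_lambda(X, y, lambdas=[0.0, 0.01, 0.1, 1.0])
print(best_lam, cv_loss)
```

After selection, the model would be refitted on all training data with the chosen strength and only then compared with other models on held-out data, for example by ROC analysis.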
Affiliation(s)
- Li Luo
- Business School, Sichuan University, Chengdu, China
- Ran Kou
- Business School, Sichuan University, Chengdu, China
- Yuquan Feng
- Business School, Sichuan University, Chengdu, China
- Jie Xiang
- Business School, Sichuan University, Chengdu, China
- Wei Zhu
- West China School of Nursing, West China Hospital, Sichuan University, Chengdu, China
43
Gokten ES, Uyulan C. Prediction of the development of depression and post-traumatic stress disorder in sexually abused children using a random forest classifier. J Affect Disord 2021; 279:256-265. [PMID: 33074145 DOI: 10.1016/j.jad.2020.10.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/02/2020] [Revised: 09/29/2020] [Accepted: 10/04/2020] [Indexed: 10/23/2022]
Abstract
BACKGROUND Depression and post-traumatic stress disorder (PTSD) are among the most common psychiatric disorders observed in children and adolescents exposed to sexual abuse. OBJECTIVE The present study aimed to use machine learning techniques to investigate how factors such as the characteristics of the child, the abuse, and the abuser, the child's family type, and social support affect the development of psychiatric disorders. PARTICIPANTS AND SETTINGS The records of 482 children and adolescents who were determined to have been sexually abused were examined to predict the development of depression and PTSD. METHODS Each child was evaluated psychiatrically by a child and adolescent psychiatrist according to the DSM-V. Using the data of both groups, a predictive model was built with a random forest classifier. RESULTS The means and standard deviations of the 10-fold cross-validation results were: accuracy 0.82 (± 0.19), F1 0.81 (± 0.19), precision 0.81 (± 0.19), and recall 0.80 (± 0.19) for children with depression; and accuracy 0.72 (± 0.12), F1 0.71 (± 0.12), precision 0.72 (± 0.12), and recall 0.71 (± 0.12) for children with PTSD. ROC curves were drawn for both, and the AUC was 0.88 for major depressive disorder and 0.76 for PTSD. CONCLUSIONS Machine learning techniques are powerful methods that can be used to predict disorders that may develop after sexual abuse. The results should be confirmed by repeated studies with larger samples and extended to other risk groups.
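A random forest trains many decision trees on bootstrap samples, lets each tree consider only a random subset of features, and predicts by majority vote. To keep the example short, the sketch below uses depth-1 trees (decision stumps) instead of full trees, so it is a simplified illustration of the idea rather than the classifier used in the study; the toy features and labels are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Best single split (feature, threshold) by weighted Gini impurity over the given features."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint threshold between consecutive distinct values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or impurity < best[0]:
                best = (impurity, f, t, majority(left), majority(right))
    if best is None:  # no candidate feature varies in this sample: constant prediction
        label = majority(y)
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    m = max_features or max(1, int(d ** 0.5))  # sqrt(d) features per tree by default
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        features = rng.sample(range(d), m)  # random feature subset for this tree
        forest.append(fit_stump([X[i] for i in sample], [y[i] for i in sample], features))
    return forest


def predict_forest(forest, row):
    return majority([predict_stump(s, row) for s in forest])


# Hypothetical toy data: two numeric risk factors, label 1 = disorder developed.
X = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 9], [9, 8], [8, 8], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y, n_trees=25, seed=0)
print([predict_forest(forest, row) for row in [[0, 0], [10, 10]]])
```

Reporting a mean and standard deviation, as in the study, would mean repeating this fit-and-score step over 10 cross-validation folds and summarising the 10 per-fold scores.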
Affiliation(s)
- Emel Sari Gokten
- Assoc Prof of Child and Adolescent Psychiatry, Uskudar University Medical Faculty, Istanbul, Turkey.
- Caglar Uyulan
- Assist Prof of Mechatronics Engineering Department, Zonguldak Bulent Ecevit University Faculty of Engineering, Zonguldak, Turkey.
44
A Semantic Analysis and Community Detection-Based Artificial Intelligence Model for Core Herb Discovery from the Literature: Taking Chronic Glomerulonephritis Treatment as a Case Study. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2020; 2020:1862168. [PMID: 32952598 PMCID: PMC7481937 DOI: 10.1155/2020/1862168] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/11/2020] [Revised: 07/14/2020] [Accepted: 08/14/2020] [Indexed: 12/22/2022]
Abstract
The Traditional Chinese Medicine (TCM) formula is the main treatment method of TCM. A formula often contains multiple herbs, among which core herbs play a critical therapeutic role in treating diseases. Identifying the core herbs in formulae is of great significance because it provides evidence and references for the clinical application of Chinese herbs and formulae. In this paper, we propose CHDSC, a core herb discovery model based on semantic analysis and community detection, to discover the core herbs for treating a given disease from large-scale literature. CHDSC comprises three stages: corpus construction, herb network establishment, and core herb discovery. It uses two artificial intelligence modules: the Chinese word embedding algorithm ESSP2VEC, designed to analyse the semantics of herbs in Chinese literature based on the stroke, structure, and pinyin features of Chinese characters, and the label propagation-based algorithm LILPA, adopted to detect herb communities and core herbs in the herbal semantic network constructed from large-scale literature. To validate the proposed model, we chose chronic glomerulonephritis (CGN) as an example, retrieved 1126 articles on treating CGN with TCM from the China National Knowledge Infrastructure (CNKI), and applied CHDSC to analyse the collected literature. Experimental results reveal that CHDSC discovers three major herb communities and eighteen core herbs for treating different CGN syndromes with high accuracy. The distributions of community size, degree, and closeness centrality in the herb network are analysed to uncover patterns among core herbs. We observe that core herbs mainly occur in communities with more than 25 herbs, and that the degree and closeness centrality of core herb nodes concentrate in the ranges [15, 40] and [0.25, 0.45], respectively. Thus, semantic analysis and community detection are helpful for mining effective core herbs for treating a given disease from large-scale literature.
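The degree and closeness centrality ranges reported above can be computed on any herb network with breadth-first search, as in the plain-Python sketch below. The edge list uses placeholder herb names, not data from the study, and LILPA and ESSP2VEC are not reproduced. Closeness here is (r - 1)/Σd scaled by (r - 1)/(n - 1), where r is the number of nodes reachable from the node (including itself) and n the network size; this mirrors the default `wf_improved` behaviour of NetworkX's `closeness_centrality` and reduces to (n - 1)/Σd on a connected graph.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree(adj, node):
    return len(adj[node])


def closeness(adj, node):
    """(r - 1) / sum of distances, scaled by (r - 1) / (n - 1) for r reachable nodes."""
    n = len(adj)
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    if n <= 1 or total == 0:
        return 0.0
    r = len(dist)
    return (r - 1) / total * (r - 1) / (n - 1)


# Hypothetical herb co-occurrence edges (placeholder names, not from the study).
edges = [("herb_a", "herb_b"), ("herb_b", "herb_c"), ("herb_b", "herb_d"), ("herb_d", "herb_e")]
g = build_graph(edges)
for herb in sorted(g):
    print(herb, degree(g, herb), round(closeness(g, herb), 3))
```

The reachability scaling keeps a node in a small, isolated community from receiving an inflated closeness score merely because its few neighbours are close.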