1
|
Fu S, Tian Y, Tang J, Liu X. Cost-sensitive learning with modified Stein loss function. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
|
2
|
Evaluation modeling of highway collapse hazard based on rough set and support vector machine. Sci Rep 2022; 12:18723. [PMCID: PMC9636135 DOI: 10.1038/s41598-022-23567-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Accepted: 11/02/2022] [Indexed: 11/06/2022]
Abstract
The prediction of the possibility and risk classification of collapse is an important issue in highway construction in mountainous areas. Based on the principles of rough set theory and the support vector machine, a landslide hazard prediction model was established. First, according to field investigation, an evaluation index system and a sample set of evaluation index data were established, and the rough set decision table was constructed by preprocessing the original data based on the function classification of standard evaluation indexes. Then the influence indexes of collapse activity were reduced by rough set theory, and the 9 main indexes affecting collapse activity were extracted as the key discriminant factors of the support vector machine model: slope shape, slope aspect, slope gradient, slope height, exposed structural face, stratum lithology, relationship between weakness face and free face, vegetation cover rate, and weathering degree of rock. Next, taking the data of 13 post-earthquake collapses along the Yingxiu-Wolong highway in Hanchuan County, measured by the authors in the field, as training samples, the optimal model parameters were analyzed and calculated. When the penalty parameter C is 8 and the kernel parameter σ is 0.5, the cross-validation accuracy is 100% and the model is optimal. Finally, 4 other landslide data were tested, and the discriminant results for the test samples were compared with the results obtained by uncertainty measure and distance discriminant analysis. The RS-SVM discriminant results for the test samples were consistent with those obtained by uncertainty measure and distance discriminant analysis, with an accuracy of 100%. The collapse hazard analysis model based on rough set and support vector machine can reduce computation while ensuring the accuracy of evaluation, better handles small-sample and nonlinear problems, and can provide a useful approach for future collapse hazard evaluation.
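The reported kernel parameter can be made concrete with a small sketch. It assumes the Gaussian (RBF) kernel in the form K(x, y) = exp(-||x - y||^2 / (2σ^2)); the abstract does not state which parameterization the authors used, so this form, the helper names, and the feature vectors below are illustrative assumptions rather than the paper's implementation.

```python
import math

def gamma_from_sigma(sigma):
    """Convert a Gaussian kernel width sigma to gamma = 1 / (2 * sigma^2)."""
    return 1.0 / (2.0 * sigma ** 2)

def rbf_kernel(x, y, sigma=0.5):
    """Gaussian (RBF) kernel K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma_from_sigma(sigma) * sq_dist)

# Hypothetical feature vectors, not data from the paper.
k_same = rbf_kernel([0.2, 0.4], [0.2, 0.4])  # identical points
k_far = rbf_kernel([0.0, 0.0], [1.0, 1.0])   # squared distance of 2
```

Under this assumed parameterization, σ = 0.5 corresponds to gamma = 2 in libraries that write the kernel as exp(-gamma * ||x - y||^2).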
|
3
|
Yan X, Zhu H. A novel robust support vector machine classifier with feature mapping. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
4
|
Zhang L, Wang K, Xu L, Sheng W, Kang Q. Evolving ensembles using multi-objective genetic programming for imbalanced classification. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
|
5
|
Ma B, Chai B, Dong H, Qi J, Wang P, Xiong T, Gong Y, Li D, Liu S, Song F. Diagnostic classification of cancers using DNA methylation of paracancerous tissues. Sci Rep 2022; 12:10646. [PMID: 35739223 PMCID: PMC9226137 DOI: 10.1038/s41598-022-14786-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/15/2022] [Accepted: 06/13/2022] [Indexed: 11/09/2022]
Abstract
The potential role of DNA methylation from paracancerous tissues in cancer diagnosis has not been explored until now. In this study, we built classification models using well-known machine learning models based on DNA methylation profiles of paracancerous tissues. We evaluated our methods on nine cancer datasets collected from The Cancer Genome Atlas (TCGA) and used fivefold cross-validation to assess the performance of the models. Additionally, we performed gene ontology (GO) enrichment analysis on the significant CpG sites selected by the feature importance scores of the XGBoost model, aiming to identify biological pathways involved in cancer progression. We also used the XGBoost algorithm to classify cancer types using DNA methylation profiles of paracancerous tissues in external validation datasets. Comparative experiments suggested that XGBoost achieved better predictive performance than the other four machine learning methods in predicting cancer stage. GO enrichment analysis revealed key pathways involved, highlighting the importance of paracancerous tissues in cancer progression. Furthermore, the XGBoost model can accurately classify nine different cancers from TCGA, and the feature sets selected by XGBoost can also effectively predict seven cancer types on independent GEO datasets. This study provides new insights into cancer diagnosis from an epigenetic perspective and may facilitate the development of personalized diagnosis and treatment strategies.
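The fivefold cross-validation mentioned in the abstract can be illustrated with a minimal index splitter. This is a generic sketch, not the authors' pipeline: the function name `k_fold_indices` is hypothetical, the folds are contiguous, and no shuffling or class stratification is applied, both of which a real evaluation would normally add.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Hypothetical dataset of 12 samples split into 5 folds.
folds = list(k_fold_indices(12, k=5))
```

Each sample lands in exactly one test fold, so averaging a metric over the five folds uses every sample once for testing.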
Affiliation(s)
- Baoshan Ma
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China.
- Bingjie Chai
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China.
- Heng Dong
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China.
- Jishuang Qi
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China.
- Pengcheng Wang
  - Department of Mechanical Engineering, University of Houston, Houston, TX, 77204, USA.
- Tong Xiong
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China.
- Yi Gong
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China.
- Di Li
  - Department of Neuro Intervention, Dalian Medical University Affiliated Dalian Municipal Central Hospital, Dalian, 116033, China.
- Shuxin Liu
  - Department of Nephrology, Dalian Medical University Affiliated Dalian Municipal Central Hospital, Dalian, 116033, China.
- Fengju Song
  - Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology, Tianjin, National Clinical Research Center of Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin, 300060, China.
|
6
|
Hao PY, Chiang JH, Chen YD. Possibilistic classification by support vector networks. Neural Netw 2022; 149:40-56. [DOI: 10.1016/j.neunet.2022.02.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/11/2021] [Revised: 12/28/2021] [Accepted: 02/06/2022] [Indexed: 10/19/2022]
|
7
|
Wang KF, An J, Wei Z, Cui C, Ma XH, Ma C, Bao HQ. Deep Learning-Based Imbalanced Classification With Fuzzy Support Vector Machine. Front Bioeng Biotechnol 2022; 9:802712. [PMID: 35127672 PMCID: PMC8815771 DOI: 10.3389/fbioe.2021.802712] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/27/2021] [Accepted: 12/20/2021] [Indexed: 12/23/2022]
Abstract
Imbalanced classification is widespread in the fields of medical diagnosis, biomedicine, smart city and Internet of Things. The imbalance of data distribution makes traditional classification methods more biased towards majority classes and ignores the importance of minority class. It makes the traditional classification methods ineffective in imbalanced classification. In this paper, a novel imbalance classification method based on deep learning and fuzzy support vector machine is proposed and named as DFSVM. DFSVM first uses a deep neural network to obtain an embedding representation of the data. This deep neural network is trained by using triplet loss to enhance similarities within classes and differences between classes. To alleviate the effects of imbalanced data distribution, oversampling is performed in the embedding space of the data. In this paper, we use an oversampling method based on feature and center distance, which can obtain more diverse new samples and prevent overfitting. To enhance the impact of minority class, we use a fuzzy support vector machine (FSVM) based on cost-sensitive learning as the final classifier. FSVM assigns a higher misclassification cost to minority class samples to improve the classification quality. Experiments were performed on multiple biological datasets and real-world datasets. The experimental results show that DFSVM has achieved promising classification performance.
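The triplet loss named in the abstract can be sketched in its standard hinge form, max(0, d(a, p) - d(a, n) + margin). The abstract does not give the distance function or margin used by DFSVM, so the Euclidean distance, the margin of 1.0, and the toy embeddings below are assumptions for illustration only.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical 2-D embeddings.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # negative already far away
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])  # negative closer than positive
```

Minimizing this loss pulls same-class embeddings together and pushes different-class embeddings apart, which matches the stated goal of enhancing similarities within classes and differences between classes.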
Affiliation(s)
- Ke-Fan Wang
  - School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Jing An
  - School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Zhen Wei
  - School of Design, East China Normal University, Shanghai, China
  - *Correspondence: Zhen Wei,
- Can Cui
  - College of Electronic and Information Engineering, Tongji University, Shanghai, China
- Xiang-Hua Ma
  - School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Chao Ma
  - School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Han-Qiu Bao
  - College of Electronic and Information Engineering, Tongji University, Shanghai, China
|
8
|
Surface Defect Detection Methods for Industrial Products: A Review. Appl Sci (Basel) 2021. [DOI: 10.3390/app11167657] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/31/2022]
Abstract
The comprehensive intelligent development of the manufacturing industry puts forward new requirements for the quality inspection of industrial products. This paper summarizes the current research status of machine learning methods in surface defect detection, a key part in the quality inspection of industrial products. First, according to the use of surface features, the application of traditional machine vision surface defect detection methods in industrial product surface defect detection is summarized from three aspects: texture features, color features, and shape features. Secondly, the research status of industrial product surface defect detection based on deep learning technology in recent years is discussed from three aspects: supervised method, unsupervised method, and weak supervised method. Then, the common key problems and their solutions in industrial surface defect detection are systematically summarized; the key problems include real-time problem, small sample problem, small target problem, unbalanced sample problem. Lastly, the commonly used datasets of industrial surface defects in recent years are more comprehensively summarized, and the latest research methods on the MVTec AD dataset are compared, so as to provide some reference for the further research and development of industrial surface defect detection technology.
|
9
|
Zhao D, Wang X, Mu Y, Wang L. Experimental Study and Comparison of Imbalance Ensemble Classifiers with Dynamic Selection Strategy. Entropy (Basel) 2021; 23:822. [PMID: 34203274 PMCID: PMC8307085 DOI: 10.3390/e23070822] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/04/2021] [Revised: 06/18/2021] [Accepted: 06/24/2021] [Indexed: 12/12/2022]
Abstract
Imbalance ensemble classification is one of the most essential and practical strategies for improving decision performance in data analysis. There is a growing body of literature on ensemble techniques for imbalance learning in recent years, and various extensions of imbalanced classification methods have been established from different points of view. The present study reviews the state-of-the-art ensemble classification algorithms for dealing with imbalanced datasets, offering a comprehensive analysis of incorporating the dynamic selection of base classifiers in classification. By running 14 existing ensemble algorithms incorporating dynamic selection on 56 datasets, the experimental results reveal that classical algorithms with a dynamic selection strategy deliver a practical way to improve classification performance for both binary and multi-class imbalanced datasets. In addition, by combining patch learning with dynamic selection ensemble classification, a patch-ensemble classification method is designed, which uses the misclassified samples to train patch classifiers to increase the diversity of base classifiers. The experimental results indicate that the designed method shows some potential for multi-class imbalanced classification.
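Dynamic selection of base classifiers can be illustrated with one simple variant: for each query, pick the classifier that is most accurate on the query's nearest validation neighbours. The abstract does not say which selection rules the 14 algorithms use, so this local-accuracy rule, the function `select_classifier`, and the toy classifiers and data are all assumptions made for illustration.

```python
def select_classifier(query, val_X, val_y, classifiers, k=3):
    """Return the index of the classifier with the most correct predictions
    on the k validation points nearest to `query` (ties go to the first)."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    nearest = sorted(range(len(val_X)), key=lambda i: sq_dist(query, val_X[i]))[:k]
    scores = [sum(clf(val_X[i]) == val_y[i] for i in nearest) for clf in classifiers]
    return scores.index(max(scores))

# Two hypothetical base classifiers, each reliable in a different region.
def clf_low(x):
    return int(x[0] > 0.5)

def clf_high(x):
    return int(x[0] < 3.5)

val_X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
val_y = [0, 1, 1, 1, 0, 0]

chosen_near_zero = select_classifier([0.0], val_X, val_y, [clf_low, clf_high], k=2)
chosen_near_five = select_classifier([5.0], val_X, val_y, [clf_low, clf_high], k=2)
```

Here `clf_low` is selected near x = 0 and `clf_high` near x = 5, so the ensemble defers to whichever base classifier is locally competent.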
Affiliation(s)
- Dongxue Zhao
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Xin Wang
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Yashuang Mu
  - School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China
- Lidong Wang
  - School of Science, Dalian Maritime University, Dalian 116026, China
|
10
|
Tang J, Li J, Xu W, Tian Y, Ju X, Zhang J. Robust cost-sensitive kernel method with Blinex loss and its applications in credit risk evaluation. Neural Netw 2021; 143:327-344. [PMID: 34182234 DOI: 10.1016/j.neunet.2021.06.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/06/2021] [Revised: 05/10/2021] [Accepted: 06/10/2021] [Indexed: 10/21/2022]
Abstract
Credit risk evaluation is a crucial yet challenging problem in financial analysis. It can not only help institutions reduce risk and ensure profitability, but also improve consumers' fair practices. The data-driven algorithms such as artificial intelligence techniques regard the evaluation as a classification problem and aim to classify transactions as default or non-default. Since non-default samples greatly outnumber default samples, it is a typical imbalanced learning problem and each class or each sample needs special treatment. Numerous data-level, algorithm-level and hybrid methods are presented, and cost-sensitive support vector machines (CSSVMs) are representative algorithm-level methods. Based on the minimization of symmetric and unbounded loss functions, CSSVMs impose higher penalties on the misclassification costs of minority instances using domain specific parameters. However, such loss functions as error measurement cannot have an obvious cost-sensitive generalization. In this paper, we propose a robust cost-sensitive kernel method with Blinex loss (CSKB), which can be applied in credit risk evaluation. By inheriting the elegant merits of Blinex loss function, i.e., asymmetry and boundedness, CSKB not only flexibly controls distinct costs for both classes, but also enjoys noise robustness. As a data-driven decision-making paradigm of credit risk evaluation, CSKB can achieve the "win-win" situation for both the financial institutions and consumers. We solve linear and nonlinear CSKB by Nesterov accelerated gradient algorithm and Pegasos algorithm respectively. Moreover, the generalization capability of CSKB is theoretically analyzed. Comprehensive experiments on synthetic, UCI and credit risk evaluation datasets demonstrate that CSKB compares more favorably than other benchmark methods in terms of various measures.
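The two properties the abstract attributes to the Blinex loss, asymmetry and boundedness, can be checked numerically. The sketch below assumes the commonly cited bounded-LINEX form, L(u) = (1/λ)(1 - 1/(1 + λ·b·(e^{au} - au - 1))); the abstract itself gives no formula, so this form and the parameter names `a`, `b`, and `lam` are assumptions and may differ from the parameterization used in CSKB.

```python
import math

def linex(u, a=1.0, b=1.0):
    """LINEX loss b * (exp(a*u) - a*u - 1): asymmetric and unbounded."""
    return b * (math.exp(a * u) - a * u - 1.0)

def blinex(u, a=1.0, b=1.0, lam=1.0):
    """Bounded LINEX (assumed form): squashes LINEX into [0, 1/lam]."""
    try:
        base = linex(u, a, b)
    except OverflowError:
        # exp overflowed, so the loss has saturated at its upper bound.
        return 1.0 / lam
    return (1.0 / lam) * (1.0 - 1.0 / (1.0 + lam * base))

# Equal-sized errors of opposite sign are penalized differently,
# and a very large error cannot exceed the bound 1/lam.
loss_pos = blinex(2.0)
loss_neg = blinex(-2.0)
loss_huge = blinex(1000.0)
```

The upper bound limits the influence of any single outlier, which is the usual argument for noise robustness, while the sign of `a` controls which side of zero is penalized more heavily.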
Affiliation(s)
- Jingjing Tang
  - School of Business Administration, Faculty of Business Administration, Southwestern University of Finance and Economics, Chengdu 611130, China.
- Jiahui Li
  - School of Business Administration, Faculty of Business Administration, Southwestern University of Finance and Economics, Chengdu 611130, China.
- Weiqi Xu
  - School of Business Administration, Faculty of Business Administration, Southwestern University of Finance and Economics, Chengdu 611130, China.
- Yingjie Tian
  - School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China.
- Xuchan Ju
  - College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, China.
- Jie Zhang
  - Alibaba Group, Beijing 100102, China.
|
11
|
Tao X, Chen W, Li X, Zhang X, Li Y, Guo J. The ensemble of density-sensitive SVDD classifier based on maximum soft margin for imbalanced datasets. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106897] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
|
12
|
Borah P, Gupta D. Robust twin bounded support vector machines for outliers and imbalanced data. Appl Intell 2021. [DOI: 10.1007/s10489-020-01847-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/30/2022]
|
13
|
Predicting stock price trends based on financial news articles and using a novel twin support vector machine with fuzzy hyperplane. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106806] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 11/22/2022]
|
14
|
Vuttipittayamongkol P, Elyan E, Petrovski A. On the class overlap problem in imbalanced data classification. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106631] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 12/26/2022]
|
15
|
Peng T, Wei C, Yu F, Xu J, Zhou Q, Shi T, Hu X. Predicting nanotoxicity by an integrated machine learning and metabolomics approach. Environ Pollut 2020; 267:115434. [PMID: 32841907 DOI: 10.1016/j.envpol.2020.115434] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/20/2020] [Revised: 08/11/2020] [Accepted: 08/12/2020] [Indexed: 06/11/2023]
Abstract
Predicting the biological responses to engineered nanoparticles (ENPs) is critical to their environmental health assessment. The disturbances of metabolic pathways reflect the global profile of biological responses to ENPs but are difficult to predict due to the highly heterogeneous data from complicated biological systems and various ENP properties. Herein, integrating multiple machine learning models and metabolomics enabled accurate prediction of the disturbance of metabolic pathways induced by 33 ENPs. Screening nine typical properties of ENPs identified type and size as the top features determining the effects on metabolic pathways. Similarity network analysis and decision tree models overcame the highly heterogeneous data sources to visualize and judge the occurrence of metabolic pathways depending on the sorting priority features. The model accuracy was verified by animal experiments and reached 75%-100%, even for the prediction of ENPs outside of databases. The models also predicted metabolic pathway-related histopathology. This work provides an approach for the quick assessment of environmental health risks induced by known and unknown ENPs.
Affiliation(s)
- Ting Peng
  - Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China.
- Changhong Wei
  - Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China.
- Fubo Yu
  - Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China.
- Jing Xu
  - Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China.
- Qixing Zhou
  - Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China.
- Tonglei Shi
  - Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China.
- Xiangang Hu
  - Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China.
|
16
|
|
17
|
Tao X, Li Q, Guo W, Ren C, He Q, Liu R, Zou J. Adaptive weighted over-sampling for imbalanced datasets based on density peaks clustering with heuristic filtering. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.01.032] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/26/2022]
|