1
|
Uncertainty analysis on support vector machine for measuring organizational factors in probabilistic risk assessment of nuclear power plants. Progress in Nuclear Energy 2022. [DOI: 10.1016/j.pnucene.2022.104411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
2
|
|
3
|
Chen X, Xiong Y, Liu Y, Chen Y, Bi S, Zhu X. m5CPred-SVM: a novel method for predicting m5C sites of RNA. BMC Bioinformatics 2020; 21:489. [PMID: 33126851 PMCID: PMC7602301 DOI: 10.1186/s12859-020-03828-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 06/30/2020] [Accepted: 10/21/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND As one of the most common post-transcriptional modifications (PTCMs) in RNA, 5-cytosine-methylation plays important roles in many biological functions such as RNA metabolism and cell fate decision. Through accurate identification of 5-methylcytosine (m5C) sites on RNA, researchers can better understand the exact role of 5-cytosine-methylation in these biological functions. In recent years, computational methods for predicting m5C sites have attracted considerable interest because of their efficiency and low cost. However, neither the accuracy nor the efficiency of these methods is satisfactory yet, and both need further improvement. RESULTS In this work, we have developed a new computational method, m5CPred-SVM, to identify m5C sites in three species: H. sapiens, M. musculus and A. thaliana. To build this model, we first collected benchmark datasets following three recently published methods. Then, six types of sequence-based features were generated from RNA segments, and the sequential forward feature selection strategy was used to obtain the optimal feature subset. After that, the performance of models based on different learning algorithms was compared, and the model based on the support vector machine provided the highest prediction accuracy. Finally, our proposed method, m5CPred-SVM, was compared with several existing methods, and the results showed that m5CPred-SVM offered substantially higher prediction accuracy than previously published methods. It is expected that m5CPred-SVM can become a useful tool for accurate identification of m5C sites. CONCLUSION In this study, by introducing position-specific propensity related features, we built a new model, m5CPred-SVM, to predict RNA m5C sites in three different species. The results show that our model outperformed the existing state-of-the-art models. Our model is available to users through a web server at https://zhulab.ahu.edu.cn/m5CPred-SVM
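The sequential forward feature selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sequential_forward_selection`, the feature labels `a` to `d`, and the scorer `toy_score` are all hypothetical, and in practice the score would be something like a cross-validated accuracy of the classifier on the candidate subset.

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Greedily add the feature that most improves `score`; stop when none helps."""
    selected: List[str] = []
    best = float("-inf")
    remaining = list(features)
    while remaining:
        # Score every subset obtained by adding one more candidate feature.
        trials = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # no candidate improves on the current subset
        best = top_score
        selected.append(top_feature)
        remaining.remove(top_feature)
    return selected


def toy_score(subset: FrozenSet[str]) -> float:
    """Hypothetical scorer: rewards two 'useful' features, penalizes subset size."""
    useful = {"a", "c"}
    return len(subset & useful) - 0.1 * len(subset)


chosen = sequential_forward_selection(["a", "b", "c", "d"], toy_score)
print(chosen)
```

The greedy loop evaluates each remaining feature once per round, so the number of scorer calls grows quadratically with the number of features rather than exponentially as in an exhaustive subset search.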
Affiliation(s)
- Xiao Chen
  - School of Sciences, Anhui Agricultural University, Hefei, 230036 Anhui China
- Yi Xiong
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240 China
- Yinbo Liu
  - School of Sciences, Anhui Agricultural University, Hefei, 230036 Anhui China
- Yuqing Chen
  - School of Sciences, Anhui Agricultural University, Hefei, 230036 Anhui China
- Shoudong Bi
  - School of Sciences, Anhui Agricultural University, Hefei, 230036 Anhui China
- Xiaolei Zhu
  - School of Sciences, Anhui Agricultural University, Hefei, 230036 Anhui China
|
4
|
Pang H, Wei S, Zhao Y, He L, Wang J, Liu B, Zhao Y. Effective attention-based network for syndrome differentiation of AIDS. BMC Med Inform Decis Mak 2020; 20:264. [PMID: 33059709 PMCID: PMC7558604 DOI: 10.1186/s12911-020-01249-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/13/2020] [Accepted: 09/08/2020] [Indexed: 11/18/2022] Open
Abstract
Background Syndrome differentiation aims to divide patients into several types according to their clinical symptoms and signs, and is essential to traditional Chinese medicine (TCM). Several previous works applied classical algorithms to syndrome classification and achieved promising results. However, the presence of ambiguous symptoms substantially degraded the performance of syndrome differentiation. This disturbance is always due to the diversity and complexity of the patients' symptoms. Methods To alleviate this issue, we proposed an algorithm based on the multilayer perceptron model with an attention mechanism (ATT-MLP). In particular, we first introduced an attention mechanism to assign different weights to different symptoms among the symptomatic features. In this manner, the symptoms of major significance were highlighted and ambiguous symptoms were suppressed. Subsequently, the weighted features were fed into an MLP to predict the syndrome type of AIDS. Results Experimental results on a real-world AIDS dataset show that our framework achieves significant and consistent improvements over other methods. In addition, our model can capture the key symptoms corresponding to each type of syndrome. Conclusion Our proposed method can learn the intrinsic correlations between symptoms and types of syndromes. Our model is able to learn the core cluster of symptoms for each type of syndrome from limited data, while helping medical doctors diagnose patients efficiently.
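The attention-then-MLP pipeline described in the Methods can be sketched as a single forward pass. This is only an illustration under stated assumptions: the abstract does not give the exact attention formulation, so the softmax over per-symptom scores, the one-hidden-layer ReLU network, and every name below (`att_mlp_forward`, `attn_scores`, `w_hidden`, `w_out`) are hypothetical, and the weights are fixed toy values rather than learned parameters.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def att_mlp_forward(
    x: List[float],
    attn_scores: List[float],
    w_hidden: List[List[float]],
    w_out: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, class probabilities) for one patient vector."""
    # Assumed attention form: one softmax weight per symptom feature.
    weights = softmax(attn_scores)
    # Re-weight the symptoms so that important ones dominate the MLP input.
    weighted = [xi * wi for xi, wi in zip(x, weights)]
    # One hidden layer with ReLU, then a softmax output over syndrome types.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, weighted))) for row in w_hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return weights, softmax(logits)


# Toy example: three symptoms, two hidden units, two syndrome types.
weights, probs = att_mlp_forward(
    x=[1.0, 1.0, 0.0],
    attn_scores=[2.0, 0.0, 0.0],
    w_hidden=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    w_out=[[1.0, -1.0], [-1.0, 1.0]],
)
print(weights, probs)
```

Inspecting `weights` after training is what would allow such a model to report which symptoms matter most for a given syndrome type.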
Affiliation(s)
- Huaxin Pang
  - Beijing Jiaotong University, No. 3 Shangyuancun, Haidian District, Beijing, 100044, China
- Shikui Wei
  - Beijing Jiaotong University, No. 3 Shangyuancun, Haidian District, Beijing, 100044, China
- Yufeng Zhao
  - Institute of Basic Research in Clinical Medicine/National Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, No. 16 South Street, Dongzhimen, Dongcheng District, Beijing, 100700, China
- Liyun He
  - Institute of Basic Research in Clinical Medicine/National Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, No. 16 South Street, Dongzhimen, Dongcheng District, Beijing, 100700, China
- Jian Wang
  - China Academy of Chinese Medical Sciences, No. 16 South Street, Dongzhimen, Dongcheng District, Beijing, 100700, China
- Baoyan Liu
  - China Academy of Chinese Medical Sciences, No. 16 South Street, Dongzhimen, Dongcheng District, Beijing, 100700, China
- Yao Zhao
  - Beijing Jiaotong University, No. 3 Shangyuancun, Haidian District, Beijing, 100044, China
|
5
|
A survey of robust optimization based machine learning with special reference to support vector machines. Int J Mach Learn Cyb 2019. [DOI: 10.1007/s13042-019-01044-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 10/25/2022]
|
6
|
Fang T, Zhang Z, Sun R, Zhu L, He J, Huang B, Xiong Y, Zhu X. RNAm5CPred: Prediction of RNA 5-Methylcytosine Sites Based on Three Different Kinds of Nucleotide Composition. Molecular Therapy. Nucleic Acids 2019; 18:739-747. [PMID: 31726390 PMCID: PMC6859278 DOI: 10.1016/j.omtn.2019.10.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/08/2018] [Revised: 10/11/2019] [Accepted: 10/11/2019] [Indexed: 12/11/2022]
Abstract
5-methylcytosine (m5C) is one of the most common and abundant post-transcriptional modifications (PTCMs) in RNA. Recent studies showed that m5C plays important roles in many biological functions such as RNA metabolism and cell fate decision. Because most experimental methods that determine m5C sites across the transcriptome are time-consuming and expensive, there is an urgent need for accurate computational methods that identify m5C sites effectively. A benchmark dataset is important for developing and evaluating computational methods. In this work, we constructed four different datasets according to data redundancy and imbalance. Based on these datasets, we generated three different kinds of features, i.e., KNFs (K-nucleotide frequencies), KSNPFs (K-spaced nucleotide pair frequencies), and pseDNC (pseudo-dinucleotide composition), and then used a support vector machine (SVM) to build our models. Based on the imbalanced and nonredundant dataset, Met935, we extensively studied the three kinds of features and determined an optimal combination of the features. Based on the feature combination, we built models on the three different datasets and compared them with state-of-the-art models. According to the predictive results of the stringent jackknife test, the models based on the three features, 4NF, 1SNPF, and pseDNC, are superior or comparable to other methods. To determine the best model between the models based on the imbalanced dataset Met935 and the balanced dataset Met240, we further evaluated the two models on an independent test set, Test1157. Our results demonstrate that the model based on the balanced dataset Met240 achieved the highest recall (68.79%) and the highest Matthews correlation coefficient (MCC) (0.154). In addition, the model is also superior to other state-of-the-art methods according to the integrated parameter MCC on the independent test set. Thus, we selected the model based on Met240 as our final model, which was named RNAm5CPred. In addition, a web server for RNAm5CPred (http://zhulab.ahu.edu.cn/RNAm5CPred/) has been provided to facilitate experimental research.
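Two quantities named in the abstract, K-nucleotide frequencies and the Matthews correlation coefficient, are easy to make concrete. The sketch below assumes the usual definitions: a KNF is the normalized count of each length-k substring over the alphabet A, C, G, U, and MCC is computed from the binary confusion matrix. The function names and the example sequence are illustrative and are not taken from the paper.

```python
import math
from itertools import product
from typing import Dict


def k_nucleotide_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Normalized count of every k-mer over A, C, G, U in an RNA segment."""
    counts = {"".join(p): 0 for p in product("ACGU", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing other characters
            counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


freqs = k_nucleotide_frequencies("ACGUAC", 2)
print(freqs["AC"], mcc(5, 5, 0, 0))
```

With k = 4 (the 4NF feature mentioned above) the vector has 4^4 = 256 entries. MCC uses all four confusion-matrix cells, which is why it is a common summary metric on imbalanced test sets.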
Affiliation(s)
- Ting Fang
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China; School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
- Zizheng Zhang
  - School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
- Rui Sun
  - Beijing Baidu Netcom Sciences and Technology Co., Ltd., Beijing, China
- Lin Zhu
  - School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
- Jingjing He
  - School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
- Bei Huang
  - School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xiaolei Zhu
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China; School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
|
7
|
Tabrizchi H, Javidi MM, Amirzadeh V. Estimates of residential building energy consumption using a multi-verse optimizer-based support vector machine with k-fold cross-validation. Evolving Systems 2019. [DOI: 10.1007/s12530-019-09283-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
|
8
|
|
9
|
Xie Z, Xu Y, Hu Q. Uncertain data classification with additive kernel support vector machine. Data Knowl Eng 2018. [DOI: 10.1016/j.datak.2018.07.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/27/2022]
|
10
|
Utkin LV, Zhuk YA. Interval SVM-Based Classification Algorithm Using the Uncertainty Trick. Int J Artif Intell T 2017. [DOI: 10.1142/s0218213017500142] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
A new robust SVM-based algorithm for binary classification is proposed. It is based on the so-called uncertainty trick, in which training data with interval uncertainty are transformed into training data with weight or probabilistic uncertainty. Every interval is replaced by a set of training points with the same class label such that every point inside the interval has an unknown weight from a predefined set of weights. The SVM uses a robust strategy that deals with the upper bound of the interval-valued expected risk produced by the set of weights. An extension of the algorithm based on the imprecise Dirichlet model is proposed for additional robustification. Numerical examples with synthetic and real interval-valued training data illustrate the proposed algorithm and its extension.
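The upper-bound idea in the abstract can be sketched for a single interval-valued example. Several assumptions are made here that the abstract does not confirm: the interval is represented by its corner points, the loss is the hinge loss of a fixed linear classifier, and the set of admissible weights is a convex set given by its extreme points (`weight_vertices`). Because the expected loss is linear in the weights, its maximum over such a set is attained at one of those extreme points. This evaluates the robust loss only; it does not train an SVM, and all names are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def hinge(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """Hinge loss of a linear classifier on one labelled point."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def interval_points(lower: Sequence[float], upper: Sequence[float]) -> List[List[float]]:
    """Assumed representation: replace the interval (a box) by its corner points."""
    return [list(p) for p in product(*zip(lower, upper))]


def upper_expected_loss(
    w: Sequence[float],
    b: float,
    lower: Sequence[float],
    upper: Sequence[float],
    y: int,
    weight_vertices: Sequence[Sequence[float]],
) -> float:
    """Upper bound of the expected hinge loss over a convex set of point weights."""
    losses = [hinge(w, b, x, y) for x in interval_points(lower, upper)]
    return max(sum(p * l for p, l in zip(ws, losses)) for ws in weight_vertices)


# Toy example: a 2-D box with label +1 and classifier f(x) = x[0].
simplex = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
worst = upper_expected_loss([1.0, 0.0], 0.0, [0.5, 0.0], [2.0, 1.0], 1, simplex)
uniform = upper_expected_loss([1.0, 0.0], 0.0, [0.5, 0.0], [2.0, 1.0], 1, [[0.25] * 4])
print(worst, uniform)
```

When the weight set is the whole probability simplex, the bound reduces to the largest loss among the points; a narrower weight set gives a less pessimistic bound, as the uniform case shows.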
Affiliation(s)
- Lev V. Utkin
  - Peter the Great St. Petersburg Polytechnic University, Politekhnicheskaya ul., 29, St. Petersburg 195251, Russia
- Yulia A. Zhuk
  - ITMO University, Birzhevaya liniya, 14, Lit.A, St. Petersburg 199034, Russia
|
11
|
|
12
|
Utkin LV, Chekh AI, Zhuk YA. Binary classification SVM-based algorithms with interval-valued training data using triangular and Epanechnikov kernels. Neural Netw 2016; 80:53-66. [DOI: 10.1016/j.neunet.2016.04.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 11/02/2015] [Revised: 04/06/2016] [Accepted: 04/11/2016] [Indexed: 11/24/2022]
|