1
|
Zhang S, Wang J, Li X, Liang Y. M6A-GSMS: Computational identification of N 6-methyladenosine sites with GBDT and stacking learning in multiple species. J Biomol Struct Dyn 2022; 40:12380-12391. [PMID: 34459713 DOI: 10.1080/07391102.2021.1970628] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
N6-methyladenosine (m6A) is one of the most abundant forms of RNA methylation modifications currently known. It involves a wide range of biological processes, including degradation, stability, alternative splicing, etc. Therefore, the development of convenient and efficient m6A prediction technologies are urgent. In this work, a novel predictor based on GBDT and stacking learning is developed to identify m6A sites, which is called M6A-GSMS. To achieve accurate prediction, we explore RNA sequence information from four aspects: correlation, structure, physicochemical properties and pseudo ribonucleic acid composition. After using the GBDT algorithm for feature selection, a stacking model is constructed by combining seven basic classifiers. Compared with other state-of-the-art methods, the results show that M6A-GSMS can obtain excellent performance for identifying the m6A sites. The prediction accuracy of A.thaliana, D.melanogaster, M.musculus, S.cerevisiae and Human reaches 88.4%, 60.8%, 80.5%, 92.4% and 61.8%, respectively. This method provides an effective prediction for the investigation of m6A sites. In addition, all the datasets and codes are currently available at https://github.com/Wang-Jinyue/M6A-GSMS.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
| | - Jinyue Wang
- School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
| | - Xinjie Li
- School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
| | - Yunyun Liang
- School of Science, Xi'an Polytechnic University, Xi'an, P. R. China
| |
Collapse
|
2
|
Guan ZX, Li SH, Zhang ZM, Zhang D, Yang H, Ding H. A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods. Curr Genomics 2020; 21:11-25. [PMID: 32655294 PMCID: PMC7324890 DOI: 10.2174/1389202921666200214125102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2019] [Revised: 01/24/2020] [Accepted: 01/30/2020] [Indexed: 11/22/2022] Open
Abstract
MicroRNAs, a group of short non-coding RNA molecules, could regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental methods and comparative genomics methods have their disadvantages, such as time-consuming. In contrast, machine learning-based method is a better choice. Therefore, the review summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. And we also provide valid information about the predictors currently available. Finally, we give the future perspectives on the identification of pre-miRNAs. The review provides scholars with a whole background of pre-miRNA identification by using machine learning methods, which can help researchers have a clear understanding of progress of the research in this field.
Collapse
Affiliation(s)
- Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu610054, China
| | - Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu610054, China
| | - Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu610054, China
| | - Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu610054, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu610054, China
| |
Collapse
|
3
|
Fu X, Zhu W, Cai L, Liao B, Peng L, Chen Y, Yang J. Improved Pre-miRNAs Identification Through Mutual Information of Pre-miRNA Sequences and Structures. Front Genet 2019; 10:119. [PMID: 30858864 PMCID: PMC6397858 DOI: 10.3389/fgene.2019.00119] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2018] [Accepted: 02/04/2019] [Indexed: 11/30/2022] Open
Abstract
Playing critical roles as post-transcriptional regulators, microRNAs (miRNAs) are a family of short non-coding RNAs that are derived from longer transcripts called precursor miRNAs (pre-miRNAs). Experimental methods to identify pre-miRNAs are expensive and time-consuming, which presents the need for computational alternatives. In recent years, the accuracy of computational methods to predict pre-miRNAs has been increasing significantly. However, there are still several drawbacks. First, these methods usually only consider base frequencies or sequence information while ignoring the information between bases. Second, feature extraction methods based on secondary structures usually only consider the global characteristics while ignoring the mutual influence of the local structures. Third, methods integrating high-dimensional feature information is computationally inefficient. In this study, we have proposed a novel mutual information-based feature representation algorithm for pre-miRNA sequences and secondary structures, which is capable of catching the interactions between sequence bases and local features of the RNA secondary structure. In addition, the feature space is smaller than that of most popular methods, which makes our method computationally more efficient than the competitors. Finally, we applied these features to train a support vector machine model to predict pre-miRNAs and compared the results with other popular predictors. As a result, our method outperforms others based on both 5-fold cross-validation and the Jackknife test.
Collapse
Affiliation(s)
- Xiangzheng Fu
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Wen Zhu
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Yifan Chen
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Jialiang Yang
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| |
Collapse
|
4
|
Ma Y, Yu Z, Han G, Li J, Anh V. Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs. BMC Bioinformatics 2018; 19:521. [PMID: 30598066 PMCID: PMC6311913 DOI: 10.1186/s12859-018-2518-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND Distinction between pre-microRNAs (precursor microRNAs) and length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanism of RNA biological processes. Machine learning techniques have been widely applied to deal with this challenging problem. However, most of them mainly focus on secondary structure information of pre-microRNAs, while ignoring sequence-order information and sequence evolution information. RESULTS We use new features for the machine learning algorithms to improve the classification performance by characterizing both sequence order evolution information and secondary structure graphs. We developed three steps to extract these features of pre-microRNAs. We first extract features from PSI-BLAST profiles and Hilbert-Huang transforms, which contain rich sequence evolution information and sequence-order information respectively. We then obtain properties of small molecular networks of pre-microRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they can depict both global and local characteristics of pre-microRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is adopted before support vector machine (SVM) is applied as our classifier. The constructed classification model is named MicroRNA -NHPred. The performance of MicroRNA -NHPred is high and stable, which is better than that of those state-of-the-art methods, achieving an accuracy of up to 94.83% on same benchmark datasets. CONCLUSIONS The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the sequences and secondary structures, which are capable of characterizing the sequence evolution information and sequence-order information, and global and local information of pre-microRNAs secondary structures. MicroRNA -NHPred is a valuable method for pre-microRNAs identification. The source codes of our method can be downloaded from https://github.com/myl446/MicroRNA-NHPred .
Collapse
Affiliation(s)
- Yuanlin Ma
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
| | - Zuguo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
- School of Electrical Engineering and Computer Science, Queensland University of Technology, GPO Box 2434, Brisbane, Q4001 Australia
| | - Guosheng Han
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
| | - Jinyan Li
- Advanced Analytics Institute, Faculty of Engineering & IT, University of Technology Sydney, P.O Box 123, Broadway, NSW 2007 Australia
| | - Vo Anh
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
- School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, Q4001 Australia
| |
Collapse
|
5
|
Fu X, Liao B, Zhu W, Cai L. New 3D graphical representation for RNA structure analysis and its application in the pre-miRNA identification of plants. RSC Adv 2018; 8:30833-30841. [PMID: 35548744 PMCID: PMC9085476 DOI: 10.1039/c8ra04138e] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2018] [Accepted: 08/24/2018] [Indexed: 11/26/2022] Open
Abstract
MicroRNAs (miRNAs) are a family of short non-coding RNAs that play significant roles as post-transcriptional regulators. Consequently, various methods have been proposed to identify precursor miRNAs (pre-miRNAs), among which the comparative studies of miRNA structures are the most important. To measure and classify the structural similarity of miRNAs, we propose a new three-dimensional (3D) graphical representation of the secondary structure of miRNAs, in which an miRNA secondary structure is initially transformed into a characteristic sequence based on physicochemical properties and frequency of base. A numerical characterization of the 3D graph is used to represent the miRNA secondary structure. We then utilize a novel Euclidean distance method based on this expression to compute the distance of different miRNA sequences for the sequence similarity analysis. Finally, we use this sequence similarity analysis method to identify plant pre-miRNAs among three commonly used datasets. Results show that the method is reasonable and effective.
Collapse
Affiliation(s)
- Xiangzheng Fu
- College of Information Science and Engineering, Hunan University Changsha Hunan 410082 China
| | - Bo Liao
- College of Information Science and Engineering, Hunan University Changsha Hunan 410082 China
| | - Wen Zhu
- College of Information Science and Engineering, Hunan University Changsha Hunan 410082 China
| | - Lijun Cai
- College of Information Science and Engineering, Hunan University Changsha Hunan 410082 China
| |
Collapse
|
6
|
Uchôa Guimarães CT, Ferreira Martins NN, Cristina da Silva Oliveira K, Almeida CM, Pinheiro TM, Gigek CO, Roberto de Araújo Cavallero S, Assumpção PP, Cardoso Smith MA, Burbano RR, Calcagno DQ. Liquid biopsy provides new insights into gastric cancer. Oncotarget 2018; 9:15144-15156. [PMID: 29599934 PMCID: PMC5871105 DOI: 10.18632/oncotarget.24540] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2017] [Accepted: 12/01/2017] [Indexed: 12/12/2022] Open
Abstract
Liquid biopsies have great promise for precision medicine as they provide information about primary and metastatic tumors via a minimally invasive method. In gastric cancer patients, a large number of blood-based biomarkers have been reported for their potential role in clinical practice for screening, early diagnosis, prognostic evaluation, recurrence monitoring and therapeutic efficiency follow-up. This current review focuses on blood liquid biopsies' role and their clinical implications in gastric cancer patients, with an emphasis on circulating tumor cells (CTCs), circulating tumor DNA (ctDNA) and circulating non-coding RNAs (ncRNAs). We also provide a brief discussion of the potential and limitations of liquid biopsies use and their future use in the routine clinical care of gastric cancer.
Collapse
Affiliation(s)
- Camila Tavares Uchôa Guimarães
- Residência Multiprofissional em Oncologia, Hospital Universitário João de Barros Barreto, Universidade Federal do Pará, Belém, PA, Brazil
| | | | | | - Caroline Martins Almeida
- Residência Multiprofissional em Oncologia, Hospital Universitário João de Barros Barreto, Universidade Federal do Pará, Belém, PA, Brazil
| | | | - Carolina Oliveira Gigek
- Disciplina de Genética, Universidade Federal de São Paulo, São Paulo, SP, Brazil
- Disciplina de Gastroenterologia Cirurgica, Universidade Federal de São Paulo, São Paulo, SP, Brazil
| | | | | | | | - Rommel Rodríguez Burbano
- Núcleo de Pesquisas em Oncologia, Universidade Federal do Pará, Belém, PA, Brazil
- Laboratório de Biologia Molecular, Hospital Ophir Loyola, Belém, PA, Brazil
| | - Danielle Queiroz Calcagno
- Residência Multiprofissional em Oncologia, Hospital Universitário João de Barros Barreto, Universidade Federal do Pará, Belém, PA, Brazil
- Núcleo de Pesquisas em Oncologia, Universidade Federal do Pará, Belém, PA, Brazil
| |
Collapse
|