1
Daniel Thomas S, Vijayakumar K, John L, Krishnan D, Rehman N, Revikumar A, Kandel Codi JA, Prasad TSK, S S V, Raju R. Machine Learning Strategies in MicroRNA Research: Bridging Genome to Phenome. OMICS: A Journal of Integrative Biology 2024; 28:213-233. [PMID: 38752932 DOI: 10.1089/omi.2024.0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/23/2024]
Abstract
MicroRNAs (miRNAs) have emerged as a prominent layer of regulation of gene expression. This article offers the salient and current aspects of machine learning (ML) tools and approaches from genome to phenome in miRNA research. First, we underline that the complexity in the analysis of miRNA function ranges from their modes of biogenesis to the target diversity in diverse biological conditions. Therefore, it is imperative to first ascertain the miRNA coding potential of genomes and understand the regulatory mechanisms of their expression. This knowledge enables the efficient classification of miRNA precursors and the identification of their mature forms and respective target genes. Second, because one miRNA can target multiple mRNAs and vice versa, another challenge is the assessment of the miRNA-mRNA target interaction network. Furthermore, long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs) also contribute to this complexity. ML has been used to tackle these challenges at the high-dimensional data level. The present expert review covers more than 100 tools adopting various ML approaches pertaining to, for example, (1) miRNA promoter prediction, (2) precursor classification, (3) mature miRNA prediction, (4) miRNA target prediction, (5) miRNA-lncRNA and miRNA-circRNA interactions, (6) miRNA-mRNA expression profiling, (7) miRNA regulatory module detection, (8) miRNA-disease association, and (9) miRNA essentiality prediction. Taken together, we unpack, critically examine, and highlight the cutting-edge synergy of ML approaches and miRNA research so as to develop a dynamic and microlevel understanding of human health and diseases.
Affiliation(s)
- Sonet Daniel Thomas
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
  - Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Krithika Vijayakumar
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Levin John
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Deepak Krishnan
  - Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Niyas Rehman
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Amjesh Revikumar
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
  - Kerala Genome Data Centre, Kerala Development and Innovation Strategic Council, Thiruvananthapuram, Kerala, India
- Jalaluddin Akbar Kandel Codi
  - Department of Surgical Oncology, Yenepoya Medical College, Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Vinodchandra S S
  - Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India
- Rajesh Raju
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
  - Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
2
Singh J, Khanna NN, Rout RK, Singh N, Laird JR, Singh IM, Kalra MK, Mantella LE, Johri AM, Isenovic ER, Fouda MM, Saba L, Fatemi M, Suri JS. GeneAI 3.0: powerful, novel, generalized hybrid and ensemble deep learning frameworks for miRNA species classification of stationary patterns from nucleotides. Sci Rep 2024; 14:7154. [PMID: 38531923 PMCID: PMC11344070 DOI: 10.1038/s41598-024-56786-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 03/11/2024] [Indexed: 03/28/2024] [Open Access]
Abstract
Due to the intricate relationship between the small non-coding ribonucleic acid (miRNA) sequences, the classification of miRNA species, namely Human, Gorilla, Rat, and Mouse, is challenging. Previous methods are neither robust nor accurate. In this study, we present AtheroPoint's GeneAI 3.0, a powerful, novel, and generalized method for extracting features from the fixed patterns of purines and pyrimidines in each miRNA sequence in ensemble paradigms in machine learning (EML) and convolutional neural network (CNN)-based deep learning (EDL) frameworks. GeneAI 3.0 utilized five conventional (Entropy, Dissimilarity, Energy, Homogeneity, and Contrast) and three contemporary (Shannon entropy, Hurst exponent, Fractal dimension) features to generate a composite feature set from given miRNA sequences, which was then passed into our ML and DL classification framework. A set of 11 new classifiers was designed, consisting of 5 EML and 6 EDL, for binary/multiclass classification. It was benchmarked against 9 solo ML (SML), 6 solo DL (SDL), and 12 hybrid DL (HDL) models, so that a total of 11 + 27 = 38 models were designed. Four hypotheses were formulated and validated using explainable AI (XAI) as well as reliability/statistical tests. The order of the mean performance using accuracy (ACC)/area-under-the-curve (AUC) of the 24 DL classifiers was: EDL > HDL > SDL. The mean performance of EDL models with CNN layers was superior to that without CNN layers by 0.73%/0.92%. Mean performance of EML models was superior to SML models, with improvements of ACC/AUC by 6.24%/6.46%. EDL models performed significantly better than EML models, with a mean increase in ACC/AUC of 7.09%/6.96%. The GeneAI 3.0 tool produced expected XAI feature plots, and the statistical tests showed significant p-values. Ensemble models with composite features are highly effective and generalized models for classifying miRNA sequences.
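To make one of the features named in this abstract concrete, the following minimal sketch computes the Shannon entropy of a purine/pyrimidine pattern. It is an illustration only and not GeneAI 3.0's code: the function names, the R/Y encoding, and the choice to take entropy over single symbols are assumptions made here.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def purine_pyrimidine_pattern(seq):
    """Map a nucleotide sequence to 'R' (purine) / 'Y' (pyrimidine) symbols.

    Hypothetical helper: DNA-style 'T' is treated as 'U'.
    """
    pattern = []
    for base in seq.upper().replace("T", "U"):
        if base in PURINES:
            pattern.append("R")
        elif base in PYRIMIDINES:
            pattern.append("Y")
        else:
            raise ValueError(f"unexpected nucleotide: {base!r}")
    return "".join(pattern)


def shannon_entropy(symbols):
    """Shannon entropy in bits of the symbol frequencies in `symbols`."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    pattern = purine_pyrimidine_pattern("UGAGGUAGUAGGUUGUAUAGUU")
    print(pattern, round(shannon_entropy(pattern), 3))
```

A sequence with equal numbers of purines and pyrimidines gives the maximum of 1 bit, and a sequence of only one class gives 0.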
Affiliation(s)
- Jaskaran Singh
  - Department of Computer Science, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
- Narendra N Khanna
  - Department of Cardiology, Indraprastha APOLLO Hospitals, New Delhi, India
- Ranjeet K Rout
  - Department of Computer Science and Engineering, NIT Srinagar, Hazratbal, Srinagar, India
- Narpinder Singh
  - Department of Food Science, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
- John R Laird
  - Heart and Vascular Institute, Adventist Health St. Helena, St Helena, CA, USA
- Inder M Singh
  - Advanced Cardiac and Vascular Institute, Sacramento, CA, USA
- Mannudeep K Kalra
  - Department of Radiology, Massachusetts General Hospital, Boston, MA, 02115, USA
- Laura E Mantella
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
- Amer M Johri
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
- Esma R Isenovic
  - Laboratory for Molecular Genetics and Radiobiology, University of Belgrade, Belgrade, Serbia
- Mostafa M Fouda
  - Department of Electrical and Computer Engineering, Idaho State University, Pocatello, ID, 83209, USA
- Luca Saba
  - Department of Neurology, University of Cagliari, Cagliari, Italy
- Mostafa Fatemi
  - Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, 55905, USA
- Jasjit S Suri
  - Stroke Monitoring and Diagnostic Division, AtheroPoint LLC, Roseville, CA, 95661, USA
3
miR-155: An Important Role in Inflammation Response. J Immunol Res 2022; 2022:7437281. [PMID: 35434143 PMCID: PMC9007653 DOI: 10.1155/2022/7437281] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 11/05/2021] [Accepted: 03/19/2022] [Indexed: 12/13/2022] [Open Access]
Abstract
MicroRNAs (miRNAs) are a class of small, mature, noncoding RNAs that lead to posttranscriptional gene silencing to regulate gene expression. miRNAs are instrumental in biological processes such as cell development, cell differentiation, cell proliferation, and cell apoptosis. The miRNA-mediated gene silencing is an important part of the regulation of gene expression in many kinds of diseases. miR-155, one of the best-characterized miRNAs, has been found to be closely related to physiological and pathological processes. Moreover, miR-155 can be used as a potential therapeutic target for inflammatory diseases. We analyze the articles about miR-155 from nearly the past five years, review the advanced study on the function of miR-155 in different inflammatory cells like T cells, B cells, DCs, and macrophages, and then summarize the biological functions of miR-155 in different inflammatory cells. The widespread involvement of miR-155 in human diseases has led to a novel therapeutic approach between Chinese and Western medicine.
4
Zhu Y, Zhang Z, Song J, Qian W, Gu X, Yang C, Shen N, Xue F, Tang Y. SARS-CoV-2-Encoded MiRNAs Inhibit Host Type I Interferon Pathway and Mediate Allelic Differential Expression of Susceptible Gene. Front Immunol 2022; 12:767726. [PMID: 35003084 PMCID: PMC8733928 DOI: 10.3389/fimmu.2021.767726] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/01/2021] [Accepted: 11/29/2021] [Indexed: 12/13/2022] [Open Access]
Abstract
Infection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), causing the rapid spread of coronavirus disease 2019 (COVID-19), has generated a public health crisis worldwide. The molecular mechanisms of SARS-CoV-2 infection and virus–host interactions are still unclear. In this study, we identified four unique microRNA-like small RNAs encoded by SARS-CoV-2. SCV2-miR-ORF1ab-1-3p and SCV2-miR-ORF1ab-2-5p play an important role in evasion of type I interferon response through targeting several genes in type I interferon signaling pathway. Particularly worth mentioning is that highly expressed SCV2-miR-ORF1ab-2-5p inhibits some key genes in the host innate immune response, such as IRF7, IRF9, STAT2, OAS1, and OAS2. SCV2-miR-ORF1ab-2-5p has also been found to mediate allelic differential expression of COVID-19-susceptible gene OAS1. In conclusion, these results suggest that SARS-CoV-2 uses its miRNAs to evade the type I interferon response and links the functional viral sequence to the susceptible genetic background of the host.
Affiliation(s)
- Youwei Zhu
  - Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhaoyang Zhang
  - Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jia Song
  - Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Weizhou Qian
  - Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xiangqian Gu
  - Department of Hepatobiliary Surgery, Wuxi People's Hospital Affiliated Nanjing Medical University, Wuxi, China
- Chaoyong Yang
  - Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - State Key Laboratory for Physical Chemistry of Solid Surfaces, Key Laboratory for Chemical Biology of Fujian Province, Key Laboratory of Analytical Chemistry, and Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China
- Nan Shen
  - State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai, China
  - Collaborative Innovation Center for Translational Medicine, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Center for Autoimmune Genomics and Etiology (CAGE), Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Feng Xue
  - Department of Liver Surgery, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Yuanjia Tang
  - Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai, China
5
Gharbi S, Mohammadi Z, Dezaki MS, Dokanehiifard S, Dabiri S, Korsching E. Characterization of the first microRNA in human CDH1 that affects cell cycle and apoptosis and indicates breast cancers progression. J Cell Biochem 2022; 123:657-672. [PMID: 34997630 DOI: 10.1002/jcb.30211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Revised: 11/26/2021] [Accepted: 12/21/2021] [Indexed: 11/12/2022]
Abstract
The E-cadherin protein (Cadherin 1, gene: CDH1), a master regulator of human epithelial homeostasis, contributes to the epithelial-mesenchymal transition (EMT), which confers migratory features on cells. The EMT is central to many pathophysiological changes in cancer. Therefore, a better understanding of this regulatory scenario is beneficial for therapeutic regimens. The CDH1 gene is approximately 100 kbp long and consists of 16 exons with a relatively large second intron. Since no microRNA (miRNA) has been identified in CDH1 up to now, we screened the CDH1 gene for promising miRNA hairpin structures in silico. Out of the 27 hairpin structures we identified, one stable RNA fold with a promising sequence motif was selected for experimental verification. The exogenous validation of the hairpin sequence was performed by transfection of HEK293T cells, and the mature miRNA sequences could be verified by quantitative polymerase chain reaction. The endogenous expression of the mature miRNA, provisionally named CDH1-i2-miR-1, could be confirmed in two normal (HEK293T, HUVEK) and five cancer cell lines (MCF7, MDA-MB-231, SW480, HT-29, A549). The functional characterization by the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide assay showed a suppression of HEK293T cell proliferation. A flow cytometry-based approach showed the ability of CDH1-i2-miR-1 to arrest transfected cells in the G2/M phase, while annexin staining indicated an apoptotic effect. BAX and PTEN expression levels were affected following overexpression of the new miRNA. The in vivo expression level was assessed in 35 breast tumor tissues and their paired nonmalignant marginal tissue. A fourfold downregulation in the tumor specimens compared to their marginal controls could be observed. It can be concluded that the sequence of the hub gene CDH1 harbors at least one miRNA, and possibly more, relevant to the pathophysiology of breast cancer.
Affiliation(s)
- Sedigheh Gharbi
  - Department of Biology, Faculty of Sciences, Shahid Bahonar University of Kerman, Kerman, Iran
- Zahra Mohammadi
  - Department of Biology, Faculty of Sciences, Shahid Bahonar University of Kerman, Kerman, Iran
- Maryam Saedi Dezaki
  - Department of Biology, Faculty of Sciences, Shahid Bahonar University of Kerman, Kerman, Iran
- Sadat Dokanehiifard
  - Department of Human Genetics, Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, Florida, USA
- Shahriar Dabiri
  - Department of Pathology, Pathology and Stem Cell Research Center, Kerman University of Medical Sciences, Kerman, Iran
- Eberhard Korsching
  - Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
6
Abstract
Tiny single-stranded noncoding RNAs of 19-27 nucleotides serve as microRNAs (miRNAs), which have emerged as key gene regulators in the last two decades. miRNAs serve as one of the hallmarks in regulatory pathways with critical roles in human diseases. Ever since the discovery of miRNAs, researchers have focused on how mature miRNAs are produced from precursor mRNAs. Experimental methods face notable challenges in experimental design, since they are time consuming and not cost-effective. Hence, different computational methods have been employed for the identification of miRNA sequences; most of those labeled as miRNA predictors are in fact pre-miRNA predictors and provide no information about the putative miRNA location within the pre-miRNA. This chapter provides an update on the current state of the art in this area, covering various methods and 15 software suites used for prediction of mature miRNA.
Affiliation(s)
- Malik Yousef
  - Department of Information System, Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
- Alisha Parveen
  - Rudolf-Zenker Institute of Experimental Surgery, Rostock University Medical Center, Rostock, Germany
- Abhishek Kumar
  - Institute of Bioinformatics, Bangalore, India
  - Manipal Academy of Higher Education (MAHE), Manipal, Karnataka, India
7
Azadirachta indica MicroRNAs: Genome-Wide Identification, Target Transcript Prediction, and Expression Analyses. Appl Biochem Biotechnol 2021; 193:1924-1944. [PMID: 33523368 DOI: 10.1007/s12010-021-03500-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2020] [Accepted: 01/07/2021] [Indexed: 10/22/2022]
Abstract
MicroRNAs are short, endogenous, non-coding RNAs responsible for essential regulatory functions. Numerous miRNAs have been identified and studied in plants with known genomic or small RNA resources. Despite the availability of genomic and transcriptomic resources, miRNAs have not been reported in the medicinal tree Azadirachta indica (Neem) to date. Here, for the first time, we report extensive identification of miRNAs and their possible targets in A. indica, which might help to unravel their therapeutic potential. A comprehensive search for miRNAs in the A. indica genome was performed with the C-mii tool. Overall, 123 miRNAs classified into 63 families and their stem-loop hairpin structures were predicted. The A. indica (ain)-miRNAs ranged between 19 and 23 nt in length, and the MFEI value of their corresponding ain-miRNA precursor sequences averaged -1.147 kcal/mol. The targets of ain-miRNAs were predicted in A. indica as well as in Arabidopsis thaliana. The gene ontology (GO) annotation revealed the involvement of ain-miRNA targets in developmental processes, transport, stress, and metabolic processes including secondary metabolism. Stem-loop qRT-PCR was carried out for 25 randomly selected ain-miRNAs, and differential expression patterns were observed in different A. indica tissues. Expression of miRNAs and their targets shows negative correlation in a dependent manner.
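The MFEI reported in this abstract is commonly defined as the length-adjusted minimum free energy (AMFE = MFE / length × 100) divided by the (G+C) percentage of the precursor. The sketch below assumes that definition; the abstract itself does not spell out the formula, the function names are illustrative, and the MFE value is assumed to come from an external RNA folding tool.

```python
def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def amfe(mfe, length):
    """Adjusted minimum free energy: MFE per 100 nucleotides."""
    return mfe / length * 100.0


def mfei(mfe, seq):
    """Minimum free energy index: AMFE divided by (G+C)%.

    `mfe` is the folding energy in kcal/mol of the precursor `seq`,
    assumed to have been computed elsewhere.
    """
    gc = gc_percent(seq)
    if gc == 0:
        raise ValueError("MFEI is undefined for a sequence without G or C")
    return amfe(mfe, len(seq)) / gc


if __name__ == "__main__":
    # Made-up 100-nt precursor with 50% GC and an assumed MFE of -50 kcal/mol.
    precursor = "GC" * 25 + "AU" * 25
    print(mfei(-50.0, precursor))
```

More negative MFEI values indicate a more stable fold relative to length and GC content, which is why the index is often used to separate candidate precursors from other hairpins.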
8
Chen L, Heikkinen L, Wang C, Yang Y, Sun H, Wong G. Trends in the development of miRNA bioinformatics tools. Brief Bioinform 2019; 20:1836-1852. [PMID: 29982332 PMCID: PMC7414524 DOI: 10.1093/bib/bby054] [Citation(s) in RCA: 344] [Impact Index Per Article: 68.8] [Received: 04/06/2018] [Revised: 05/18/2018] [Indexed: 12/13/2022] [Open Access]
Abstract
MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expression via recognition of cognate sequences and interference of transcriptional, translational or epigenetic processes. Bioinformatics tools developed for miRNA study include those for miRNA prediction and discovery, structure, analysis and target prediction. We manually curated 95 review papers and ∼1000 miRNA bioinformatics tools published since 2003. We classified and ranked them based on citation number or PageRank score, and then performed network analysis and text mining (TM) to study the miRNA tools development trends. Five key trends were observed: (1) miRNA identification and target prediction have been hot spots in the past decade; (2) manual curation and TM are the main methods for collecting miRNA knowledge from literature; (3) most early tools are well maintained and widely used; (4) classic machine learning methods retain their utility; however, novel ones have begun to emerge; (5) disease-associated miRNA tools are emerging. Our analysis yields significant insight into the past development and future directions of miRNA tools.
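The abstract mentions ranking tools by PageRank score. As a rough illustration of that idea, the sketch below runs a plain power-iteration PageRank over a toy citation graph. The tool names, the damping factor of 0.85, and the handling of tools that cite nothing are assumptions made here, not details taken from the paper.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    `links` maps each node to the list of nodes it cites. Nodes that cite
    nothing spread their rank evenly over all nodes.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical citation graph: toolA and toolB both cite toolC.
    citations = {"toolA": ["toolC"], "toolB": ["toolC"], "toolC": []}
    scores = pagerank(citations)
    for tool in sorted(scores, key=scores.get, reverse=True):
        print(tool, round(scores[tool], 4))
```

Unlike a raw citation count, this score also depends on how highly ranked the citing tools are.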
Affiliation(s)
- Liang Chen
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
- Liisa Heikkinen
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
- Changliang Wang
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
- Yang Yang
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
- Huiyan Sun
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Garry Wong
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
9
Computational Resources for Prediction and Analysis of Functional miRNA and Their Targetome. Methods Mol Biol 2019; 1912:215-250. [PMID: 30635896 DOI: 10.1007/978-1-4939-8982-9_9] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/14/2022]
Abstract
MicroRNAs are evolutionarily conserved, endogenously produced, noncoding RNAs (ncRNAs) of approximately 19-24 nucleotides (nts) in length, known to silence genes carrying complementary target sequences. Their deregulated expression is reported in various disease conditions and thus has therapeutic implications. In the last decade, various computational resources have been published in this field. In this chapter, we have reviewed bioinformatics resources, i.e., miRNA-centered databases, algorithms, and tools to predict miRNA targets. The first section lists more than 75 databases, which mainly cover information regarding miRNA registries, targets, disease associations, differential expression, interactions with other noncoding RNAs, and all-in-one resources. In the algorithms section, we have compiled about 140 algorithms from eight subcategories, viz. for the prediction of precursor (pre-) and mature miRNAs. These algorithms are built on various sequence, structure, and thermodynamic features incorporated into different machine learning techniques (MLTs). In addition, computational identification of miRNAs from high-throughput next generation sequencing (NGS) data and their variants, viz. isomiRs, differential expression, miR-SNPs, and functional annotation, are discussed. Prediction and analysis of miRNAs and their associated targets are also evaluated in the miR-targets section, providing knowledge regarding novel miRNA targets and complex host-pathogen interactions. In conclusion, we have provided a comprehensive review of in silico resources published in miRNA research to help the scientific community stay updated and choose the appropriate tool according to their needs.
10
Adaboost-SVM-based probability algorithm for the prediction of all mature miRNA sites based on structured-sequence features. Sci Rep 2019; 9:1521. [PMID: 30728425 PMCID: PMC6365589 DOI: 10.1038/s41598-018-38048-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/26/2018] [Accepted: 12/18/2018] [Indexed: 02/07/2023] [Open Access]
Abstract
The significant role of microRNAs (miRNAs) in various biological processes and diseases has been widely studied and reported in recent years. Several computational methods associated with mature miRNA identification suffer various limitations involving canonical biological features extraction, class imbalance, and classifier performance. The proposed classifier, miRFinder, is an accurate alternative for the identification of mature miRNAs. The structured-sequence features were proposed to precisely extract miRNA biological features, and three algorithms were selected to obtain the canonical features based on the classifier performance. Moreover, the center of mass near distance training based on K-means was provided to mitigate the class imbalance problem. In particular, the AdaBoost-SVM algorithm was used to construct the classifier. The classifier training process focuses on incorrectly classified samples, and the integrated results use the common decision strategies of the weak classifiers with different weights. In addition, all mature miRNA sites were predicted by different classifiers based on the features of different sites. Compared with other methods, the classifiers show a high degree of efficacy for the identification of mature miRNAs. miRFinder is freely available at https://github.com/wangying0128/miRFinder.
11
Shukla V, Varghese VK, Kabekkodu SP, Mallya S, Satyamoorthy K. A compilation of Web-based research tools for miRNA analysis. Brief Funct Genomics 2018; 16:249-273. [PMID: 28334134 DOI: 10.1093/bfgp/elw042] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 12/18/2022] [Open Access]
Abstract
Since the discovery of microRNAs (miRNAs), a class of noncoding RNAs that regulate gene expression posttranscriptionally in a sequence-specific manner, a number of tools useful for both basic and advanced applications have been released. This is because of the significance of miRNAs in many pathophysiological conditions, including cancer. Numerous bioinformatics tools that have been developed for miRNA analysis have their utility for detection, expression, function, target prediction and many other related features. This review provides a comprehensive assessment of web-based tools for miRNA analysis that do not require prior knowledge of any computing languages.
12
Computational Approaches and Related Tools to Identify MicroRNAs in a Species: A Bird’s Eye View. Interdiscip Sci 2017; 10:616-635. [DOI: 10.1007/s12539-017-0223-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/20/2016] [Revised: 12/20/2016] [Accepted: 03/09/2017] [Indexed: 12/26/2022]
13
Sinoy S, Fayaz SM, Charles KD, Suvanish VK, Kapfhammer JP, Rajanikant GK. Amikacin Inhibits miR-497 Maturation and Exerts Post-ischemic Neuroprotection. Mol Neurobiol 2016; 54:3683-3694. [DOI: 10.1007/s12035-016-9940-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/20/2015] [Accepted: 05/11/2016] [Indexed: 10/25/2022]
14
Improving classification of mature microRNA by solving class imbalance problem. Sci Rep 2016; 6:25941. [PMID: 27181057 PMCID: PMC4867574 DOI: 10.1038/srep25941] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/18/2016] [Accepted: 04/22/2016] [Indexed: 11/29/2022] [Open Access]
Abstract
MicroRNAs (miRNAs) are ~20–25 nucleotide non-coding RNAs that regulate gene expression at the post-transcriptional level. The accuracy of identifying the start site of a mature miRNA from a given pre-miRNA remains low. It is worth noting that mature miRNA prediction is a class-imbalanced problem, which also leads to the unsatisfactory performance of existing methods. We improved the prediction accuracy of the classifier using balanced datasets and present MatFind, which identifies 5′ mature miRNA candidates from their pre-miRNA based on ensemble SVM classifiers built on the idea of AdaBoost. First, the balanced dataset was extracted using the K-nearest neighbor algorithm. Second, multiple SVM classifiers were trained in sequence on the balanced datasets using the represented features. Finally, all SVM classifiers were combined to form the ensemble classifier. Our results on an independent testing dataset show that the proposed method is more efficient than one that does not treat the class imbalance problem. Moreover, MatFind achieves much higher classification accuracy than three other approaches. The ensemble SVM classifiers and balanced datasets can solve the class-imbalance problem as well as improve classifier performance for mature miRNA identification. MatFind is an accurate and fast method for 5′ mature miRNA identification.
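One simple way to picture the balancing step described in this abstract is nearest-neighbor undersampling: for each positive example, keep only the few negatives closest to it in feature space. The sketch below is an assumption about the general idea, not MatFind's actual procedure; the function name, the Euclidean distance, and the toy feature vectors are all illustrative.

```python
import math


def knn_undersample(positives, negatives, k=1):
    """Keep, for every positive sample, its k nearest negatives.

    Samples are equal-length tuples of numeric features. Negatives picked
    by more than one positive are kept only once, and the original order
    of the negatives is preserved.
    """
    keep = set()
    for pos in positives:
        by_distance = sorted(
            range(len(negatives)),
            key=lambda i: math.dist(pos, negatives[i]),
        )
        keep.update(by_distance[:k])
    return [negatives[i] for i in sorted(keep)]


if __name__ == "__main__":
    # Toy 2-D feature vectors: two positives against four negatives.
    pos = [(0.0, 0.0), (10.0, 10.0)]
    neg = [(0.1, 0.0), (5.0, 5.0), (9.9, 10.0), (50.0, 50.0)]
    balanced_neg = knn_undersample(pos, neg, k=1)
    print(len(pos), len(balanced_neg), balanced_neg)
```

The reduced negative set can then be paired with the positives to train each member of an ensemble, so that no single classifier is dominated by the majority class.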
15
Yu L, Shao C, Ye X, Meng Y, Zhou Y, Chen M. miRNA Digger: a comprehensive pipeline for genome-wide novel miRNA mining. Sci Rep 2016; 6:18901. [PMID: 26732371 PMCID: PMC4702050 DOI: 10.1038/srep18901] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 08/26/2015] [Accepted: 11/27/2015] [Indexed: 11/09/2022] [Open Access]
Abstract
MicroRNAs (miRNAs) are important regulators of gene expression. The recent advances in high-throughput sequencing (HTS) techniques have greatly facilitated large-scale detection of miRNAs. However, thorough discovery of novel miRNAs from the available HTS data sets remains a major challenge. In this study, we observed that Dicer-mediated cleavage sites for the processing of the miRNA precursors could be mapped by using degradome sequencing data in both animals and plants. In this regard, a novel tool, miRNA Digger, was developed for systematic discovery of miRNA candidates through genome-wide screening of cleavage signals based on degradome sequencing data. To test its sensitivity and reliability, miRNA Digger was applied to discover miRNAs from four organs of Arabidopsis. The results revealed that a majority of already known mature miRNAs, along with their miRNA*s expressed in these four organs, were successfully recovered. Notably, a total of 30 novel miRNA-miRNA* pairs that have not been registered in miRBase were discovered by miRNA Digger. After target prediction and degradome sequencing data-based validation, eleven miRNA-target interactions involving six of the novel miRNAs were identified. Taken together, miRNA Digger could be applied for sensitive detection of novel miRNAs, and it could be freely downloaded from http://www.bioinfolab.cn/miRNA_Digger/index.html.
Affiliation(s)
- Lan Yu
  - College of Life Sciences, Huzhou University, Huzhou 313000, P.R. China
- Chaogang Shao
  - College of Life Sciences, Huzhou University, Huzhou 313000, P.R. China
- Xinghuo Ye
  - College of Life Sciences, Huzhou University, Huzhou 313000, P.R. China
- Yijun Meng
  - College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou 310036, P.R. China
- Yincong Zhou
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, P.R. China
- Ming Chen
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, P.R. China
16
MatPred: Computational Identification of Mature MicroRNAs within Novel Pre-MicroRNAs. BioMed Research International 2015; 2015:546763. [PMID: 26682221 PMCID: PMC4670854 DOI: 10.1155/2015/546763] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/26/2015] [Revised: 09/18/2015] [Accepted: 09/28/2015] [Indexed: 12/22/2022]
Abstract
Background. MicroRNAs (miRNAs) are short noncoding RNAs integral for regulating gene expression at the posttranscriptional level. However, experimental methods often fall short in finding miRNAs expressed at low levels or in specific tissues. While several computational methods have been developed for predicting the localization of mature miRNAs within the precursor transcript, the prediction accuracy requires significant improvement. Methodology/Principal Findings. Here, we present MatPred, which predicts mature miRNA candidates within novel pre-miRNA transcripts. In addition to the relative locus of the mature miRNA within the pre-miRNA hairpin loop and the minimum free energy, we integrated features that describe nucleotide-specific RNA secondary structure characteristics. In total, 94 features were extracted from the mature miRNA loci and flanking regions. The model was trained using a support vector machine with a radial basis function kernel (RBF/SVM). Our method can predict precise locations of mature miRNAs, as affirmed on experimentally verified human pre-miRNAs and pre-miRNA candidates, thus achieving a significant advantage over existing methods. Conclusions. MatPred is a highly effective method for identifying mature miRNAs within novel pre-miRNA transcripts. Our model significantly outperformed three other widely used existing methods. Such processing prediction methods may provide important insight into miRNA biogenesis.
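For readers unfamiliar with the kernel named in this abstract, the following is a minimal sketch of the standard RBF kernel, K(x, y) = exp(-gamma * ||x - y||^2), which an SVM uses to compare feature vectors. It is not MatPred's code: the function name, the gamma value, and the placeholder 94-dimensional vectors are assumptions made only for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.1):
    """Return exp(-gamma * ||x - y||^2) for two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two hypothetical 94-dimensional feature vectors (placeholder values, not
# real MatPred features); they differ in a single coordinate.
candidate_a = [0.0] * 94
candidate_b = [0.0] * 93 + [1.0]
similarity = rbf_kernel(candidate_a, candidate_b)
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; gamma controls how quickly that decay happens.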
17
miRLocator: Machine Learning-Based Prediction of Mature MicroRNAs within Plant Pre-miRNA Sequences. PLoS One 2015; 10:e0142753. [PMID: 26558614 PMCID: PMC4641693 DOI: 10.1371/journal.pone.0142753] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 09/24/2015] [Accepted: 10/26/2015] [Indexed: 11/19/2022]
Abstract
MicroRNAs (miRNAs) are a class of short, non-coding RNAs that play regulatory roles in a wide variety of biological processes, such as plant growth and abiotic stress responses. Although several computational tools have been developed to identify primary miRNAs and precursor miRNAs (pre-miRNAs), very few provide the functionality of locating mature miRNAs within plant pre-miRNAs. This manuscript introduces a novel algorithm for predicting miRNAs named miRLocator, which is based on machine learning techniques and on sequence and structural features extracted from miRNA:miRNA* duplexes. To address the class imbalance problem (few real miRNAs and a large number of pseudo miRNAs), the prediction models in miRLocator were optimized by considering critical (and often ignored) factors that can markedly affect the prediction accuracy of mature miRNAs, including the machine learning algorithm and the ratio between positive and negative training samples. Ten-fold cross-validation on 5854 experimentally validated miRNAs from 19 plant species showed that miRLocator performed better than the state-of-the-art miRNA predictor miRdup in locating mature miRNAs within plant pre-miRNAs. miRLocator will aid researchers interested in discovering miRNAs from model and non-model plant species.
18
Improved Pre-miRNA Classification by Reducing the Effect of Class Imbalance. BioMed Research International 2015; 2015:960108. [PMID: 26640803 PMCID: PMC4657081 DOI: 10.1155/2015/960108] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/15/2015] [Revised: 10/18/2015] [Accepted: 10/20/2015] [Indexed: 12/03/2022]
Abstract
MicroRNAs (miRNAs) play important roles in the diverse biological processes of animals and plants. Although prediction methods based on machine learning can identify nonhomologous and species-specific miRNAs, they suffer from severe class imbalance between real and pseudo pre-miRNAs. We propose a pre-miRNA classification method based on cost-sensitive ensemble learning and refer to it as MiRNAClassify. Through a series of iterations, the information in all the positive and negative samples is fully exploited. In each iteration, a new classification instance is trained on equal numbers of positive and negative samples. In this way, the negative effect of class imbalance is efficiently relieved. The new instance primarily focuses on those samples that are easily misclassified. In addition, the positive samples are assigned a higher cost weight than the negative samples. MiRNAClassify is compared with several state-of-the-art methods and some well-known classification models on human, animal, and plant datasets. The cross-validation results indicate that MiRNAClassify significantly outperforms the other methods and models. In addition, newly added pre-miRNAs are used to further evaluate the ability of these methods to discover novel pre-miRNAs. MiRNAClassify still achieves consistently superior performance and can discover more pre-miRNAs.
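The balanced-sampling idea in this abstract (each iteration trains on equal numbers of positive and negative samples while eventually using every negative) can be sketched as follows. This is a simplified illustration, not MiRNAClassify itself: the function name `balanced_rounds`, the chunking strategy, and the sample labels are assumptions, and the abstract's focus on easily misclassified samples and its class-specific cost weights are deliberately left out.

```python
import random


def balanced_rounds(positives, negatives, seed=0):
    """Yield (positives, negative_chunk) pairs of equal size, one per round.

    Negatives are shuffled and split into chunks as large as the positive
    set, so every negative sample appears in at least one round.
    """
    if not positives:
        raise ValueError("at least one positive sample is required")
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = len(positives)
    for start in range(0, len(shuffled), size):
        chunk = shuffled[start:start + size]
        # Top up a short final chunk by resampling negatives so that every
        # round stays balanced (this may repeat a sample within the chunk).
        while len(chunk) < size:
            chunk.append(rng.choice(shuffled))
        yield list(positives), chunk


# Hypothetical sample identifiers: 3 real pre-miRNAs versus 10 pseudo hairpins.
real = ["pre1", "pre2", "pre3"]
pseudo = [f"pseudo{i}" for i in range(10)]
rounds = list(balanced_rounds(real, pseudo))
```

With 3 positives and 10 negatives this produces 4 rounds; a separate classifier would be trained on each round and the resulting models combined into an ensemble.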
19
Vorozheykin PS, Titov II. Web server for prediction of miRNAs and their precursors and binding sites. Mol Biol 2015. [DOI: 10.1134/s0026893315050192] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/21/2022]
20
Abstract
MicroRNAs (miRNAs) are small single-stranded noncoding RNAs that play an important role in post-transcriptional regulation of gene expression. In this paper, we present a web server for ab initio prediction of human miRNAs and their precursors. The prediction methods are based on hidden Markov models (HMMs) and context-structural characteristics. By taking into account the identified patterns of primary and secondary structures of the pre-miRNAs, a new HMM is proposed and the existing context-structural Markov model is modified. The evaluation of the method's performance has shown that it can accurately predict novel human miRNAs. Compared with the existing methods, our method has a higher prediction quality for both human pre-miRNAs and miRNAs. The models have also shown good results in the prediction of mouse miRNAs. The web server is available at http://wwwmgs.bionet.nsc.ru/mgs/programs/rnaanalys (mirror http://miRNA.at.nsu.ru ).
Affiliation(s)
- Igor I Titov
  - Institute of Cytology and Genetics, SB RAS, 10 Lavrentyev Avenue, Novosibirsk 630090, Russian Federation; Novosibirsk State University, 2 Pirogov Street, Novosibirsk 630090, Russian Federation
21
Leclercq M, Diallo AB, Blanchette M. Computational prediction of the localization of microRNAs within their pre-miRNA. Nucleic Acids Res 2013; 41:7200-11. [PMID: 23748953 PMCID: PMC3753617 DOI: 10.1093/nar/gkt466] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 02/28/2013] [Revised: 04/30/2013] [Accepted: 05/05/2013] [Indexed: 12/19/2022]
Abstract
MicroRNAs (miRNAs) are short RNA species derived from hairpin-forming miRNA precursors (pre-miRNAs) that act as key posttranscriptional regulators. Most computational tools labeled as miRNA predictors are in fact pre-miRNA predictors and provide no information about the putative miRNA location within the pre-miRNA. Sequence and structural features that determine the location of the miRNA, and the extent to which these properties vary from species to species, are poorly understood. We have developed miRdup, a computational predictor for the identification of the most likely miRNA location within a given pre-miRNA or the validation of a candidate miRNA. MiRdup is based on a random forest classifier trained with experimentally validated miRNAs from miRBase, with features that characterize the miRNA-miRNA* duplex. Because we observed that miRNAs have sequence and structural properties that differ between species, mostly in terms of duplex stability, we trained various clade-specific miRdup models and obtained increased accuracy. MiRdup self-trains on the most recent version of miRBase and is easy to use. Combined with existing pre-miRNA predictors, it will be valuable for both de novo mapping of miRNAs and filtering of large sets of candidate miRNAs obtained from transcriptome sequencing projects. MiRdup is open source under the GPLv3 and available at http://www.cs.mcgill.ca/∼blanchem/mirdup/.
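To make "features that characterize the miRNA-miRNA* duplex" more concrete, here is a small sketch that derives two simple structural descriptors for a candidate miRNA window from a dot-bracket secondary structure. The feature names, the function `duplex_features`, and the toy hairpin are assumptions chosen for illustration; they are not miRdup's actual feature set, and no classifier is trained here.

```python
def duplex_features(structure, start, length):
    """Summarize base pairing inside a candidate miRNA window.

    structure: dot-bracket string for the hairpin, where '(' and ')' mark
               paired bases and '.' marks unpaired bases.
    start:     0-based start of the candidate miRNA within the hairpin.
    length:    length of the candidate miRNA.
    """
    window = structure[start:start + length]
    if length <= 0 or len(window) != length:
        raise ValueError("candidate window falls outside the structure")
    paired = sum(1 for symbol in window if symbol in "()")
    # Longest run of consecutive unpaired positions, a crude proxy for the
    # largest bulge or internal loop inside the candidate duplex.
    unpaired_runs = window.replace("(", " ").replace(")", " ").split(" ")
    longest_unpaired = max(len(run) for run in unpaired_runs)
    return {
        "paired_fraction": paired / length,
        "longest_unpaired_run": longest_unpaired,
    }


# Toy hairpin (hypothetical): two stems separated by a 2-nt internal loop.
hairpin = "((((..((((....))))..))))"
features = duplex_features(hairpin, start=0, length=10)
```

Descriptors of this kind, computed for every candidate position, are the sort of input a random forest or similar classifier could rank to pick the most likely miRNA location.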
Affiliation(s)
- Mickael Leclercq
  - School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada H3A2B2 and Laboratoire de bioinformatique du département informatique, Université du Québec à Montréal, Montreal, Quebec, Canada H2X3Y7
- Abdoulaye Banire Diallo
  - School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada H3A2B2 and Laboratoire de bioinformatique du département informatique, Université du Québec à Montréal, Montreal, Quebec, Canada H2X3Y7
- Mathieu Blanchette
  - School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada H3A2B2 and Laboratoire de bioinformatique du département informatique, Université du Québec à Montréal, Montreal, Quebec, Canada H2X3Y7
22
miReader: Discovering Novel miRNAs in Species without Sequenced Genome. PLoS One 2013; 8:e66857. [PMID: 23805282 PMCID: PMC3689854 DOI: 10.1371/journal.pone.0066857] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 01/14/2013] [Accepted: 05/11/2013] [Indexed: 12/26/2022]
Abstract
Along with computational approaches, NGS-led technologies have had a major impact on discoveries in the area of miRNA biology, including novel miRNA identification. However, to date, all microRNA discovery tools depend on the availability of reference or genomic sequences. Here, for the first time, a novel approach, miReader, is introduced that can discover novel miRNAs without any dependence on genomic/reference sequences. The approach uses NGS read data to build highly accurate miRNA models, trained through a Multi-boosting algorithm with Best-First Tree as its base classifier. It was comprehensively tested on a large amount of experimental data from a wide range of species, including human, plants, nematode, zebrafish and fruit fly, performing consistently with >90% accuracy. Using the same tool on Illumina read data for Miscanthus, a plant whose genome is not sequenced, the study reported 21 novel mature miRNA duplex candidates. Because miRNA discovery requires handling high-throughput data, the entire approach has been implemented in a standalone parallel architecture. This work is expected to have a positive impact on miRNA discovery in the majority of species, where the availability of a genomic sequence would no longer be a requirement.