1
|
Daniel Thomas S, Vijayakumar K, John L, Krishnan D, Rehman N, Revikumar A, Kandel Codi JA, Prasad TSK, S S V, Raju R. Machine Learning Strategies in MicroRNA Research: Bridging Genome to Phenome. OMICS : A JOURNAL OF INTEGRATIVE BIOLOGY 2024; 28:213-233. [PMID: 38752932 DOI: 10.1089/omi.2024.0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/23/2024]
Abstract
MicroRNAs (miRNAs) have emerged as a prominent layer of regulation of gene expression. This article offers the salient and current aspects of machine learning (ML) tools and approaches from genome to phenome in miRNA research. First, we underline that the complexity in the analysis of miRNA function ranges from their modes of biogenesis to the target diversity in diverse biological conditions. Therefore, it is imperative to first ascertain the miRNA coding potential of genomes and understand the regulatory mechanisms of their expression. This knowledge enables the efficient classification of miRNA precursors and the identification of their mature forms and respective target genes. Second, and because one miRNA can target multiple mRNAs and vice versa, another challenge is the assessment of the miRNA-mRNA target interaction network. Furthermore, long-noncoding RNA (lncRNA)and circular RNAs (circRNAs) also contribute to this complexity. ML has been used to tackle these challenges at the high-dimensional data level. The present expert review covers more than 100 tools adopting various ML approaches pertaining to, for example, (1) miRNA promoter prediction, (2) precursor classification, (3) mature miRNA prediction, (4) miRNA target prediction, (5) miRNA- lncRNA and miRNA-circRNA interactions, (6) miRNA-mRNA expression profiling, (7) miRNA regulatory module detection, (8) miRNA-disease association, and (9) miRNA essentiality prediction. Taken together, we unpack, critically examine, and highlight the cutting-edge synergy of ML approaches and miRNA research so as to develop a dynamic and microlevel understanding of human health and diseases.
Collapse
Affiliation(s)
- Sonet Daniel Thomas
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
| | - Krithika Vijayakumar
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
| | - Levin John
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
| | - Deepak Krishnan
- Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
| | - Niyas Rehman
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
| | - Amjesh Revikumar
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Kerala Genome Data Centre, Kerala Development and Innovation Strategic Council, Thiruvananthapuram, Kerala, India
| | - Jalaluddin Akbar Kandel Codi
- Department of Surgical Oncology, Yenepoya Medical College, Yenepoya (Deemed to Be University), Manglore, Karnataka, India
| | | | - Vinodchandra S S
- Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India
| | - Rajesh Raju
- Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
- Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Manglore, Karnataka, India
| |
Collapse
|
2
|
Singh J, Khanna NN, Rout RK, Singh N, Laird JR, Singh IM, Kalra MK, Mantella LE, Johri AM, Isenovic ER, Fouda MM, Saba L, Fatemi M, Suri JS. GeneAI 3.0: powerful, novel, generalized hybrid and ensemble deep learning frameworks for miRNA species classification of stationary patterns from nucleotides. Sci Rep 2024; 14:7154. [PMID: 38531923 PMCID: PMC11344070 DOI: 10.1038/s41598-024-56786-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Accepted: 03/11/2024] [Indexed: 03/28/2024] Open
Abstract
Due to the intricate relationship between the small non-coding ribonucleic acid (miRNA) sequences, the classification of miRNA species, namely Human, Gorilla, Rat, and Mouse is challenging. Previous methods are not robust and accurate. In this study, we present AtheroPoint's GeneAI 3.0, a powerful, novel, and generalized method for extracting features from the fixed patterns of purines and pyrimidines in each miRNA sequence in ensemble paradigms in machine learning (EML) and convolutional neural network (CNN)-based deep learning (EDL) frameworks. GeneAI 3.0 utilized five conventional (Entropy, Dissimilarity, Energy, Homogeneity, and Contrast), and three contemporary (Shannon entropy, Hurst exponent, Fractal dimension) features, to generate a composite feature set from given miRNA sequences which were then passed into our ML and DL classification framework. A set of 11 new classifiers was designed consisting of 5 EML and 6 EDL for binary/multiclass classification. It was benchmarked against 9 solo ML (SML), 6 solo DL (SDL), 12 hybrid DL (HDL) models, resulting in a total of 11 + 27 = 38 models were designed. Four hypotheses were formulated and validated using explainable AI (XAI) as well as reliability/statistical tests. The order of the mean performance using accuracy (ACC)/area-under-the-curve (AUC) of the 24 DL classifiers was: EDL > HDL > SDL. The mean performance of EDL models with CNN layers was superior to that without CNN layers by 0.73%/0.92%. Mean performance of EML models was superior to SML models with improvements of ACC/AUC by 6.24%/6.46%. EDL models performed significantly better than EML models, with a mean increase in ACC/AUC of 7.09%/6.96%. The GeneAI 3.0 tool produced expected XAI feature plots, and the statistical tests showed significant p-values. Ensemble models with composite features are highly effective and generalized models for effectively classifying miRNA sequences.
Collapse
Affiliation(s)
- Jaskaran Singh
- Department of Computer Science, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
| | - Narendra N Khanna
- Department of Cardiology, Indraprastha APOLLO Hospitals, New Delhi, India
| | - Ranjeet K Rout
- Department of Computer Science and Engineering, NIT Srinagar, Hazratbal, Srinagar, India
| | - Narpinder Singh
- Department of Food Science, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
| | - John R Laird
- Heart and Vascular Institute, Adventist Health St. Helena, St Helena, CA, USA
| | - Inder M Singh
- Advanced Cardiac and Vascular Institute, Sacramento, CA, USA
| | - Mannudeep K Kalra
- Department of Radiology, Massachusetts General Hospital, Boston, MA, 02115, USA
| | - Laura E Mantella
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
| | - Amer M Johri
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
| | - Esma R Isenovic
- Laboratory for Molecular Genetics and Radiobiology, University of Belgrade, Belgrade, Serbia
| | - Mostafa M Fouda
- Department of Electrical and Computer Engineering, Idaho State University, Pocatello, ID, 83209, USA
| | - Luca Saba
- Department of Neurology, University of Cagliari, Cagliari, Italy
| | - Mostafa Fatemi
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, 55905, USA
| | - Jasjit S Suri
- Stroke Monitoring and Diagnostic Division, AtheroPoint LLC, Roseville, CA, 95661, USA.
| |
Collapse
|
3
|
Abstract
Tiny single-stranded noncoding RNAs with size 19-27 nucleotides serve as microRNAs (miRNAs), which have emerged as key gene regulators in the last two decades. miRNAs serve as one of the hallmarks in regulatory pathways with critical roles in human diseases. Ever since the discovery of miRNAs, researchers have focused on how mature miRNAs are produced from precursor mRNAs. Experimental methods are faced with notorious challenges in terms of experimental design, since it is time consuming and not cost-effective. Hence, different computational methods have been employed for the identification of miRNA sequences where most of them labeled as miRNA predictors are in fact pre-miRNA predictors and provide no information about the putative miRNA location within the pre-miRNA. This chapter provides an update and the current state of the art in this area covering various methods and 15 software suites used for prediction of mature miRNA.
Collapse
Affiliation(s)
- Malik Yousef
- Department of Information System, Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
| | - Alisha Parveen
- Rudolf‑Zenker Institute of Experimental Surgery, Rostock University Medical Center, Rostock, Germany
| | - Abhishek Kumar
- Institute of Bioinformatics, Bangalore, India. .,Manipal Academy of Higher Education (MAHE), Manipal, Karnataka, India.
| |
Collapse
|
4
|
An Approach to Identify Individual Functional Single Nucleotide Polymorphisms and Isoform MicroRNAs. BIOMED RESEARCH INTERNATIONAL 2019; 2019:6193673. [PMID: 31467902 PMCID: PMC6699389 DOI: 10.1155/2019/6193673] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/07/2019] [Revised: 05/28/2019] [Accepted: 06/24/2019] [Indexed: 01/08/2023]
Abstract
MicroRNAs (miRNAs) and single nucleotide polymorphisms (SNPs) play important roles in disease risk and development, especially cancer. Importantly, when SNPs are located in pre-miRNAs, they affect their splicing mechanism and change the function of miRNAs. To improve disease risk assessment, we propose an approach and developed a software tool, IsomiR_Find, to identify disease/phenotype-related SNPs and isomiRs in individuals. Our approach is based on the individual's samples, with SNP information extracted from the 1000 Genomes Project. SNPs were mapped to pre-miRNAs based on whole-genome coordinates and then SNP-pre-miRNA sequences were constructed. Moreover, we developed matpred2, a software tool to identify the four splicing sites of mature miRNAs. Using matpred2, we identified isomiRs and then verified them by searching within individual miRNA sequencing data. Our approach yielded biomarkers for biological experiments, mined functions of miRNAs and SNPs, improved disease risk assessment, and provided a way to achieve individualized precision medicine.
Collapse
|
5
|
Computational Resources for Prediction and Analysis of Functional miRNA and Their Targetome. Methods Mol Biol 2019; 1912:215-250. [PMID: 30635896 DOI: 10.1007/978-1-4939-8982-9_9] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
microRNAs are evolutionarily conserved, endogenously produced, noncoding RNAs (ncRNAs) of approximately 19-24 nucleotides (nts) in length known to exhibit gene silencing of complementary target sequence. Their deregulated expression is reported in various disease conditions and thus has therapeutic implications. In the last decade, various computational resources are published in this field. In this chapter, we have reviewed bioinformatics resources, i.e., miRNA-centered databases, algorithms, and tools to predict miRNA targets. First section has enlisted more than 75 databases, which mainly covers information regarding miRNA registries, targets, disease associations, differential expression, interactions with other noncoding RNAs, and all-in-one resources. In the algorithms section, we have compiled about 140 algorithms from eight subcategories, viz. for the prediction of precursor (pre-) and mature miRNAs. These algorithms are developed on various sequence, structure, and thermodynamic based features incorporated into different machine learning techniques (MLTs). In addition, computational identification of miRNAs from high-throughput next generation sequencing (NGS) data and their variants, viz. isomiRs, differential expression, miR-SNPs, and functional annotation, are discussed. Prediction and analysis of miRNAs and their associated targets are also evaluated under miR-targets section providing knowledge regarding novel miRNA targets and complex host-pathogen interactions. In conclusion, we have provided comprehensive review of in silico resources published in miRNA research to help scientific community be updated and choose the appropriate tool according to their needs.
Collapse
|
6
|
Adaboost-SVM-based probability algorithm for the prediction of all mature miRNA sites based on structured-sequence features. Sci Rep 2019; 9:1521. [PMID: 30728425 PMCID: PMC6365589 DOI: 10.1038/s41598-018-38048-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2018] [Accepted: 12/18/2018] [Indexed: 02/07/2023] Open
Abstract
The significant role of microRNAs (miRNAs) in various biological processes and diseases has been widely studied and reported in recent years. Several computational methods associated with mature miRNA identification suffer various limitations involving canonical biological features extraction, class imbalance, and classifier performance. The proposed classifier, miRFinder, is an accurate alternative for the identification of mature miRNAs. The structured-sequence features were proposed to precisely extract miRNA biological features, and three algorithms were selected to obtain the canonical features based on the classifier performance. Moreover, the center of mass near distance training based on K-means was provided to improve the class imbalance problem. In particular, the AdaBoost-SVM algorithm was used to construct the classifier. The classifier training process focuses on incorrectly classified samples, and the integrated results use the common decision strategies of the weak classifier with different weights. In addition, the all mature miRNA sites were predicted by different classifiers based on the features of different sites. Compared with other methods, the performance of the classifiers has a high degree of efficacy for the identification of mature miRNAs. MiRFinder is freely available at https://github.com/wangying0128/miRFinder .
Collapse
|
7
|
Improving classification of mature microRNA by solving class imbalance problem. Sci Rep 2016; 6:25941. [PMID: 27181057 PMCID: PMC4867574 DOI: 10.1038/srep25941] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2016] [Accepted: 04/22/2016] [Indexed: 11/29/2022] Open
Abstract
MicroRNAs (miRNAs) are ~20–25 nucleotides non-coding RNAs, which regulated gene expression in the post-transcriptional level. The accurate rate of identifying the start sit of mature miRNA from a given pre-miRNA remains lower. It is noting that the mature miRNA prediction is a class-imbalanced problem which also leads to the unsatisfactory performance of these methods. We improved the prediction accuracy of classifier using balanced datasets and presented MatFind which is used for identifying 5′ mature miRNAs candidates from their pre-miRNA based on ensemble SVM classifiers with idea of adaboost. Firstly, the balanced-dataset was extract based on K-nearest neighbor algorithm. Secondly, the multiple SVM classifiers were trained in orderly using the balance datasets base on represented features. At last, all SVM classifiers were combined together to form the ensemble classifier. Our results on independent testing dataset show that the proposed method is more efficient than one without treating class imbalance problem. Moreover, MatFind achieves much higher classification accuracy than other three approaches. The ensemble SVM classifiers and balanced-datasets can solve the class-imbalanced problem, as well as improve performance of classifier for mature miRNA identification. MatFind is an accurate and fast method for 5′ mature miRNA identification.
Collapse
|