1
Nath A, Bora U. RNAinsecta: A tool for prediction of precursor microRNA in insects and search for their target in the model organism Drosophila melanogaster. PLoS One 2023; 18:e0287323. [PMID: 37812647 PMCID: PMC10561860 DOI: 10.1371/journal.pone.0287323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Accepted: 06/03/2023] [Indexed: 10/11/2023]
Abstract
INTRODUCTION AND BACKGROUND Pre-microRNAs are the hairpin loops from which microRNAs are produced; microRNAs have been found to negatively regulate gene expression in several organisms. In insects, microRNAs participate in several biological processes, including metamorphosis, reproduction, and immune response. Numerous tools have been designed in recent years to predict novel pre-microRNAs using binary machine learning classifiers, in which prediction models are trained on true and pseudo pre-microRNA hairpin loops. Currently, no existing tool is designed exclusively for insect pre-microRNA detection. AIM To apply machine learning algorithms to develop an open-source tool for predicting novel precursor microRNAs in insects and searching for their miRNA targets in the model insect organism Drosophila melanogaster. METHODS Machine learning algorithms such as Random Forest, Support Vector Machine, Logistic Regression and K-Nearest Neighbours were trained on features of true and false insect pre-microRNAs with 10-fold cross-validation on SMOTE and Near-Miss datasets. miRNA target IDs were collected from miRTarBase and their corresponding transcripts were collected from FlyBase. We used the miRanda algorithm for target searching. RESULTS In our experiment, SMOTE performed significantly better than Near-Miss and was therefore used for modelling. We kept the best-performing parameters after obtaining initial mean cross-validation accuracy scores >90%. The trained Support Vector Machine model achieved an accuracy of 92.19%, while the Random Forest attained an accuracy of 80.28% on our validation dataset. These models are hosted online as a web application called RNAinsecta. Further, a target search in Drosophila melanogaster for the predicted pre-microRNA is provided in RNAinsecta.
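The SMOTE oversampling step named in the abstract above can be illustrated with a minimal pure-Python sketch. The abstract does not say which implementation the authors used, so the function name `smote`, its parameters, and the toy two-dimensional feature vectors are all assumptions made for illustration. The idea is that each synthetic minority sample is placed at a random point on the segment between a real minority sample and one of its nearest minority neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    `minority` is a list of equal-length feature vectors (lists of floats)
    and must contain at least two samples. All names here are illustrative.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical feature vectors for four "true pre-microRNA" samples.
    true_hairpins = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(true_hairpins, n_new=3, k=2):
        print(point)
```

Near-Miss, the alternative mentioned in the abstract, works in the opposite direction by discarding majority-class samples rather than synthesising minority ones.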
Affiliation(s)
- Adhiraj Nath
  - Department of BSBE, IIT Guwahati, North Guwahati, Assam, India
- Utpal Bora
  - Department of BSBE, IIT Guwahati, North Guwahati, Assam, India
2
Loganathan T, Doss C GP. Non-coding RNAs in human health and disease: potential function as biomarkers and therapeutic targets. Funct Integr Genomics 2023; 23:33. [PMID: 36625940 PMCID: PMC9838419 DOI: 10.1007/s10142-022-00947-4] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 08/25/2022] [Revised: 12/14/2022] [Accepted: 12/15/2022] [Indexed: 01/11/2023]
Abstract
Human diseases have been a critical threat from the beginning of human history. Knowing the origin, course of action and treatment of any disease state is essential. With the introduction and evolution of technology, a microscopic, molecular-level approach is a more coherent and accurate way to explore disease mechanism, progression, and therapy than a macroscopic approach. Non-coding RNAs (ncRNAs) play increasingly important roles in the detection, development, and treatment of abnormalities related to physiology, pathology, genetics, epigenetics, cancer, and developmental diseases. Non-coding RNAs are becoming increasingly crucial as powerful, multipurpose regulators of all biological processes. Parallel to this, a rising amount of scientific information has revealed links between abnormal non-coding RNA expression and human disorders. Numerous non-coding transcripts with unknown functions have been found alongside advancements in RNA-sequencing methods. Non-coding RNAs come in a variety of forms, including circular RNAs with a continuous closed loop (circRNA), long non-coding RNAs (lncRNA), and microRNAs (miRNA). This review covers specific information on their biogenesis, mode of action, physiological function, and significance concerning disease (such as cancer, cardiovascular diseases and others). It focuses on non-coding RNAs as specific biomarkers and novel therapeutic targets.
Affiliation(s)
- Tamizhini Loganathan
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology (VIT), Vellore 632014, Tamil Nadu, India
- George Priya Doss C
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology (VIT), Vellore 632014, Tamil Nadu, India
3
Yan C, Ding C, Duan G. PMMS: Predicting essential miRNAs based on multi-head self-attention mechanism and sequences. Front Med (Lausanne) 2022; 9:1015278. [DOI: 10.3389/fmed.2022.1015278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 10/25/2022] [Indexed: 11/18/2022]
Abstract
Increasing evidence has shown that miRNA plays a significant role in biological processes. In order to understand the etiology and mechanisms of various diseases, it is necessary to identify the essential miRNAs. However, it is time-consuming and expensive to identify essential miRNAs by using traditional biological experiments. It is critical to develop computational methods to predict potential essential miRNAs. In this study, we provided a new computational method (called PMMS) to identify essential miRNAs by using multi-head self-attention and sequences. First, PMMS computes the statistical and structural features and extracts the static feature by concatenating them. Second, PMMS extracts the deep learning original feature (BiLSTM-based feature) by using bi-directional long short-term memory (BiLSTM) and pre-miRNA sequences. In addition, we further obtained the multi-head self-attention feature (MS-based feature) based on the BiLSTM-based feature and the multi-head self-attention mechanism. By considering the importance of the subsequence of pre-miRNA to the static feature of miRNA, we obtained the deep learning final feature (WA-based feature) based on the weighted attention mechanism. Finally, we concatenated the WA-based feature and the static feature as an input to the multilayer perceptron model to predict essential miRNAs. We conducted five-fold cross-validation to evaluate the prediction performance of PMMS. The area under the ROC curve (AUC), the F1-score, and accuracy (ACC) are used as performance metrics. From the experimental results, PMMS obtained the best prediction performance (AUC: 0.9556, F1-score: 0.9030, and ACC: 0.9097). It also outperformed the other compared methods. The experimental results also illustrated that PMMS is an effective method to identify essential miRNAs.
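The multi-head self-attention step mentioned in the abstract above can be sketched generically as follows. This is not the PMMS implementation: the function names are hypothetical, and the learned projection matrices that a real model would apply to queries, keys, values and the concatenated output are replaced by the identity so the example stays short and dependency-free. It shows only the core mechanism: each head applies scaled dot-product attention to its own slice of the feature vector, and the per-head outputs are concatenated.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def multi_head_self_attention(x, n_heads):
    """Attend within each feature slice (head) and concatenate the results.

    Simplification: learned projections are omitted (treated as identity).
    """
    d_model = len(x[0])
    if d_model % n_heads != 0:
        raise ValueError("feature size must be divisible by n_heads")
    d_head = d_model // n_heads
    heads = []
    for h in range(n_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        heads.append(attention(part, part, part))
    # Concatenate the per-head outputs position by position.
    return [sum((head[i] for head in heads), []) for i in range(len(x))]


if __name__ == "__main__":
    # Hypothetical per-position features, e.g. BiLSTM outputs for 3 positions.
    features = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.2, 0.8], [1.0, 1.0, 0.9, 0.1]]
    for row in multi_head_self_attention(features, n_heads=2):
        print(row)
```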
4
R E, Jain DK, Kotecha K, Pandya S, Reddy SS, E R, Varadarajan V, Mahanti A, V S. Hybrid Deep Neural Network for Handling Data Imbalance in Precursor MicroRNA. Front Public Health 2022; 9:821410. [PMID: 35004605 PMCID: PMC8733243 DOI: 10.3389/fpubh.2021.821410] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/24/2021] [Accepted: 12/03/2021] [Indexed: 11/13/2022]
Abstract
Over the last decade, the field of bioinformatics has been growing rapidly. Robust bioinformatics tools are going to play a vital role in future progress. Scientists working in the field of bioinformatics conduct a large amount of research to extract knowledge from the available biological data. Several bioinformatics issues have arisen as a result of the creation of massive amounts of imbalanced data. The classification of precursor microRNA (pre-miRNA) from imbalanced RNA genome data is one such problem. Studies have shown that pre-miRNAs could serve as oncogenes or tumor suppressors in various cancer types. This paper introduces a Hybrid Deep Neural Network framework (H-DNN) for the classification of pre-miRNA in imbalanced data. The proposed H-DNN framework is an integration of Deep Artificial Neural Networks (Deep ANN) and Deep Decision Tree Classifiers. The Deep ANN in the proposed H-DNN helps to extract meaningful features and the Deep Decision Tree Classifier helps to classify the pre-miRNA accurately. Experimentation of H-DNN was done with genomes of animals, plants, humans, and Arabidopsis with an imbalance ratio up to 1:5000 and virus with a ratio of 1:400. Experimental results showed an accuracy of more than 99% in all cases, and the time complexity of the proposed H-DNN is also very low compared with other existing approaches.
Affiliation(s)
- Elakkiya R
  - School of Computing, SASTRA Deemed University, Thanjavur, India
- Deepak Kumar Jain
  - College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
- Ketan Kotecha
  - Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune, India
- Sharnil Pandya
  - Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Rajalakshmi E
  - School of Computing, SASTRA Deemed University, Thanjavur, India
- Vijayakumar Varadarajan
  - School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
5
Ayachit G, Shaikh I, Pandya H, Das J. Salient Features, Data and Algorithms for MicroRNA Screening from Plants: A Review on the Gains and Pitfalls of Machine Learning Techniques. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200601121756] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
The era of big data and high-throughput genomic technology has enabled scientists to have a clear view of plant genomic profiles. However, it has also led to a massive need for computational tools and strategies to interpret this data. In this scenario of huge data inflow, machine learning (ML) approaches are emerging as the most promising for analysing heterogeneous and unstructured biological datasets. With applications extending to healthcare and agriculture, ML approaches are proving useful for microRNA (miRNA) screening as well. Identification of miRNAs is a crucial step towards understanding post-transcriptional gene regulation and miRNA-related pathology. The use of ML tools is becoming indispensable in analysing such data and identifying species-specific, non-conserved miRNA. However, these techniques have their own benefits and lacunae. In this review, we will discuss the current scenario and pitfalls of ML-based tools for plant miRNA identification and provide some insights into the important features, the need for deep learning models and the directions in which further studies are needed.
Affiliation(s)
- Garima Ayachit
  - Department of Botany, Bioinformatics and Climate Change, University School of Sciences, Gujarat University, Navrangpura, Ahmedabad – 380009, India
- Inayatullah Shaikh
  - Gujarat State Biotechnology Mission, Department of Science and Technology, Government of Gujarat, Gandhinagar, Gujarat 382011, India
- Himanshu Pandya
  - Department of Botany, Bioinformatics and Climate Change, University School of Sciences, Gujarat University, Navrangpura, Ahmedabad – 380009, India
- Jayashankar Das
  - Gujarat State Biotechnology Mission, Department of Science and Technology, Government of Gujarat, Gandhinagar, Gujarat 382011, India
6
Bugnon LA, Yones C, Milone DH, Stegmayer G. Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning. Brief Bioinform 2020; 22:5894456. [PMID: 34020552 DOI: 10.1093/bib/bbaa184] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/26/2020] [Revised: 07/13/2020] [Accepted: 07/18/2020] [Indexed: 01/12/2023]
Abstract
MOTIVATION The genome-wide discovery of microRNAs (miRNAs) involves identifying sequences having the highest chance of being a novel miRNA precursor (pre-miRNA), within all the possible sequences in a complete genome. The known pre-miRNAs are usually just a few in comparison to the millions of candidates that have to be analyzed. This is of particular interest in non-model species and recently sequenced genomes, where the challenge is to find potential pre-miRNAs only from the sequenced genome. The task is unfeasible without the help of computational methods, such as deep learning. However, it is still very difficult to find an accurate predictor, with a low false positive rate in this genome-wide context. Although there are many available tools, these have not been tested in realistic conditions, with sequences from whole genomes and the high class imbalance inherent to such data. RESULTS In this work, we review six recent methods for tackling this problem with machine learning. We compare the models in five genome-wide datasets: Arabidopsis thaliana, Caenorhabditis elegans, Anopheles gambiae, Drosophila melanogaster, Homo sapiens. The models have been designed for the pre-miRNAs prediction task, where there is a class of interest that is significantly underrepresented (the known pre-miRNAs) with respect to a very large number of unlabeled samples. It was found that for the smaller genomes and smaller imbalances, all methods perform in a similar way. However, for larger datasets such as the H. sapiens genome, it was found that deep learning approaches using raw information from the sequences reached the best scores, achieving low numbers of false positives. AVAILABILITY The source code to reproduce these results is in: http://sourceforge.net/projects/sourcesinc/files/gwmirna Additionally, the datasets are freely available in: https://sourceforge.net/projects/sourcesinc/files/mirdata.
Affiliation(s)
- Leandro A Bugnon
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
- Cristian Yones
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
- Diego H Milone
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
- Georgina Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
7
Bugnon LA, Yones C, Milone DH, Stegmayer G. Deep Neural Architectures for Highly Imbalanced Data in Bioinformatics. IEEE Trans Neural Netw Learn Syst 2020; 31:2857-2867. [PMID: 31170082 DOI: 10.1109/tnnls.2019.2914471] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 06/09/2023]
Abstract
In the postgenome era, many problems in bioinformatics have arisen due to the generation of large amounts of imbalanced data. In particular, the computational classification of precursor microRNA (pre-miRNA) involves a high imbalance in the classes. For this task, a classifier is trained to identify RNA sequences having the highest chance of being miRNA precursors. The big issue is that well-known pre-miRNAs are usually just a few in comparison to the hundreds of thousands of candidate sequences in a genome, which results in highly imbalanced data. This imbalance has a strong influence on most standard classifiers and, if not properly addressed, the classifier is not able to work properly in a real-life scenario. This work provides a comparative assessment of recent deep neural architectures for dealing with the large imbalanced data issue in the classification of pre-miRNAs. We present and analyze recent architectures in a benchmark framework with genomes of animals and plants, with increasing imbalance ratios up to 1:2000. We also propose a new graphical way for comparing classifiers performance in the context of high-class imbalance. The comparative results obtained show that, at a very high imbalance, deep belief neural networks can provide the best performance.
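A small worked example makes the abstract's point about imbalance concrete. The sketch below uses the 1:2000 ratio quoted above, but the classifier behaviour (90% of true pre-miRNAs recovered, 1% of negatives misclassified) and the sample counts are hypothetical numbers chosen for illustration, and `confusion_metrics` is an illustrative helper rather than anything from the paper. Under these assumptions accuracy stays near 99% while fewer than one in twenty predicted pre-miRNAs is real.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical genome-wide scan: 100 known pre-miRNAs against
    # 200,000 other hairpins, i.e. an imbalance ratio of 1:2000.
    positives, negatives = 100, 200_000
    tp = 90                  # assumed: 90% of true pre-miRNAs detected
    fn = positives - tp
    fp = negatives // 100    # assumed: 1% of negatives flagged as pre-miRNA
    tn = negatives - fp
    for name, value in confusion_metrics(tp, fp, tn, fn).items():
        print(f"{name}: {value:.4f}")
```

This is why benchmarks at high imbalance tend to report precision, recall or F1 rather than accuracy alone.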
8
Stegmayer G, Di Persia LE, Rubiolo M, Gerard M, Pividori M, Yones C, Bugnon LA, Rodriguez T, Raad J, Milone DH. Predicting novel microRNA: a comprehensive comparison of machine learning approaches. Brief Bioinform 2020; 20:1607-1620. [PMID: 29800232 DOI: 10.1093/bib/bby037] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 10/24/2017] [Revised: 03/26/2018] [Indexed: 12/25/2022]
Abstract
MOTIVATION The importance of microRNAs (miRNAs) is widely recognized in the community nowadays because these short segments of RNA can play several roles in almost all biological processes. The computational prediction of novel miRNAs involves training a classifier for identifying sequences having the highest chance of being precursors of miRNAs (pre-miRNAs). The big issue with this task is that well-known pre-miRNAs are usually few in comparison with the hundreds of thousands of candidate sequences in a genome, which results in high class imbalance. This imbalance has a strong influence on most standard classifiers, and if not properly addressed in the model and the experiments, not only performance reported can be completely unrealistic but also the classifier will not be able to work properly for pre-miRNA prediction. Besides, another important issue is that for most of the machine learning (ML) approaches already used (supervised methods), it is necessary to have both positive and negative examples. The selection of positive examples is straightforward (well-known pre-miRNAs). However, it is difficult to build a representative set of negative examples because they should be sequences with hairpin structure that do not contain a pre-miRNA. RESULTS This review provides a comprehensive study and comparative assessment of methods from these two ML approaches for dealing with the prediction of novel pre-miRNAs: supervised and unsupervised training. We present and analyze the ML proposals that have appeared during the past 10 years in literature. They have been compared in several prediction tasks involving two model genomes and increasing imbalance levels. This work provides a review of existing ML approaches for pre-miRNA prediction and fair comparisons of the classifiers with same features and data sets, instead of just a revision of published software tools. 
The results and the discussion can help the community to select the most adequate bioinformatics approach according to the prediction task at hand. The comparative results obtained suggest that from low to mid-imbalance levels between classes, supervised methods can be the best. However, at very high imbalance levels, closer to real case scenarios, models including unsupervised and deep learning can provide better performance.
Affiliation(s)
- Georgina Stegmayer
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Leandro E Di Persia
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Mariano Rubiolo
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Matias Gerard
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Milton Pividori
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Cristian Yones
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Leandro A Bugnon
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Tadeo Rodriguez
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Jonathan Raad
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Diego H Milone
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
9
Guan ZX, Li SH, Zhang ZM, Zhang D, Yang H, Ding H. A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods. Curr Genomics 2020; 21:11-25. [PMID: 32655294 PMCID: PMC7324890 DOI: 10.2174/1389202921666200214125102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/24/2019] [Revised: 01/24/2020] [Accepted: 01/30/2020] [Indexed: 11/22/2022]
Abstract
MicroRNAs, a group of short non-coding RNA molecules, can regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental methods and comparative genomics methods have their disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. Therefore, the review summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide valid information about the predictors currently available. Finally, we give future perspectives on the identification of pre-miRNAs. The review provides scholars with the full background of pre-miRNA identification using machine learning methods, which can help researchers gain a clear understanding of the progress of research in this field.
Affiliation(s)
- Zheng-Xing Guan
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Hao Li
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zi-Mei Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dan Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
10
Yan C, Wu FX, Wang J, Duan G. PESM: predicting the essentiality of miRNAs based on gradient boosting machines and sequences. BMC Bioinformatics 2020; 21:111. [PMID: 32183740 PMCID: PMC7079416 DOI: 10.1186/s12859-020-3426-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/28/2019] [Accepted: 02/21/2020] [Indexed: 11/16/2022]
Abstract
Background MicroRNAs (miRNAs) are small noncoding RNA molecules that directly regulate mRNA targets at the posttranscriptional level. Studies have indicated that miRNAs play key roles in complex diseases by taking part in many biological processes, such as cell growth and cell death. Therefore, in order to improve the effectiveness of disease diagnosis and treatment, it is appealing to develop advanced computational methods for predicting the essentiality of miRNAs. Result In this study, we propose a method (PESM) to predict miRNA essentiality based on gradient boosting machines and miRNA sequences. First, PESM extracts the sequence and structural features of miRNAs. Then it uses gradient boosting machines to predict the essentiality of miRNAs. We conduct 5-fold cross-validation to assess the prediction performance of our method. The area under the receiver operating characteristic curve (AUC), F-measure and accuracy (ACC) are used as the metrics to evaluate the prediction performance. We also compare PESM with three other competing methods: miES, Gaussian Naive Bayes and Support Vector Machine. Conclusion The results of experiments show that PESM achieves better prediction performance (AUC: 0.9117, F-measure: 0.8572, ACC: 0.8516) than the three other competing methods. In addition, the relative importance of all features further shows that the newly added features can help to improve prediction performance.
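The 5-fold cross-validation protocol mentioned in the abstract above can be sketched as an index-splitting routine. This is a generic illustration, not the PESM code: the function name `k_fold_indices`, the shuffling seed, and the sample count in the usage example are assumptions. Every sample lands in exactly one test fold, and a model would be trained on the remaining folds before its AUC, F-measure and accuracy are averaged over the five rounds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical dataset of 12 miRNAs; print the size of each split.
    for round_no, (train, test) in enumerate(k_fold_indices(12, k=5), start=1):
        print(f"fold {round_no}: train={len(train)} test={len(test)}")
```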
Affiliation(s)
- Cheng Yan
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China; School of Computer and Information, Qiannan Normal University for Nationalities, Longshan Road, DuYun, 558000, China
- Fang-Xiang Wu
  - Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
- Guihua Duan
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
11
Mármol-Sánchez E, Cirera S, Quintanilla R, Pla A, Amills M. Discovery and annotation of novel microRNAs in the porcine genome by using a semi-supervised transductive learning approach. Genomics 2019; 112:2107-2118. [PMID: 31816430 DOI: 10.1016/j.ygeno.2019.12.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/28/2019] [Revised: 11/13/2019] [Accepted: 12/05/2019] [Indexed: 12/15/2022]
Abstract
Despite the broad variety of available microRNA (miRNA) prediction tools, their application to the discovery and annotation of novel miRNA genes in domestic species is still limited. In this study we designed a comprehensive pipeline (eMIRNA) for miRNA identification in the yet poorly annotated porcine genome and demonstrated the usefulness of implementing a motif search positional refinement strategy for the accurate determination of precursor miRNA boundaries. The small RNA fraction from gluteus medius skeletal muscle of 48 Duroc gilts was sequenced and used for the prediction of novel miRNA loci. Additionally, we selected the human miRNA annotation for a homology-based search of porcine miRNAs with orthologous genes in the human genome. A total of 20 novel expressed miRNAs were identified in the porcine muscle transcriptome and 27 additional novel porcine miRNAs were also detected by homology-based search using the human miRNA annotation. The existence of three selected novel miRNAs (ssc-miR-483, ssc-miR484 and ssc-miR-200a) was further confirmed by reverse transcription quantitative real-time PCR analyses in the muscle and liver tissues of Göttingen minipigs. In summary, the eMIRNA pipeline presented in the current work allowed us to expand the catalogue of porcine miRNAs and showed better performance than other commonly used miRNA prediction approaches. More importantly, the flexibility of our pipeline makes possible its application in other yet poorly annotated non-model species.
Affiliation(s)
- Emilio Mármol-Sánchez
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Susanna Cirera
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 3, 2nd Floor, 1870 Frederiksberg C, Denmark
- Raquel Quintanilla
  - Animal Breeding and Genetics Program, Institute for Research and Technology in Food and Agriculture (IRTA), Torre Marimon, 08140 Caldes de Montbui, Spain
- Albert Pla
  - Department of Medical Genetics, University of Oslo and Oslo University Hospital, Oslo, Norway
- Marcel Amills
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain; Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
12
Chen L, Heikkinen L, Wang C, Yang Y, Sun H, Wong G. Trends in the development of miRNA bioinformatics tools. Brief Bioinform 2019; 20:1836-1852. [PMID: 29982332 PMCID: PMC7414524 DOI: 10.1093/bib/bby054] [Citation(s) in RCA: 381] [Impact Index Per Article: 63.5] [Received: 04/06/2018] [Revised: 05/18/2018] [Indexed: 12/13/2022]
Abstract
MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expression via recognition of cognate sequences and interference of transcriptional, translational or epigenetic processes. Bioinformatics tools developed for miRNA study include those for miRNA prediction and discovery, structure, analysis and target prediction. We manually curated 95 review papers and ∼1000 miRNA bioinformatics tools published since 2003. We classified and ranked them based on citation number or PageRank score, and then performed network analysis and text mining (TM) to study the miRNA tools development trends. Five key trends were observed: (1) miRNA identification and target prediction have been hot spots in the past decade; (2) manual curation and TM are the main methods for collecting miRNA knowledge from literature; (3) most early tools are well maintained and widely used; (4) classic machine learning methods retain their utility; however, novel ones have begun to emerge; (5) disease-associated miRNA tools are emerging. Our analysis yields significant insight into the past development and future directions of miRNA tools.
Affiliation(s)
- Liang Chen
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
| | - Liisa Heikkinen
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
| | - Changliang Wang
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
| | - Yang Yang
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
| | - Huiyan Sun
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
| | - Garry Wong
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
| |
|
13
|
Xu J, Hou QM, Khare T, Verma SK, Kumar V. Exploring miRNAs for developing climate-resilient crops: A perspective review. Sci Total Environ 2019; 653:91-104. [PMID: 30408672 DOI: 10.1016/j.scitotenv.2018.10.340] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 08/04/2018] [Revised: 10/24/2018] [Accepted: 10/25/2018] [Indexed: 05/21/2023]
Abstract
Climate change and environmental stresses have significant implications for global crop production and necessitate developing crops that can withstand an array of climate changes and environmental perturbations such as irregular water supplies leading to drought or water-logging, hyper soil-salinity, extreme and variable temperatures, ultraviolet radiation and metal stress. Plants have intricate molecular mechanisms to cope with these dynamic environmental changes, one of the most common and effective being the reprogramming of expression of stress-responsive genes. Plant microRNAs (miRNAs) have emerged as key post-transcriptional and translational regulators of gene expression for modulation of stress implications. Recent reports are establishing their key roles in epigenetic regulation of stress/adaptive responses as well as in providing plants with genome stability. Several stress-responsive miRNAs are being identified from different crop plants, and miRNA-driven RNA interference (RNAi) is turning into a technology of choice for improving crop traits and providing phenotypic plasticity in challenging environments. Here we present a perspective review on the exploration of miRNAs as potent targets for engineering crops that can withstand multi-stress environments via loss-/gain-of-function approaches. This review also sheds light on the potential roles plant miRNAs play in genome stability and their emergence as potent targets for genome editing. Current knowledge on plant miRNAs, their biogenesis, function and targets, and the latest developments in bioinformatics approaches for plant miRNAs are discussed. Although recent reviews primarily discuss individual miRNAs responsive to single stress factors, considering the practical limitations of this approach, this review gives special emphasis to miRNAs involved in the responses and adaptation of plants to multi-stress environments, including at epigenetic and/or epigenomic levels.
Affiliation(s)
- Jin Xu
- School of Environmental Science and Safety Engineering, Tianjin University of Technology, Tianjin 300384, China
| | - Qin-Min Hou
- School of Environmental Science and Safety Engineering, Tianjin University of Technology, Tianjin 300384, China
| | - Tushar Khare
- Department of Biotechnology, Modern College of Arts, Science and Commerce (Savitribai Phule Pune University), Ganeshkhind, Pune 411016, India
| | - Sandeep Kumar Verma
- Biotechnology Laboratory (TUBITAK Fellow), Department of Biology, Bolu Abant Izzet Baysal University, 14030 Bolu, Turkey
| | - Vinay Kumar
- Department of Biotechnology, Modern College of Arts, Science and Commerce (Savitribai Phule Pune University), Ganeshkhind, Pune 411016, India; Department of Environmental Science, Savitribai Phule Pune University, Pune 411007, India
| |
|
14
|
Fu X, Zhu W, Cai L, Liao B, Peng L, Chen Y, Yang J. Improved Pre-miRNAs Identification Through Mutual Information of Pre-miRNA Sequences and Structures. Front Genet 2019; 10:119. [PMID: 30858864 PMCID: PMC6397858 DOI: 10.3389/fgene.2019.00119] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 10/15/2018] [Accepted: 02/04/2019] [Indexed: 11/30/2022] Open
Abstract
Playing critical roles as post-transcriptional regulators, microRNAs (miRNAs) are a family of short non-coding RNAs that are derived from longer transcripts called precursor miRNAs (pre-miRNAs). Experimental methods to identify pre-miRNAs are expensive and time-consuming, which creates a need for computational alternatives. In recent years, the accuracy of computational methods to predict pre-miRNAs has been increasing significantly. However, there are still several drawbacks. First, these methods usually only consider base frequencies or sequence information while ignoring the information between bases. Second, feature extraction methods based on secondary structures usually only consider the global characteristics while ignoring the mutual influence of the local structures. Third, methods integrating high-dimensional feature information are computationally inefficient. In this study, we have proposed a novel mutual information-based feature representation algorithm for pre-miRNA sequences and secondary structures, which is capable of capturing the interactions between sequence bases and local features of the RNA secondary structure. In addition, the feature space is smaller than that of most popular methods, which makes our method computationally more efficient than its competitors. Finally, we applied these features to train a support vector machine model to predict pre-miRNAs and compared the results with other popular predictors. As a result, our method outperforms others based on both 5-fold cross-validation and the Jackknife test.
Affiliation(s)
- Xiangzheng Fu
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Wen Zhu
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Yifan Chen
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Jialiang Yang
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| |
|
15
|
Macchiaroli N, Cucher M, Kamenetzky L, Yones C, Bugnon L, Berriman M, Olson PD, Rosenzvit MC. Identification and expression profiling of microRNAs in Hymenolepis. Int J Parasitol 2019; 49:211-223. [PMID: 30677390 DOI: 10.1016/j.ijpara.2018.07.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/31/2018] [Revised: 07/20/2018] [Accepted: 07/23/2018] [Indexed: 02/08/2023]
Abstract
Tapeworms (cestodes) of the genus Hymenolepis are the causative agents of hymenolepiasis, a neglected zoonotic disease. Hymenolepis nana is the most prevalent human tapeworm, especially affecting children. The genomes of Hymenolepis microstoma and H. nana have been recently sequenced and assembled. MicroRNAs (miRNAs), a class of small non-coding RNAs, are principal regulators of gene expression at the post-transcriptional level and are involved in many different biological processes. In previous work, we experimentally identified miRNA genes in the cestodes Echinococcus, Taenia and Mesocestoides. However, current knowledge about miRNAs in Hymenolepis is limited. In this work we described for the first time the expression profile of the miRNA complement in H. microstoma, and discovered miRNAs in H. nana. We found a reduced complement of 37 evolutionarily conserved miRNAs, putatively reflecting their low morphological complexity and parasitic lifestyle. We found high expression of a few miRNAs in the larval stage of H. microstoma that are conserved in other cestodes, suggesting that these miRNAs may have important roles in development, survival and host-parasite interplay. We performed a comparative analysis of the identified miRNAs across the Cestoda and showed that most of the miRNAs in Hymenolepis are located in intergenic regions, implying that they are independently transcribed. We found a Hymenolepis-specific cluster composed of three members of the mir-36 family. Also, we found that one of the neighboring genes of mir-10 was a Hox gene, as in most bilaterian species. This study provides a valuable resource for further experimental research in cestode biology that might lead to improved detection and control of these neglected parasites. The comprehensive identification and expression analysis of Hymenolepis miRNAs can help to identify novel biomarkers for diagnosis and/or novel therapeutic targets for the control of hymenolepiasis.
Affiliation(s)
- Natalia Macchiaroli
- Instituto de Investigaciones en Microbiología y Parasitología Médicas (IMPaM), Facultad de Medicina, Universidad de Buenos Aires (UBA)-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Buenos Aires, Argentina
| | - Marcela Cucher
- Instituto de Investigaciones en Microbiología y Parasitología Médicas (IMPaM), Facultad de Medicina, Universidad de Buenos Aires (UBA)-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Buenos Aires, Argentina
| | - Laura Kamenetzky
- Instituto de Investigaciones en Microbiología y Parasitología Médicas (IMPaM), Facultad de Medicina, Universidad de Buenos Aires (UBA)-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Buenos Aires, Argentina
| | - Cristian Yones
- Research Institute for Signals, Systems and Computational Intelligence, (sinc(i)), FICH-UNL-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Santa Fe, Argentina
| | - Leandro Bugnon
- Research Institute for Signals, Systems and Computational Intelligence, (sinc(i)), FICH-UNL-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Santa Fe, Argentina
| | - Matt Berriman
- Parasite Genomics Group, Wellcome Trust Sanger Institute, Hinxton, UK
| | - Peter D Olson
- Department of Life Sciences, The Natural History Museum, London, UK
| | - Mara Cecilia Rosenzvit
- Instituto de Investigaciones en Microbiología y Parasitología Médicas (IMPaM), Facultad de Medicina, Universidad de Buenos Aires (UBA)-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Buenos Aires, Argentina
| |
|
16
|
Fu X, Liao B, Zhu W, Cai L. New 3D graphical representation for RNA structure analysis and its application in the pre-miRNA identification of plants. RSC Adv 2018; 8:30833-30841. [PMID: 35548744 PMCID: PMC9085476 DOI: 10.1039/c8ra04138e] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/15/2018] [Accepted: 08/24/2018] [Indexed: 11/26/2022] Open
Abstract
MicroRNAs (miRNAs) are a family of short non-coding RNAs that play significant roles as post-transcriptional regulators. Consequently, various methods have been proposed to identify precursor miRNAs (pre-miRNAs), among which comparative studies of miRNA structures are the most important. To measure and classify the structural similarity of miRNAs, we propose a new three-dimensional (3D) graphical representation of the secondary structure of miRNAs, in which an miRNA secondary structure is initially transformed into a characteristic sequence based on physicochemical properties and base frequency. A numerical characterization of the 3D graph is used to represent the miRNA secondary structure. We then utilize a novel Euclidean distance method based on this representation to compute the distance between different miRNA sequences for the sequence similarity analysis. Finally, we use this sequence similarity analysis method to identify plant pre-miRNAs among three commonly used datasets. Results show that the method is reasonable and effective.
Affiliation(s)
- Xiangzheng Fu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Wen Zhu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| |
|
17
|
Pérez MG, Macchiaroli N, Lichtenstein G, Conti G, Asurmendi S, Milone DH, Stegmayer G, Kamenetzky L, Cucher M, Rosenzvit MC. microRNA analysis of Taenia crassiceps cysticerci under praziquantel treatment and genome-wide identification of Taenia solium miRNAs. Int J Parasitol 2017; 47:643-653. [DOI: 10.1016/j.ijpara.2017.04.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/01/2017] [Revised: 03/31/2017] [Accepted: 04/03/2017] [Indexed: 12/14/2022]
|