1
Daniel Thomas S, Vijayakumar K, John L, Krishnan D, Rehman N, Revikumar A, Kandel Codi JA, Prasad TSK, S S V, Raju R. Machine Learning Strategies in MicroRNA Research: Bridging Genome to Phenome. OMICS : A JOURNAL OF INTEGRATIVE BIOLOGY 2024; 28:213-233. [PMID: 38752932 DOI: 10.1089/omi.2024.0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/23/2024]
Abstract
MicroRNAs (miRNAs) have emerged as a prominent layer of regulation of gene expression. This article offers the salient and current aspects of machine learning (ML) tools and approaches from genome to phenome in miRNA research. First, we underline that the complexity in the analysis of miRNA function ranges from their modes of biogenesis to the target diversity in diverse biological conditions. Therefore, it is imperative to first ascertain the miRNA coding potential of genomes and understand the regulatory mechanisms of their expression. This knowledge enables the efficient classification of miRNA precursors and the identification of their mature forms and respective target genes. Second, and because one miRNA can target multiple mRNAs and vice versa, another challenge is the assessment of the miRNA-mRNA target interaction network. Furthermore, long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs) also contribute to this complexity. ML has been used to tackle these challenges at the high-dimensional data level. The present expert review covers more than 100 tools adopting various ML approaches pertaining to, for example, (1) miRNA promoter prediction, (2) precursor classification, (3) mature miRNA prediction, (4) miRNA target prediction, (5) miRNA-lncRNA and miRNA-circRNA interactions, (6) miRNA-mRNA expression profiling, (7) miRNA regulatory module detection, (8) miRNA-disease association, and (9) miRNA essentiality prediction. Taken together, we unpack, critically examine, and highlight the cutting-edge synergy of ML approaches and miRNA research so as to develop a dynamic and microlevel understanding of human health and diseases.
Affiliation(s)
- Sonet Daniel Thomas
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
  - Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Krithika Vijayakumar
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Levin John
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Deepak Krishnan
  - Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Niyas Rehman
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Amjesh Revikumar
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
  - Kerala Genome Data Centre, Kerala Development and Innovation Strategic Council, Thiruvananthapuram, Kerala, India
- Jalaluddin Akbar Kandel Codi
  - Department of Surgical Oncology, Yenepoya Medical College, Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Vinodchandra S S
  - Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India
- Rajesh Raju
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
  - Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
2
PlantMirP2: An Accurate, Fast and Easy-To-Use Program for Plant Pre-miRNA and miRNA Prediction. Genes (Basel) 2021; 12:genes12081280. [PMID: 34440454 PMCID: PMC8392394 DOI: 10.3390/genes12081280] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/19/2021] [Revised: 08/19/2021] [Accepted: 08/19/2021] [Indexed: 01/01/2023] Open
Abstract
MicroRNAs (miRNAs) are a kind of short non-coding ribonucleic acid molecules that can regulate gene expression. The computational identification of plant miRNAs is of great significance to understanding biological functions. In our previous studies, we first put forward and then further developed a set of knowledge-based energy features to construct two plant pre-miRNA prediction tools (plantMirP and riceMirP). However, these two tools cannot be used for miRNA prediction from NGS (Next-Generation Sequencing) data. In addition, to further improve the prediction performance and accessibility, plantMirP2 has been developed. Based on the latest dataset, plantMirP2 achieves a promising performance: 0.9968 (Area Under Curve, AUC), 0.9754 (accuracy), 0.9675 (sensitivity) and 0.9876 (specificity). Additionally, the comparisons with other plant pre-miRNA tools show that plantMirP2 performs better. Finally, the webserver and stand-alone version of plantMirP2 are available.
3
Tan K, Huang W, Liu X, Hu J, Dong S. A Hierarchical Graph Convolution Network for Representation Learning of Gene Expression Data. IEEE J Biomed Health Inform 2021; 25:3219-3229. [PMID: 33449889 DOI: 10.1109/jbhi.2021.3052008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
The curse of dimensionality, which is caused by high-dimensionality and low-sample-size, is a major challenge in gene expression data analysis. However, the real situation is even worse: labelling data is laborious and time-consuming, so only a small part of the limited samples will be labelled. Having so few labelled samples further increases the difficulty of training deep learning models. Interpretability is an important requirement in biomedicine. Many existing deep learning methods try to provide interpretability, but are rarely applied to gene expression data. Recent semi-supervised graph convolution network methods try to address these problems by smoothing the label information over a graph. However, to the best of our knowledge, these methods only utilize graphs in either the feature space or sample space, which restricts their performance. We propose a transductive semi-supervised representation learning method called a hierarchical graph convolution network (HiGCN) to aggregate the information of gene expression data in both feature and sample spaces. HiGCN first utilizes external knowledge to construct a feature graph and a similarity kernel to construct a sample graph. Then, two spatial-based GCNs are used to aggregate information on these graphs. To validate the model's performance, synthetic and real datasets are provided to lend empirical support. Compared with two recent models and three traditional models, HiGCN learns better representations of gene expression data, and these representations improve the performance of downstream tasks, especially when the model is trained on a few labelled samples. Important features can be extracted from our model to provide reliable interpretability.
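The idea of aggregating over both a sample graph and a feature graph can be illustrated with a very small sketch. This is not the HiGCN architecture: it has no learned weights, and plain neighbour averaging stands in for the spatial graph convolutions. The function names, the adjacency matrices, and the toy expression matrix are all hypothetical.

```python
def mean_aggregate(adj, rows):
    """Average each node's vector with those of its graph neighbours.

    adj is a square 0/1 adjacency matrix; rows[i] is the vector of node i.
    A self-loop is always included so every node keeps its own signal.
    """
    n_nodes, n_dims = len(rows), len(rows[0])
    out = []
    for i in range(n_nodes):
        nbrs = [i] + [j for j in range(n_nodes) if adj[i][j] and j != i]
        out.append([sum(rows[j][k] for j in nbrs) / len(nbrs) for k in range(n_dims)])
    return out


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def two_space_aggregate(x, sample_adj, feature_adj):
    """Smooth an expression matrix (samples x genes) over both graphs."""
    h = mean_aggregate(sample_adj, x)                 # across similar samples
    h_t = mean_aggregate(feature_adj, transpose(h))   # across related genes
    return transpose(h_t)


# Toy data (hypothetical): 3 samples x 2 genes.
x = [[1, 3], [3, 5], [10, 20]]
sample_adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # samples 0 and 1 are similar
feature_adj = [[0, 1], [1, 0]]                   # the two genes are related
smoothed = two_space_aggregate(x, sample_adj, feature_adj)
```

In this toy case, samples 0 and 1 end up with identical representations because they are neighbours in the sample graph, while the isolated sample 2 is smoothed only across genes.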
4
Yones C, Raad J, Bugnon LA, Milone DH, Stegmayer G. High precision in microRNA prediction: A novel genome-wide approach with convolutional deep residual networks. Comput Biol Med 2021; 134:104448. [PMID: 33979731 DOI: 10.1016/j.compbiomed.2021.104448] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/04/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 11/30/2022]
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that have a key role in the regulation of gene expression. The importance of miRNAs is now widely acknowledged by the community, and computational methods are needed for the precise prediction of novel miRNA candidates. This task can be done by searching for homologs with sequence alignment tools, but results are restricted to sequences that are very similar to the known miRNA precursors (pre-miRNAs). Besides, a very important property of pre-miRNAs, their secondary structure, is not taken into account by these methods. To fill this gap, many machine learning approaches have been proposed in recent years. However, the methods are generally tested in very controlled conditions. If these methods were used under real conditions, the false positives would increase and the precision would fall well below the published values. This work provides a novel approach for dealing with the computational prediction of pre-miRNAs: a convolutional deep residual neural network (mirDNN). This model was tested on the full genomes of several animals and plants, achieving a precision up to 5 times larger than other approaches at the same recall rates. Furthermore, a novel validation methodology was used to ensure that the performance reported in this study can be effectively achieved when using mirDNN in novel species. To provide fast and easy access to mirDNN, a web demo is available at http://sinc.unl.edu.ar/web-demo/mirdnn/. The demo can process FASTA files with multiple sequences to calculate the prediction scores and generates the nucleotide importance plots. FULL SOURCE CODE: http://sourceforge.net/projects/sourcesinc/files/mirdnn and https://github.com/cyones/mirDNN. CONTACT: gstegmayer@sinc.unl.edu.ar.
Affiliation(s)
- C Yones
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- J Raad
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- L A Bugnon
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- D H Milone
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- G Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
5
Huang S, Yoshitake K, Asaduzzaman M, Kinoshita S, Watabe S, Asakawa S. Discovery and functional understanding of MiRNAs in molluscs: a genome-wide profiling approach. RNA Biol 2021; 18:1702-1715. [PMID: 33356816 DOI: 10.1080/15476286.2020.1867798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022] Open
Abstract
Small non-coding RNAs play a pivotal role in gene regulation and in the repression of transposable elements and viral activity in various organisms. Among the various categories of these small non-coding RNAs, microRNAs (miRNAs) guide post-transcriptional gene regulation in cellular development, proliferation, apoptosis, oncogenesis, and differentiation. Here, we performed a genome-wide computational prediction of miRNAs to improve the understanding of miRNA observation and function in molluscs. As an initial step, hundreds of conserved miRNAs were predicted in 35 species of molluscs through genome scanning. Afterwards, the miRNAs' population, isoforms, organization, and function were characterized in detail. Furthermore, the key miRNA biogenesis factors, including AGO2, DGCR8, DICER, DROSHA, TARBP2, RAN, and XPO5, were elucidated based on homologue sequence searching. We also summarized the miRNAs' function in biomineralization, immune and stress response, as well as growth and development in molluscs. Because miRNAs play a vital role in various lifeforms, this study will provide insight into miRNA biogenesis and function in molluscs, as well as other invertebrates.
Affiliation(s)
- Songqian Huang
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Kazutoshi Yoshitake
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Md Asaduzzaman
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Shigeharu Kinoshita
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Shugo Watabe
  - School of Marine Biosciences, Kitasato University, Sagamihara, Kanagawa, Japan
- Shuichi Asakawa
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
6
Bugnon LA, Yones C, Milone DH, Stegmayer G. Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning. Brief Bioinform 2020; 22:5894456. [PMID: 34020552 DOI: 10.1093/bib/bbaa184] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/26/2020] [Revised: 07/13/2020] [Accepted: 07/18/2020] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION: The genome-wide discovery of microRNAs (miRNAs) involves identifying sequences having the highest chance of being a novel miRNA precursor (pre-miRNA), within all the possible sequences in a complete genome. The known pre-miRNAs are usually just a few in comparison to the millions of candidates that have to be analyzed. This is of particular interest in non-model species and recently sequenced genomes, where the challenge is to find potential pre-miRNAs only from the sequenced genome. The task is unfeasible without the help of computational methods, such as deep learning. However, it is still very difficult to find an accurate predictor, with a low false positive rate in this genome-wide context. Although there are many available tools, these have not been tested in realistic conditions, with sequences from whole genomes and the high class imbalance inherent to such data.
RESULTS: In this work, we review six recent methods for tackling this problem with machine learning. We compare the models in five genome-wide datasets: Arabidopsis thaliana, Caenorhabditis elegans, Anopheles gambiae, Drosophila melanogaster, Homo sapiens. The models have been designed for the pre-miRNAs prediction task, where there is a class of interest that is significantly underrepresented (the known pre-miRNAs) with respect to a very large number of unlabeled samples. It was found that for the smaller genomes and smaller imbalances, all methods perform in a similar way. However, for larger datasets such as the H. sapiens genome, it was found that deep learning approaches using raw information from the sequences reached the best scores, achieving low numbers of false positives.
AVAILABILITY: The source code to reproduce these results is in: http://sourceforge.net/projects/sourcesinc/files/gwmirna. Additionally, the datasets are freely available in: https://sourceforge.net/projects/sourcesinc/files/mirdata.
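Why class imbalance matters so much here can be shown with a short worked calculation. The sketch below computes the expected precision of a classifier from its sensitivity and specificity for a given number of true pre-miRNAs and background sequences. All numbers are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def expected_precision(sensitivity, specificity, n_pos, n_neg):
    """Expected precision = TP / (TP + FP) for the given class sizes."""
    true_pos = sensitivity * n_pos
    false_pos = (1.0 - specificity) * n_neg
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 90% sensitivity, 99% specificity.
balanced = expected_precision(0.90, 0.99, n_pos=1_000, n_neg=1_000)
genome_wide = expected_precision(0.90, 0.99, n_pos=1_000, n_neg=1_000_000)
print(f"balanced test set: {balanced:.3f}")
print(f"genome-wide scan:  {genome_wide:.3f}")
```

With the same classifier, precision is about 0.989 on the balanced set but only about 0.083 when one positive is hidden among a thousand negatives, which is why results from controlled benchmarks may not carry over to whole-genome scans.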
Affiliation(s)
- Leandro A Bugnon
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
- Cristian Yones
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
- Diego H Milone
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
- Georgina Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
7
Guan ZX, Li SH, Zhang ZM, Zhang D, Yang H, Ding H. A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods. Curr Genomics 2020; 21:11-25. [PMID: 32655294 PMCID: PMC7324890 DOI: 10.2174/1389202921666200214125102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/24/2019] [Revised: 01/24/2020] [Accepted: 01/30/2020] [Indexed: 11/22/2022] Open
Abstract
MicroRNAs, a group of short non-coding RNA molecules, can regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental methods and comparative genomics methods have their disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. Therefore, the review summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide valid information about the predictors currently available. Finally, we give the future perspectives on the identification of pre-miRNAs. The review provides scholars with a whole background of pre-miRNA identification by using machine learning methods, which can help researchers have a clear understanding of the progress of research in this field.
Affiliation(s)
- Zheng-Xing Guan
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Hao Li
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zi-Mei Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dan Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
8
Raad J, Stegmayer G, Milone DH. Complexity measures of the mature miRNA for improving pre-miRNAs prediction. Bioinformatics 2019; 36:2319-2327. [DOI: 10.1093/bioinformatics/btz940] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/19/2019] [Revised: 12/10/2019] [Accepted: 12/17/2019] [Indexed: 12/20/2022] Open
Abstract
Motivation: The discovery of microRNA (miRNA) in the last decade has certainly changed the understanding of gene regulation in the cell. Although a large number of algorithms with different features have been proposed, they still predict an impractical amount of false positives. Most of the proposed features are based on the structure of precursors of the miRNA only, not considering the important and relevant information contained in the mature miRNA. Such a new kind of feature could certainly improve the performance of the predictors of new miRNAs.
Results: This paper presents three new features that are based on the sequence information contained in the mature miRNA. We will show how these new features, when used by a classical supervised machine learning approach as well as by more recent proposals based on deep learning, improve the prediction performance in a significant way. Moreover, several experimental conditions were defined and tested to evaluate the novel features' impact in situations close to genome-wide analysis. The results show that the incorporation of new features based on the mature miRNA improves the detection of new miRNAs independently of the classifier used.
Availability and implementation: https://sourceforge.net/projects/sourcesinc/files/cplxmirna/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jonathan Raad
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- Georgina Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- Diego H Milone
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
9
Mármol-Sánchez E, Cirera S, Quintanilla R, Pla A, Amills M. Discovery and annotation of novel microRNAs in the porcine genome by using a semi-supervised transductive learning approach. Genomics 2019; 112:2107-2118. [PMID: 31816430 DOI: 10.1016/j.ygeno.2019.12.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/28/2019] [Revised: 11/13/2019] [Accepted: 12/05/2019] [Indexed: 12/15/2022]
Abstract
Despite the broad variety of available microRNA (miRNA) prediction tools, their application to the discovery and annotation of novel miRNA genes in domestic species is still limited. In this study we designed a comprehensive pipeline (eMIRNA) for miRNA identification in the yet poorly annotated porcine genome and demonstrated the usefulness of implementing a motif search positional refinement strategy for the accurate determination of precursor miRNA boundaries. The small RNA fraction from gluteus medius skeletal muscle of 48 Duroc gilts was sequenced and used for the prediction of novel miRNA loci. Additionally, we selected the human miRNA annotation for a homology-based search of porcine miRNAs with orthologous genes in the human genome. A total of 20 novel expressed miRNAs were identified in the porcine muscle transcriptome and 27 additional novel porcine miRNAs were also detected by homology-based search using the human miRNA annotation. The existence of three selected novel miRNAs (ssc-miR-483, ssc-miR-484 and ssc-miR-200a) was further confirmed by reverse transcription quantitative real-time PCR analyses in the muscle and liver tissues of Göttingen minipigs. In summary, the eMIRNA pipeline presented in the current work allowed us to expand the catalogue of porcine miRNAs and showed better performance than other commonly used miRNA prediction approaches. More importantly, the flexibility of our pipeline makes possible its application in other yet poorly annotated non-model species.
Affiliation(s)
- Emilio Mármol-Sánchez
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Susanna Cirera
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 3, 2nd Floor, 1870 Frederiksberg C, Denmark
- Raquel Quintanilla
  - Animal Breeding and Genetics Program, Institute for Research and Technology in Food and Agriculture (IRTA), Torre Marimon, 08140 Caldes de Montbui, Spain
- Albert Pla
  - Department of Medical Genetics, University of Oslo and Oslo University Hospital, Oslo, Norway
- Marcel Amills
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain; Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
10
Multi-view Co-training for microRNA Prediction. Sci Rep 2019; 9:10931. [PMID: 31358877 PMCID: PMC6662744 DOI: 10.1038/s41598-019-47399-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/14/2019] [Accepted: 07/05/2019] [Indexed: 12/13/2022] Open
Abstract
MicroRNAs (miRNAs) are short, non-coding RNAs involved in cell regulation at post-transcriptional and translational levels. Numerous computational predictors of miRNA have been developed that generally classify miRNA based on either sequence- or expression-based features. While these methods are highly effective, they require large labelled training data sets, which are often not available for many species. Simultaneously, emerging high-throughput wet-lab experimental procedures are producing large unlabelled data sets of genomic sequence and RNA expression profiles. Existing methods use supervised machine learning and are therefore unable to leverage these unlabelled data. In this paper, we design and develop a multi-view co-training approach for the classification of miRNA to maximize the utility of unlabelled training data by taking advantage of multiple views of the problem. Starting with only 10 labelled training data, co-training is shown to significantly (p < 0.01) increase classification accuracy of both sequence- and expression-based classifiers, without requiring any new labelled training data. After 11 iterations of co-training, the expression-based view of miRNA classification experiences an average increase in AUPRC of 15.81% over six species, compared to 11.90% for self-training and 4.84% for passive learning. Similar results are observed for sequence-based classifiers with increases of 46.47%, 39.53% and 29.43%, for co-training, self-training, and passive learning, respectively. The final co-trained sequence and expression-based classifiers are integrated into a final confidence-based classifier which shows improved performance compared to both the expression (1.5%, p = 0.021) and sequence (3.7%, p = 0.006) views. This study represents the first application of multi-view co-training to miRNA prediction and shows great promise, particularly for understudied species with few available training data.
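The general co-training loop described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: each view is reduced to a single numeric score per candidate, a nearest-centroid rule stands in for the study's actual classifiers, and every name and data value below is hypothetical.

```python
def centroid_fit(xs, ys):
    """Return the mean feature value of each class (0 and 1)."""
    return {c: sum(x for x, y in zip(xs, ys) if y == c) /
               sum(1 for y in ys if y == c) for c in (0, 1)}


def centroid_predict(means, x):
    """Return (label, confidence); confidence is the distance margin."""
    d0, d1 = abs(x - means[0]), abs(x - means[1])
    return (0 if d0 < d1 else 1), abs(d0 - d1)


def co_train(view_a, view_b, seed_labels, n_iter=3, per_iter=1):
    """Grow a shared labelled pool using two views in turn.

    seed_labels maps candidate index -> label and must contain both classes.
    In each round, each view's classifier is trained on the current pool and
    pseudo-labels its most confident unlabelled candidates.
    """
    labelled = dict(seed_labels)
    for _ in range(n_iter):
        for view in (view_a, view_b):
            idx = list(labelled)
            model = centroid_fit([view[i] for i in idx], [labelled[i] for i in idx])
            unlabelled = [i for i in range(len(view)) if i not in labelled]
            if not unlabelled:
                return labelled
            ranked = sorted(unlabelled,
                            key=lambda i: centroid_predict(model, view[i])[1],
                            reverse=True)
            for i in ranked[:per_iter]:
                labelled[i] = centroid_predict(model, view[i])[0]
    return labelled


# Hypothetical toy data: 8 candidates, one score per view.
sequence_view = [0.1, 0.3, 0.2, 0.4, 0.9, 0.8, 0.7, 0.95]
expression_view = [1.0, 1.5, 0.5, 2.0, 8.0, 9.0, 7.5, 8.5]
true_labels = [0, 0, 0, 0, 1, 1, 1, 1]
result = co_train(sequence_view, expression_view, {0: 0, 4: 1})
```

Starting from one labelled example per class, the two views take turns extending the shared pool, so a candidate that is ambiguous in one view can still be labelled confidently through the other.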
11
Bugnon LA, Yones C, Raad J, Milone DH, Stegmayer G. Genome-wide hairpins datasets of animals and plants for novel miRNA prediction. Data Brief 2019; 25:104209. [PMID: 31453279 PMCID: PMC6700487 DOI: 10.1016/j.dib.2019.104209] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/05/2019] [Revised: 06/16/2019] [Accepted: 06/25/2019] [Indexed: 01/19/2023] Open
Abstract
This article makes available several genome-wide datasets, which can be used for training microRNA (miRNA) classifiers. The hairpin sequences available are from the genomes of: Homo sapiens, Arabidopsis thaliana, Anopheles gambiae, Caenorhabditis elegans and Drosophila melanogaster. Each dataset provides the genome data divided into sequences and a set of computed features for predictions. Each sequence has one label: i) “positive”: meaning that it is a well-known pre-miRNA, according to miRBase v21; or ii) “unlabeled”: indicating that the sequence does not (yet) have a known function and could be a candidate novel pre-miRNA. Because selecting an informative feature set is very important for a good pre-miRNA classifier, a representative feature set with large discriminative power has been calculated and is provided, as well, for each genome. This feature set contains typical information about sequence, topology and structure. The dataset is publicly shared at https://sourceforge.net/projects/sourcesinc/files/mirdata/.
Affiliation(s)
- L A Bugnon
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- C Yones
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- J Raad
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- D H Milone
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- G Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
12
Macchiaroli N, Cucher M, Kamenetzky L, Yones C, Bugnon L, Berriman M, Olson PD, Rosenzvit MC. Identification and expression profiling of microRNAs in Hymenolepis. Int J Parasitol 2019; 49:211-223. [PMID: 30677390 DOI: 10.1016/j.ijpara.2018.07.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/31/2018] [Revised: 07/20/2018] [Accepted: 07/23/2018] [Indexed: 02/08/2023]
Abstract
Tapeworms (cestodes) of the genus Hymenolepis are the causative agents of hymenolepiasis, a neglected zoonotic disease. Hymenolepis nana is the most prevalent human tapeworm, especially affecting children. The genomes of Hymenolepis microstoma and H. nana have been recently sequenced and assembled. MicroRNAs (miRNAs), a class of small non-coding RNAs, are principal regulators of gene expression at the post-transcriptional level and are involved in many different biological processes. In previous work, we experimentally identified miRNA genes in the cestodes Echinococcus, Taenia and Mesocestoides. However, current knowledge about miRNAs in Hymenolepis is limited. In this work we described for the first known time the expression profile of the miRNA complement in H. microstoma, and discovered miRNAs in H. nana. We found a reduced complement of 37 evolutionarily conserved miRNAs, putatively reflecting their low morphological complexity and parasitic lifestyle. We found high expression of a few miRNAs in the larval stage of H. microstoma that are conserved in other cestodes, suggesting that these miRNAs may have important roles in development, survival and host-parasite interplay. We performed a comparative analysis of the identified miRNAs across the Cestoda and showed that most of the miRNAs in Hymenolepis are located in intergenic regions, implying that they are independently transcribed. We found a Hymenolepis-specific cluster composed of three members of the mir-36 family. Also, we found that one of the neighboring genes of mir-10 was a Hox gene, as in most bilaterian species. This study provides a valuable resource for further experimental research in cestode biology that might lead to improved detection and control of these neglected parasites. The comprehensive identification and expression analysis of Hymenolepis miRNAs can help to identify novel biomarkers for diagnosis and/or novel therapeutic targets for the control of hymenolepiasis.
Affiliation(s)
- Natalia Macchiaroli
  - Instituto de Investigaciones en Microbiología y Parasitología Médicas (IMPaM), Facultad de Medicina, Universidad de Buenos Aires (UBA)-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Buenos Aires, Argentina
- Marcela Cucher
  - Instituto de Investigaciones en Microbiología y Parasitología Médicas (IMPaM), Facultad de Medicina, Universidad de Buenos Aires (UBA)-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Buenos Aires, Argentina
- Laura Kamenetzky
  - Instituto de Investigaciones en Microbiología y Parasitología Médicas (IMPaM), Facultad de Medicina, Universidad de Buenos Aires (UBA)-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Buenos Aires, Argentina
- Cristian Yones
  - Research Institute for Signals, Systems and Computational Intelligence, (sinc(i)), FICH-UNL-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Santa Fe, Argentina
- Leandro Bugnon
  - Research Institute for Signals, Systems and Computational Intelligence, (sinc(i)), FICH-UNL-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Santa Fe, Argentina
- Matt Berriman
  - Parasite Genomics Group, Wellcome Trust Sanger Institute, Hinxton, UK
- Peter D Olson
  - Department of Life Sciences, The Natural History Museum, London, UK
- Mara Cecilia Rosenzvit
  - Instituto de Investigaciones en Microbiología y Parasitología Médicas (IMPaM), Facultad de Medicina, Universidad de Buenos Aires (UBA)-Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Buenos Aires, Argentina