1
Fu X, Zhu W, Cai L, Liao B, Peng L, Chen Y, Yang J. Improved Pre-miRNAs Identification Through Mutual Information of Pre-miRNA Sequences and Structures. Front Genet 2019; 10:119. [PMID: 30858864 PMCID: PMC6397858 DOI: 10.3389/fgene.2019.00119] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/15/2018] [Accepted: 02/04/2019] [Indexed: 11/30/2022]
Abstract
Playing critical roles as post-transcriptional regulators, microRNAs (miRNAs) are a family of short non-coding RNAs that are derived from longer transcripts called precursor miRNAs (pre-miRNAs). Experimental methods to identify pre-miRNAs are expensive and time-consuming, which creates a need for computational alternatives. In recent years, the accuracy of computational methods to predict pre-miRNAs has increased significantly. However, several drawbacks remain. First, these methods usually consider only base frequencies or sequence information while ignoring the information between bases. Second, feature extraction methods based on secondary structures usually consider only the global characteristics while ignoring the mutual influence of the local structures. Third, methods that integrate high-dimensional feature information are computationally inefficient. In this study, we propose a novel mutual information-based feature representation algorithm for pre-miRNA sequences and secondary structures, which is capable of capturing the interactions between sequence bases and local features of the RNA secondary structure. In addition, the feature space is smaller than that of most popular methods, which makes our method computationally more efficient than its competitors. Finally, we applied these features to train a support vector machine model to predict pre-miRNAs and compared the results with other popular predictors. Our method outperforms the others in both 5-fold cross-validation and the Jackknife test.
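To make the idea of "information between bases" concrete, the following is a minimal sketch of mutual information between neighbouring bases in a single sequence. It is an illustration only, not the feature set of Fu et al.: the function name `adjacent_mutual_information`, the restriction to adjacent positions, and the use of base-2 logarithms are all assumptions made here.

```python
from collections import Counter
from math import log2


def adjacent_mutual_information(seq: str) -> float:
    """Mutual information (bits) between each base and its right-hand neighbour.

    Hypothetical helper for illustration; the paper's actual features may differ.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA and RNA spellings alike
    pairs = list(zip(seq, seq[1:]))      # all adjacent base pairs (x_i, x_{i+1})
    if not pairs:
        return 0.0
    n = len(pairs)
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    second_counts = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n                 # joint frequency of the pair
        p_a = first_counts[a] / n        # marginal of the first position
        p_b = second_counts[b] / n       # marginal of the second position
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # A strictly alternating sequence: each base fully determines its neighbour.
    print(adjacent_mutual_information("AUAUAUAU"))
    # A homopolymer carries no information between neighbours.
    print(adjacent_mutual_information("AAAA"))
```

A value near zero means neighbouring bases look independent, while larger values indicate that knowing one base helps predict the next, which is the kind of dependency that plain base-frequency features cannot express.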
Affiliation(s)
- Xiangzheng Fu
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Wen Zhu
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Lijun Cai
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Bo Liao
  - College of Information Science and Engineering, Hunan University, Changsha, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yifan Chen
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Jialiang Yang
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
2
Grativol C, Thiebaut F, Sangi S, Montessoro P, Santos WDS, Hemerly AS, Ferreira PC. A miniature inverted-repeat transposable element, AddIn-MITE, located inside a WD40 gene is conserved in Andropogoneae grasses. PeerJ 2019; 7:e6080. [PMID: 30648010 PMCID: PMC6331000 DOI: 10.7717/peerj.6080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2018] [Accepted: 11/07/2018] [Indexed: 11/25/2022]
Abstract
Miniature inverted-repeat transposable elements (MITEs) have been associated with genic regions in plant genomes and may play important roles in the regulation of nearby genes via recruitment of small RNAs (sRNAs) to the MITE loci. We identified eight families of MITEs in the sugarcane genome assembly with the MITE-Hunter pipeline. These sequences were found upstream of, downstream of, or inserted into 67 genic regions in the genome. The position of the most abundant MITE (Stowaway-like) in genic regions, which we call AddIn-MITE, was confirmed in a WD40 gene. The analysis of four monocot species showed conservation of the AddIn-MITE sequence, with a large number of copies in their genomes. We also investigated the conservation of the AddIn-MITE position in the WD40 genes from sorghum, maize, sugarcane cultivars, and wild Saccharum species. In all analyzed plants, AddIn-MITE is located in a WD40 intronic region. Furthermore, the role of AddIn-MITE-related sRNAs in the WD40 genic region was investigated. We found that sRNAs mapped preferentially to the AddIn-MITE rather than to other regions of the WD40 gene in sugarcane. In addition, the analysis of the small RNA distribution patterns in the WD40 gene and the structure of AddIn-MITE suggests that the MITE region is a proto-miRNA locus in sugarcane. Together, these data provide insights into the role of AddIn-MITE in Andropogoneae grasses.
Affiliation(s)
- Clicia Grativol
  - Laboratório de Química e Função de Proteínas e Peptídeos/Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense, Campos dos Goytacazes, Rio de Janeiro, Brazil
- Flavia Thiebaut
  - Laboratório de Biologia Molecular de Plantas/Instituto de Bioquímica Médica Leopoldo De Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Sara Sangi
  - Laboratório de Química e Função de Proteínas e Peptídeos/Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense, Campos dos Goytacazes, Rio de Janeiro, Brazil
- Patricia Montessoro
  - Laboratório de Biologia Molecular de Plantas/Instituto de Bioquímica Médica Leopoldo De Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Walaci da Silva Santos
  - Laboratório de Química e Função de Proteínas e Peptídeos/Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense, Campos dos Goytacazes, Rio de Janeiro, Brazil
- Adriana S. Hemerly
  - Laboratório de Biologia Molecular de Plantas/Instituto de Bioquímica Médica Leopoldo De Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Paulo C.G. Ferreira
  - Laboratório de Biologia Molecular de Plantas/Instituto de Bioquímica Médica Leopoldo De Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
3
Yu D, Ma X, Zuo Z, Wang H, Meng Y. Classification of Transcription Boundary-Associated RNAs (TBARs) in Animals and Plants. Front Genet 2018; 9:168. [PMID: 29868116 PMCID: PMC5960741 DOI: 10.3389/fgene.2018.00168] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/29/2018] [Accepted: 04/26/2018] [Indexed: 11/13/2022]
Abstract
There is increasing evidence suggesting the contribution of non-coding RNAs (ncRNAs) to the phenotypic and physiological complexity of organisms. A novel ncRNA species has been identified near the transcription boundaries of protein-coding genes in eukaryotes, bacteria, and archaea. This review provides a detailed description of these transcription boundary-associated RNAs (TBARs), including their classification. Based on their genomic distribution, TBARs are divided into two major groups: promoter-associated RNAs (PARs) and terminus-associated RNAs (TARs). Depending on the sequence length, each group is further classified into long RNA species (>200 nt) and small RNA species (<200 nt). According to these rules of TBAR classification, divergent ncRNAs with confusing nomenclatures, such as promoter upstream transcripts (PROMPTs), upstream antisense RNAs (uaRNAs), stable unannotated transcripts (SUTs), cryptic unstable transcripts (CUTs), upstream non-coding transcripts (UNTs), transcription start site-associated RNAs (TSSaRNAs), transcription initiation RNAs (tiRNAs), and transcription termination site-associated RNAs (TTSaRNAs), were assigned to specific classes. Although the biogenesis pathways of PARs and TARs have not yet been clearly elucidated, previous studies indicate that some of the PARs have originated either through divergent transcription or via RNA polymerase pausing. Intriguing findings regarding the functional implications of the TBARs such as the long-range “gene looping” model, which explains their role in the transcriptional regulation of protein-coding genes, are also discussed. Altogether, this review provides a comprehensive overview of the current research status of TBARs, which will promote further investigations in this research area.
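The two-level classification described in this abstract (PAR versus TAR by genomic position, then long versus small by length) can be sketched as a small function. This is a hedged illustration: the name `classify_tbar`, the string labels, and the `"promoter"`/`"terminus"` inputs are hypothetical, and because the abstract gives only ">200 nt" and "<200 nt", a transcript of exactly 200 nt is reported as unspecified rather than guessed.

```python
def classify_tbar(region: str, length_nt: int) -> str:
    """Assign a TBAR class from its genomic region and transcript length.

    Hypothetical helper following the rules stated in the abstract:
    promoter-associated -> PAR, terminus-associated -> TAR,
    >200 nt -> long, <200 nt -> small.
    """
    if region not in ("promoter", "terminus"):
        raise ValueError("region must be 'promoter' or 'terminus'")
    if length_nt <= 0:
        raise ValueError("length_nt must be positive")

    group = "PAR" if region == "promoter" else "TAR"

    if length_nt > 200:
        size = "long"
    elif length_nt < 200:
        size = "small"
    else:
        # The abstract does not say how exactly 200 nt is handled.
        size = "unspecified"
    return f"{size} {group}"


if __name__ == "__main__":
    print(classify_tbar("promoter", 350))  # a long promoter-associated RNA
    print(classify_tbar("terminus", 24))   # a small terminus-associated RNA
```

Mapping named species such as PROMPTs or tiRNAs onto these classes would additionally need their typical lengths and positions, which the abstract does not list.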
Affiliation(s)
- Dongliang Yu
  - College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China
- Xiaoxia Ma
  - College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China
- Ziwei Zuo
  - College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China
- Huizhong Wang
  - College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China
- Yijun Meng
  - College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China
4
Ma X, Han N, Shao C, Meng Y. Transcriptome-Wide Discovery of PASRs (Promoter-Associated Small RNAs) and TASRs (Terminus-Associated Small RNAs) in Arabidopsis thaliana. PLoS One 2017; 12:e0169212. [PMID: 28046132 PMCID: PMC5207706 DOI: 10.1371/journal.pone.0169212] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/14/2016] [Accepted: 12/13/2016] [Indexed: 01/21/2023]
Abstract
Evidence from animals points to the existence of two novel small RNA (sRNA) species surrounding the transcription start sites (TSSs) and the termini of genes, respectively. In this study, we performed a comprehensive search for these two sRNA species, named promoter-associated sRNAs (PASRs) and terminus-associated sRNAs (TASRs), in Arabidopsis. Using sRNA sequencing data from wild-type plants and several mutants related to sRNA biogenesis, Argonaute (AGO) 1- and AGO4-associated sRNA sequencing data, double-stranded RNA sequencing (dsRNA-seq) data, and DNA methylation profiling data, we investigated the biogenesis and action pathways of the PASRs and the TASRs. PASR and TASR peaks were identified on hundreds of protein-coding genes. In-depth analysis revealed that some of the sRNA peaks were covered by dsRNA-seq reads, and these peaks were significantly repressed in specific mutants. In addition, certain PASRs and TASRs were preferentially recruited by AGO4, and site-specific DNA methylation signals encompassing the genomic loci of these sRNAs were also detected. Accordingly, we proposed a model in which certain PASRs and TASRs are generated through a specific Pol IV-, RDR-, DCL-dependent pathway and associate with AGO4 to perform site-specific DNA methylation on their host genes. These results indicate the existence of PASRs and TASRs in plants. The proposed biogenesis pathway and mode of action of the PASRs and TASRs could facilitate in-depth functional studies of these novel sRNA species.
Affiliation(s)
- Xiaoxia Ma
  - College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, PR China
- Ning Han
  - Key Laboratory for Cell and Gene Engineering of Zhejiang Province, Institute of Genetics, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, PR China
- Chaogang Shao
  - College of Life Sciences, Huzhou University, Huzhou, PR China
- Yijun Meng
  - College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, PR China
  - * E-mail:
5
BP Neural Network Could Help Improve Pre-miRNA Identification in Various Species. Biomed Res Int 2016; 2016:9565689. [PMID: 27635401 PMCID: PMC5011242 DOI: 10.1155/2016/9565689] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 05/17/2016] [Revised: 07/05/2016] [Accepted: 07/17/2016] [Indexed: 01/21/2023]
Abstract
MicroRNAs (miRNAs) are a set of short (21–24 nt) noncoding RNAs that play significant regulatory roles in cells. In the past few years, research on miRNA-related problems has become an active field of bioinformatics because of the essential biological functions of miRNAs. miRNA-related bioinformatics analysis is beneficial in several respects, including the study of the functions of miRNAs and other genes, the regulatory network between miRNAs and their target mRNAs, and even biological evolution. Distinguishing miRNA precursors from other hairpin-like sequences is an essential step in detecting novel microRNAs. In this study, we employed a backpropagation (BP) neural network together with 98-dimensional novel features for microRNA precursor identification. Results show that the precision and recall of our method are 95.53% and 96.67%, respectively. Results further demonstrate that the total prediction accuracy of our method is nearly 13.17% greater than that of state-of-the-art microRNA precursor prediction software tools.
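Precision and recall are often summarized by their harmonic mean, the F1 score. The abstract does not report an F1 value, so the short sketch below merely derives one from the two quoted figures; the function name `f1_score` is an illustrative choice, and the result is arithmetic on the reported numbers rather than a result from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as quoted in the abstract: 95.53% and 96.67%.
    f1 = f1_score(0.9553, 0.9667)
    print(f"Derived F1: {f1:.2%}")
```

With the quoted values this works out to roughly 96.1%, which sits between the two inputs, as a harmonic mean always does.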
6
Patra D, Fasold M, Langenberger D, Steger G, Grosse I, Stadler PF. plantDARIO: web based quantitative and qualitative analysis of small RNA-seq data in plants. Front Plant Sci 2014; 5:708. [PMID: 25566282 PMCID: PMC4274896 DOI: 10.3389/fpls.2014.00708] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 06/30/2014] [Accepted: 11/26/2014] [Indexed: 05/11/2023]
Abstract
High-throughput sequencing techniques have made it possible to assay an organism's entire repertoire of small non-coding RNAs (ncRNAs) in an efficient and cost-effective manner. The moderate size of small RNA-seq datasets makes it feasible to provide free web services to the research community that provide many basic features of a small RNA-seq analysis, including quality control, read normalization, ncRNA quantification, and the prediction of putative novel ncRNAs. DARIO is one such system that so far has been focussed on animals. Here we introduce an extension of this system to plant short non-coding RNAs (sncRNAs). It includes major modifications to cope with plant-specific sncRNA processing. The current version of plantDARIO covers analyses of mapping files, small RNA-seq quality control, expression analyses of annotated sncRNAs, including the prediction of novel miRNAs and snoRNAs from unknown expressed loci and expression analyses of user-defined loci. At present Arabidopsis thaliana, Beta vulgaris, and Solanum lycopersicum are covered. The web tool links to a plant specific visualization browser to display the read distribution of the analyzed sample. The easy-to-use platform of plantDARIO quantifies RNA expression of annotated sncRNAs from different sncRNA databases together with new sncRNAs, annotated by our group. The plantDARIO website can be accessed at http://plantdario.bioinf.uni-leipzig.de/.
Affiliation(s)
- Deblina Patra
  - Institut für Informatik, Martin-Luther-Universität Halle-Wittenberg, Halle (Saale), Germany
  - Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University Leipzig, Leipzig, Germany
- Mario Fasold
  - Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University Leipzig, Leipzig, Germany
  - ecSeq Bioinformatics, Leipzig, Germany
- David Langenberger
  - Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University Leipzig, Leipzig, Germany
  - ecSeq Bioinformatics, Leipzig, Germany
- Gerhard Steger
  - Institut für Physikalische Biologie, Heinrich-Heine-Universität, Düsseldorf, Germany
- Ivo Grosse
  - Institut für Informatik, Martin-Luther-Universität Halle-Wittenberg, Halle (Saale), Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Peter F. Stadler
  - Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University Leipzig, Leipzig, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
  - Department of Theoretical Chemistry of the University of Vienna, Vienna, Austria
  - Center for RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Santa Fe Institute, Santa Fe, USA
  - *Correspondence: Peter F. Stadler, Bioinformatics Group, Department of Computer Science, University Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany e-mail:
7
Allmer J, Yousef M. Computational methods for ab initio detection of microRNAs. Front Genet 2012; 3:209. [PMID: 23087705 PMCID: PMC3467617 DOI: 10.3389/fgene.2012.00209] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 07/27/2012] [Accepted: 09/26/2012] [Indexed: 12/21/2022]
Abstract
MicroRNAs are small RNA sequences of 18–24 nucleotides in length, which serve as templates to drive post-transcriptional gene silencing. The canonical microRNA pathway starts with transcription from DNA and is followed by processing via the microprocessor complex, yielding a hairpin structure, which is then exported into the cytosol, where it is processed by Dicer and incorporated into the RNA-induced silencing complex. All of these biogenesis steps add to the overall specificity of miRNA production and effect. Unfortunately, their modes of action are just beginning to be elucidated, and therefore computational prediction algorithms cannot model the process but are usually forced to employ machine learning approaches. This work focuses on ab initio prediction methods throughout, and therefore homology-based miRNA detection methods are not discussed. Current ab initio prediction algorithms, their ties to data mining, and their prediction accuracy are detailed.
Affiliation(s)
- Jens Allmer
  - Department of Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Turkey