1
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
- Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany.
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
- Universidad Nacional de Colombia, Bogotá, Colombia.
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria.
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.
- Santa Fe Institute, Santa Fe, NM, USA.

2
Escamilla-Gutiérrez A, Córdova-Espinoza MG, Sánchez-Monciváis A, Tecuatzi-Cadena B, Regalado-García AG, Medina-Quero K. In silico selection of aptamers for bacterial toxins detection. J Biomol Struct Dyn 2023; 41:10909-10918. [PMID: 36546716 DOI: 10.1080/07391102.2022.2159529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 12/10/2022] [Indexed: 12/24/2022]
Abstract
The most commonly used toxins in biological warfare are staphylococcal enterotoxin B (3SEB), cholera toxin (1XTC), and botulinum toxin (3BTA). Uncovering novel strategies for identifying these toxins is paramount; therefore, aptamers are used for this purpose. Aptamers are single-stranded DNA or RNA oligonucleotides selected via Systematic Evolution of Ligands by Exponential Enrichment (SELEX) with high binding affinity and specificity against target molecules. However, SELEX in vitro is tedious; hence, adopting alternative in silico molecular docking approaches is necessary. We aimed to conduct molecular docking with accessible tools and obtain RNA aptamers. First, 4,820,095 sequences obtained from an initial library of 9.5 × 10^9 sequences generated by a Python script were used. The GraphClust program was used to create representative groups or clusters, and the DoGSiteScorer (https://proteins.plus/) was used to conduct binding site detection of the proteins: 5DO4 (thrombin), 3SEB, 1XTC, and 3BTA. rDock, HDock, and PatchDock were adopted, combining different docking program results (consensus scoring), to improve receptor-ligand prediction. An analysis of the poses and root mean square deviation (RMSD) was performed, and 468 structurally different aptamers were obtained. The DoGSiteScorer program predicted the binding site of each protein to direct the interaction with the aptamer. Candidate aptamers for 3SEB, 1XTC, and 3BTA were selected according to the pose value considering the closeness of the interaction with a lower mean of 45.923 Å, 45.854 Å, and 72.490 Å, respectively. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Alejandro Escamilla-Gutiérrez
- Laboratorio de Bacteriología Médica, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, México
- Hospital General, Instituto Mexicano del Seguro Social IMSS, Ciudad de México, México
- María Guadalupe Córdova-Espinoza
- Laboratorio de Bacteriología Médica, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, México
- Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Anahí Sánchez-Monciváis
- Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Brenda Tecuatzi-Cadena
- Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Ana Gabriela Regalado-García
- Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Karen Medina-Quero
- Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México

3
Sato K, Hamada M. Recent trends in RNA informatics: a review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery. Brief Bioinform 2023; 24:bbad186. [PMID: 37232359 PMCID: PMC10359090 DOI: 10.1093/bib/bbad186] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/16/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Computational analysis of RNA sequences constitutes a crucial step in the field of RNA biology. As in other domains of the life sciences, the incorporation of artificial intelligence and machine learning techniques into RNA sequence analysis has gained significant traction in recent years. Historically, thermodynamics-based methods were widely employed for the prediction of RNA secondary structures; however, machine learning-based approaches have demonstrated remarkable advancements in recent years, enabling more accurate predictions. Consequently, the precision of sequence analysis pertaining to RNA secondary structures, such as RNA-protein interactions, has also been enhanced, making a substantial contribution to the field of RNA biology. Additionally, artificial intelligence and machine learning are also introducing technical innovations in the analysis of RNA-small molecule interactions for RNA-targeted drug discovery and in the design of RNA aptamers, where RNA serves as its own ligand. This review will highlight recent trends in the prediction of RNA secondary structure, RNA aptamers and RNA drug discovery using machine learning, deep learning and related technologies, and will also discuss potential future avenues in the field of RNA informatics.
Affiliation(s)
- Kengo Sato
- School of System Design and Technology, Tokyo Denki University, 5 Senju Asahi-cho, Adachi-ku, Tokyo 120-8551, Japan
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo 113-8602, Japan

4
Wang H, Lu X, Zheng H, Wang W, Zhang G, Wang S, Lin P, Zhuang Y, Chen C, Chen Q, Qu J, Xu L. RNAsmc: A integrated tool for comparing RNA secondary structure and evaluating allosteric effects. Comput Struct Biotechnol J 2023; 21:965-973. [PMID: 36733704 PMCID: PMC9876829 DOI: 10.1016/j.csbj.2023.01.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Revised: 01/06/2023] [Accepted: 01/07/2023] [Indexed: 01/11/2023]
Abstract
RNA structure plays a crucial role in gene regulation, in RNA stability and the essential biological processes. RNA secondary structure (RSS) motifs are the basic building blocks for investigating the biological mechanisms of structure. Here, we present a strategy for structural motif-based dynamic alignment, namely, RNA secondary-structural motif-comparing (RNAsmc), to identify structural motifs and quantitatively evaluate their underlying molecular functions. RNAsmc also has strong robustness to sequence length, folding protocol and RNA structural profile by chemical probing. Notably, it is also applicable to quantify structural variation in special RNA editing events (SNVs or SNPs, fragment insertion or deletion, etc.). The findings indicate that RNAsmc can uncover the heterogeneity of RNA secondary structure and score for similarities among components, which provides an impetus to cluster RNA families and evaluate allosteric effects. We find that RNAsmc exhibits remarkable detection efficiency for experimentally-derived RiboSNitches. Finally, the pipeline was assembled into an R software package to serve as an automated toolkit to explore, align, and cluster RSS. It is freely available for download at https://CRAN.R-project.org/package=RNAsmc.
Affiliation(s)
- Hong Wang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou 325027, China
- Xiaoyan Lu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Hewei Zheng
- Wekemo Tech Group Co., Ltd. Shenzhen 518000, China
- Wencan Wang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Wenzhou Realdata Medical Research Co., Ltd, Wenzhou 325027, China
- Guosi Zhang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Siyu Wang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Peng Lin
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Youyuan Zhuang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Chong Chen
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Qi Chen
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Jia Qu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou 325027, China
- Corresponding authors at: National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Liangde Xu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou 325027, China
- Corresponding authors at: National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China

5
Ono Y, Katayama K, Onuma T, Kubo K, Tsuyuzaki H, Hamada M, Sato M. Structure-based screening for functional non-coding RNAs in fission yeast identifies a factor repressing untimely initiation of sexual differentiation. Nucleic Acids Res 2022; 50:11229-11242. [PMID: 36259651 PMCID: PMC9638895 DOI: 10.1093/nar/gkac825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 09/06/2022] [Accepted: 09/14/2022] [Indexed: 12/04/2022]
Abstract
Non-coding RNAs (ncRNAs) ubiquitously exist in normal and cancer cells. Despite their prevalent distribution, the functions of most long ncRNAs remain uncharacterized. The fission yeast Schizosaccharomyces pombe expresses >1800 ncRNAs annotated to date, but most unconventional ncRNAs (excluding tRNA, rRNA, snRNA and snoRNA) remain uncharacterized. To discover the functional ncRNAs, here we performed a combinatory screening of computational and biological tests. First, all S. pombe ncRNAs were screened in silico for those showing conservation in sequence as well as in secondary structure with ncRNAs in closely related species. Almost a half of the 151 selected conserved ncRNA genes were uncharacterized. Twelve ncRNA genes that did not overlap with protein-coding sequences were next chosen for biological screening that examines defects in growth or sexual differentiation, as well as sensitivities to drugs and stresses. Finally, we highlighted an ncRNA transcribed from SPNCRNA.1669, which inhibited untimely initiation of sexual differentiation. A domain that was predicted as conserved secondary structure by the computational operations was essential for the ncRNA to function. Thus, this study demonstrates that in silico selection focusing on conservation of the secondary structure over species is a powerful method to pinpoint novel functional ncRNAs.
Affiliation(s)
- Yu Ono
- Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan
- Kenta Katayama
- Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Tomoki Onuma
- Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan
- Kento Kubo
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan; Bioinformatics Laboratory, Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Hayato Tsuyuzaki
- Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Michiaki Hamada
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan; Bioinformatics Laboratory, Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo Shinjuku-ku, Tokyo 169-8555, Japan; Institute for Medical-oriented Structural Biology, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan
- Masamitsu Sato
- Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan; Institute for Medical-oriented Structural Biology, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan; Institute for Advanced Research of Biosystem Dynamics, Waseda Research Institute for Science and Engineering, Graduate School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan

6
Akiyama M, Sakakibara Y. Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. NAR Genom Bioinform 2022; 4:lqac012. [PMID: 35211670 PMCID: PMC8862729 DOI: 10.1093/nargab/lqac012] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/03/2021] [Revised: 01/08/2022] [Accepted: 02/05/2022] [Indexed: 01/17/2023]
Abstract
Effective embedding is actively conducted by applying deep learning to biomolecular information. Obtaining better embeddings enhances the quality of downstream analyses, such as DNA sequence motif detection and protein function prediction. In this study, we adopt a pre-training algorithm for the effective embedding of RNA bases to acquire semantically rich representations and apply this algorithm to two fundamental RNA sequence problems: structural alignment and clustering. By using the pre-training algorithm to embed the four bases of RNA in a position-dependent manner using a large number of RNA sequences from various RNA families, a context-sensitive embedding representation is obtained. As a result, not only base information but also secondary structure and context information of RNA sequences are embedded for each base. We call this ‘informative base embedding’ and use it to achieve accuracies superior to those of existing state-of-the-art methods on RNA structural alignment and RNA family clustering tasks. Furthermore, upon performing RNA sequence alignment by combining this informative base embedding with a simple Needleman–Wunsch alignment algorithm, we succeed in calculating structural alignments with a time complexity of O(n^2) instead of the O(n^6) time complexity of the naive implementation of the Sankoff-style algorithm for input RNA sequences of length n.
Affiliation(s)
- Manato Akiyama
- Department of Biosciences and Informatics, Keio University, 223-8522, Japan

7
Rouse WB, Andrews RJ, Booher NJ, Wang J, Woodman M, Dow E, Jessop TC, Moss WN. Prediction and analysis of functional RNA structures within the integrative genomics viewer. NAR Genom Bioinform 2022; 4:lqab127. [PMID: 35047817 PMCID: PMC8759568 DOI: 10.1093/nargab/lqab127] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/22/2021] [Revised: 12/03/2021] [Accepted: 12/22/2021] [Indexed: 12/14/2022]
Abstract
In recent years, interest in RNA secondary structure has exploded due to its implications in almost all biological functions and its newly appreciated capacity as a therapeutic agent/target. This surge of interest has driven the development and adaptation of many computational and biochemical methods to discover novel, functional structures across the genome/transcriptome. To further enhance efforts to study RNA secondary structure, we have integrated the functional secondary structure prediction tool ScanFold, into IGV. This allows users to directly perform structure predictions and visualize results—in conjunction with probing data and other annotations—in one program. We illustrate the utility of this new tool by mapping the secondary structural landscape of the human MYC precursor mRNA. We leverage the power of vast ‘omics’ resources by comparing individually predicted structures with published data including: biochemical structure probing, RNA binding proteins, microRNA binding sites, RNA modifications, single nucleotide polymorphisms, and others that allow functional inferences to be made and aid in the discovery of potential drug targets. This new tool offers the RNA community an easy to use tool to find, analyze, and characterize RNA secondary structures in the context of all available data, in order to find those worthy of further analyses.
Affiliation(s)
- Walter N Moss
- To whom correspondence should be addressed. Tel: +1 515 294 6116;

8
Wolff J, Backofen R, Grüning B. Robust and efficient single-cell Hi-C clustering with approximate k-nearest neighbor graphs. Bioinformatics 2021; 37:4006-4013. [PMID: 34021764 PMCID: PMC9502147 DOI: 10.1093/bioinformatics/btab394] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/14/2020] [Revised: 05/03/2021] [Accepted: 05/19/2021] [Indexed: 11/23/2022]
Abstract
Motivation: Hi-C technology provides insights into the 3D organization of the chromatin, and the single-cell Hi-C method enables researchers to gain knowledge about the chromatin state in individual cell levels. Single-cell Hi-C interaction matrices are high dimensional and very sparse. To cluster thousands of single-cell Hi-C interaction matrices, they are flattened and compiled into one matrix. Depending on the resolution, this matrix can have a few million or even billions of features; therefore, computations can be memory intensive. We present a single-cell Hi-C clustering approach using an approximate nearest neighbors method based on locality-sensitive hashing to reduce the dimensions and the computational resources. Results: The presented method can process a 10 kb single-cell Hi-C dataset with 2600 cells and needs 40 GB of memory, while competitive approaches are not computable even with 1 TB of memory. It can be shown that the differentiation of the cells by their chromatin folding properties and, therefore, the quality of the clustering of single-cell Hi-C data is advantageous compared to competitive algorithms. Availability and implementation: The presented clustering algorithm is part of the scHiCExplorer, is available on Github https://github.com/joachimwolff/scHiCExplorer, and as a conda package via the bioconda channel. The approximate nearest neighbors implementation is available via https://github.com/joachimwolff/sparse-neighbors-search and as a conda package via the bioconda channel. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Joachim Wolff
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany; Signalling Research Centre CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany
- Björn Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany

9
Li Y, Zhang Q, Liu Z, Wang C, Han S, Ma Q, Du W. Deep forest ensemble learning for classification of alignments of non-coding RNA sequences based on multi-view structure representations. Brief Bioinform 2020; 22:6046058. [PMID: 33367506 PMCID: PMC8294561 DOI: 10.1093/bib/bbaa354] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/30/2020] [Revised: 11/02/2020] [Indexed: 11/13/2022]
Abstract
Non-coding RNAs (ncRNAs) play crucial roles in multiple biological processes. However, only a few ncRNAs’ functions have been well studied. Given the significance of ncRNA classification for understanding ncRNAs’ functions, more and more computational methods have been introduced to improve the classification automatically and accurately. In this paper, based on a convolutional neural network and a deep forest algorithm, multi-grained cascade forest (GcForest), we propose a novel deep fusion learning framework, GcForest fusion method (GCFM), to classify alignments of ncRNA sequences for accurate clustering of ncRNAs. GCFM integrates a multi-view structure feature representation including sequence-structure alignment encoding, structure image representation and shape alignment encoding of structural subunits, enabling us to capture the potential specificity between ncRNAs. For the classification of pairwise alignment of two ncRNA sequences, the F-value of GCFM improves by 6% over an existing alignment-based method. Furthermore, the clustering of ncRNA families is carried out based on the classification matrix generated from GCFM. Results suggest better performance (a 20% improvement in accuracy) than existing ncRNA clustering methods (RNAclust, Ensembleclust and CNNclust). Additionally, we apply GCFM to construct a phylogenetic tree of ncRNA and predict the probability of interactions between RNAs. Most ncRNAs are located correctly in the phylogenetic tree, and the prediction accuracy of RNA interaction is 90.63%. A web server (http://bmbl.sdstate.edu/gcfm/) is developed to maximize its availability, and the source code and related data are available at the same URL.
Affiliation(s)
- Ying Li
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Qi Zhang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Zhaoqian Liu
- School of Mathematics, Shandong University, and now she is a visiting scholar at Ohio State University
- Siyu Han
- Department of Computer Science, Faculty of Engineering, University of Bristol
- Qin Ma
- Department of Biomedical Informatics, Ohio State University
- Wei Du
- College of Computer Science and Technology, Jilin University, Changchun, China

10
Schmidt M, Hamacher K, Reinhardt F, Lotz TS, Groher F, Suess B, Jager S. SICOR: Subgraph Isomorphism Comparison of RNA Secondary Structures. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2189-2195. [PMID: 31295116 DOI: 10.1109/tcbb.2019.2926711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
RNA aptamer selection during SELEX experiments builds on secondary structural diversity. Advanced structural comparison methods can focus this diversity. We develop SICOR, which uses probabilistic subgraph isomorphisms for graph distances between RNA secondary structure graphs. SICOR outperforms other comparison methods and is applicable to many structural comparisons in experimental design.

11
Shi T, Niu G, Kvitt H, Zheng X, Qin Q, Sun D, Ji Z, Tchernov D. Untangling ITS2 genotypes of algal symbionts in zooxanthellate corals. Mol Ecol Resour 2020; 21:137-152. [PMID: 32876380 DOI: 10.1111/1755-0998.13250] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/09/2020] [Revised: 08/10/2020] [Accepted: 08/17/2020] [Indexed: 11/28/2022]
Abstract
Collectively called zooxanthellae, photosynthetic dinoflagellates in the family Symbiodiniaceae are typical endosymbionts that unequivocally mediate coral responses to environmental changes. Symbiodiniaceae are genetically diverse, encompassing at least nine phylogenetically distinct genera (clades A-I). The ribosomal internal transcribed spacer 2 (ITS2) region is commonly utilized for determining Symbiodiniaceae diversity within clades. However, ITS2 is often inadvertently interpreted together with the tailing part of the ribosomal RNA genes (5.8S and 28S or equivalent), leading to unresolved taxonomy and equivocal annotations. To overcome this hurdle, we mined in GenBank and expert reference databases for ITS2 sequences of Symbiodiniaceae having explicit boundaries with adjacent rRNAs. We profiled a Hidden Markov Model of the ITS2-proximal 5.8S-28S rRNA interaction, which was shown to facilitate the delimitation of Symbiodiniaceae ITS2 from GenBank, while considerably reducing sequence ambiguity and redundancy in reference databases. The delineation of ITS2 sequences unveiled intra-clade sequence diversity and inter-clade secondary structure conservation. We compiled the clean data into a non-redundant database that archives the largest number of Symbiodiniaceae ITS2 sequences known to date with definite genotype/subclade representations and well-defined secondary structures. This database provides a fundamental reference catalog for consistent and precise genotyping of Symbiodiniaceae and a tool for automated annotation of user-supplied sequences.
Affiliation(s)
- Tuo Shi
- Marine Genomics and Biotechnology Program, Institute of Marine Science and Technology, Shandong University, Qingdao, P. R. China; State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, P. R. China
| | - Gaofeng Niu
- Marine Genomics and Biotechnology Program, Institute of Marine Science and Technology, Shandong University, Qingdao, P. R. China.,Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, P. R. China
| | - Hagit Kvitt
- Marine Biology Department, The Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel.,Israel Oceanographic and Limnological Research, National Center for Mariculture, Eilat, Israel
| | - Xinqing Zheng
- Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, P. R. China
| | - Qiaoyun Qin
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, P. R. China
| | - Danye Sun
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, P. R. China
| | - Zhiliang Ji
- School of Life Sciences, Xiamen University, Xiamen, P. R. China
| | - Dan Tchernov
- Marine Biology Department, The Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
| |
Collapse
|
12
|
Oliver C, Mallet V, Gendron RS, Reinharz V, Hamilton W, Moitessier N, Waldispühl J. Augmented base pairing networks encode RNA-small molecule binding preferences. Nucleic Acids Res 2020; 48:7690-7699. [PMID: 32652015 PMCID: PMC7430648 DOI: 10.1093/nar/gkaa583] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/18/2020] [Revised: 06/23/2020] [Accepted: 07/08/2020] [Indexed: 12/14/2022]
Abstract
RNA-small molecule binding is a key regulatory mechanism which can stabilize 3D structures and activate molecular functions. The discovery of RNA-targeting compounds is thus a current topic of interest for novel therapies. Our work is a first attempt at bringing the scalability and generalization abilities of machine learning methods to the problem of RNA drug discovery, as well as a step towards understanding the interactions which drive binding specificity. Our tool, RNAmigos, builds and encodes a network representation of RNA structures to predict likely ligands for novel binding sites. We subject ligand predictions to virtual screening and show that we are able to place the true ligand in the 71st-73rd percentile in two decoy libraries, showing a significant improvement over several baselines, and a state of the art method. Furthermore, we observe that augmenting structural networks with non-canonical base pairing data is the only representation able to uncover a significant signal, suggesting that such interactions are a necessary source of binding specificity. We also find that pre-training with an auxiliary graph representation learning task significantly boosts performance of ligand prediction. This finding can serve as a general principle for RNA structure-function prediction when data is scarce. RNAmigos shows that RNA binding data contains structural patterns with potential for drug discovery, and provides methodological insights for possible applications to other structure-function learning tasks. The source code, data and a Web server are freely available at http://rnamigos.cs.mcgill.ca.
Affiliation(s)
- Carlos Oliver
  - School of Computer Science, McGill University, Montreal H3A 0E9, Canada
  - Mila - Quebec Artificial Intelligence Institute, H2S 3S1, Canada
- Vincent Mallet
  - Institut Pasteur, Structural Bioinformatics Unit, Paris, F-75015, France
  - MINES ParisTech, PSL Research University, CBIO - Centre for Computational Biology, F-75006 Paris, France
- Vladimir Reinharz
  - Department of Computer Science, Université du Québec à Montréal, Montreal H2X 3Y7, Canada
- William L Hamilton
  - School of Computer Science, McGill University, Montreal H3A 0E9, Canada
  - Mila - Quebec Artificial Intelligence Institute, H2S 3S1, Canada
- Jérôme Waldispühl
  - School of Computer Science, McGill University, Montreal H3A 0E9, Canada

13
Buongermino Pereira M, Österlund T, Eriksson KM, Backhaus T, Axelson-Fisk M, Kristiansson E. A comprehensive survey of integron-associated genes present in metagenomes. BMC Genomics 2020; 21:495. [PMID: 32689930 PMCID: PMC7370490 DOI: 10.1186/s12864-020-06830-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/03/2019] [Accepted: 06/15/2020] [Indexed: 12/19/2022]
Abstract
Background Integrons are genomic elements that mediate horizontal gene transfer by inserting and removing genetic material using site-specific recombination. Integrons are commonly found in bacterial genomes, where they maintain a large and diverse set of genes that plays an important role in adaptation and evolution. Previous studies have started to characterize the wide range of biological functions present in integrons. However, the efforts have so far mainly been limited to genomes from cultivable bacteria and amplicons generated by PCR, thus targeting only a small part of the total integron diversity. Metagenomic data, generated by direct sequencing of environmental and clinical samples, provides a more holistic and unbiased analysis of integron-associated genes. However, the fragmented nature of metagenomic data has previously made such analysis highly challenging. Results Here, we present a systematic survey of integron-associated genes in metagenomic data. The analysis was based on a newly developed computational method where integron-associated genes were identified by detecting their associated recombination sites. By processing contiguous sequences assembled from more than 10 terabases of metagenomic data, we were able to identify 13,397 unique integron-associated genes. Metagenomes from marine microbial communities had the highest occurrence of integron-associated genes with levels more than 100-fold higher than in the human microbiome. The identified genes had a large functional diversity spanning over several functional classes. Genes associated with defense mechanisms and mobility facilitators were most overrepresented and more than five times as common in integrons compared to other bacterial genes. As many as two thirds of the genes were found to encode proteins of unknown function. Less than 1% of the genes were associated with antibiotic resistance, of which several were novel, previously undescribed, resistance gene variants. 
Conclusions Our results highlight the large functional diversity maintained by integrons present in unculturable bacteria and significantly expand the number of described integron-associated genes.
Affiliation(s)
- Mariana Buongermino Pereira
  - Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Gothenburg, Sweden
- Tobias Österlund
  - Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Gothenburg, Sweden
- K Martin Eriksson
  - Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
  - Gothenburg Centre for Sustainable Development, Chalmers University of Technology, Gothenburg, Sweden
- Thomas Backhaus
  - Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Gothenburg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Marina Axelson-Fisk
  - Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden
- Erik Kristiansson
  - Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Gothenburg, Sweden

14
Ram-Mohan N, Meyer MM. Comparative Metatranscriptomics of Periodontitis Supports a Common Polymicrobial Shift in Metabolic Function and Identifies Novel Putative Disease-Associated ncRNAs. Front Microbiol 2020; 11:482. [PMID: 32328037 PMCID: PMC7160235 DOI: 10.3389/fmicb.2020.00482] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/18/2019] [Accepted: 03/05/2020] [Indexed: 01/08/2023]
Abstract
Periodontitis is an inflammatory disease that deteriorates the bone supporting teeth, afflicting ∼743 million people worldwide. Bacterial communities associated with disease have been classified into red, orange, purple, blue, green, and yellow complexes based on their roles in the periodontal pocket. Previous metagenomic and metatranscriptomic analyses suggest a common shift in metabolic signatures in disease vs. healthy communities, with up-regulated processes including pyruvate fermentation, histidine degradation, amino acid metabolism, and TonB-dependent receptors. In this work, we examine existing metatranscriptome datasets to identify the commonly differentially expressed transcripts and potential underlying RNA regulatory mechanisms behind the metabolic shifts. Raw RNA-seq reads from three studies (including 49 healthy and 48 periodontitis samples) were assembled into transcripts de novo. Analyses revealed 859 differentially expressed (DE) transcripts, 675 more- and 174 less-expressed. Only ∼20% of the DE transcripts originate from the pathogenic red/orange complexes, and ∼50% originate from organisms unaffiliated with a complex. Comparison of expression profiles revealed variations among disease samples; while specific metabolic processes are commonly up-regulated, the underlying organisms are diverse both within and across disease-associated communities. Surveying DE transcripts for known ncRNAs from the Rfam database identified a large number of tRNAs and tmRNAs as well as riboswitches (FMN, glycine, lysine, and SAM) in more prevalent transcripts and the cobalamin riboswitch in both more and less prevalent transcripts. In silico discovery identified many putative ncRNAs in DE transcripts. We report 15 such putative ncRNAs having promising covariation in the predicted secondary structure and interesting genomic context. Seven of these are novel and antisense to ribosomal proteins, and may be involved in maintaining ribosomal protein stoichiometry during the disease-associated metabolic shift. Our findings describe the role of organisms previously unaffiliated with disease and identify the commonality in progression of disease across three metatranscriptomic studies. We find that although the communities are diverse between individuals, the switch in metabolic signatures characteristic of disease is typically achieved through the contributions of several community members. Furthermore, we identify many ncRNAs (both known and putative) which may facilitate the metabolic shifts associated with periodontitis.
Affiliation(s)
- Nikhil Ram-Mohan
  - Department of Biology, Boston College, Chestnut Hill, MA, United States
- Michelle M Meyer
  - Department of Biology, Boston College, Chestnut Hill, MA, United States

15
Miladi M, Sokhoyan E, Houwaart T, Heyne S, Costa F, Grüning B, Backofen R. GraphClust2: Annotation and discovery of structured RNAs with scalable and accessible integrative clustering. Gigascience 2019; 8:giz150. [PMID: 31808801 PMCID: PMC6897289 DOI: 10.1093/gigascience/giz150] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/15/2019] [Revised: 08/23/2019] [Accepted: 11/20/2019] [Indexed: 11/18/2022]
Abstract
BACKGROUND RNA plays essential roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available. RESULTS Hundreds of thousands of non-coding RNAs have been detected; however, their annotation is lagging behind. Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 bridges the gap between high-throughput sequencing and structural RNA analysis and provides an integrative solution by incorporating diverse experimental and genomic data in an accessible manner via the Galaxy framework. GraphClust2 can efficiently cluster and annotate large datasets of RNAs and supports structure-probing data. We demonstrate that the annotation performance of clustering functional RNAs can be considerably improved. Furthermore, an off-the-shelf procedure is introduced for identifying locally conserved structure candidates in long RNAs. We suggest the presence and the sparseness of phylogenetically conserved local structures for a collection of long non-coding RNAs. CONCLUSIONS By clustering data from 2 cross-linking immunoprecipitation experiments, we demonstrate the benefits of GraphClust2 for motif discovery in the presence of biological and methodological biases. Finally, we uncover prominent targets of double-stranded RNA binding protein Roquin-1, such as BCOR's 3' untranslated region that contains multiple binding stem-loops that are evolutionarily conserved.
Affiliation(s)
- Milad Miladi
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Eteri Sokhoyan
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Torsten Houwaart
  - Institute of Medical Microbiology and Hospital Hygiene, University of Dusseldorf, Universitaetsstr. 1, 40225 Dusseldorf, Germany
- Steffen Heyne
  - Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Stuebeweg 51, 79108 Freiburg, Germany
- Fabrizio Costa
  - Department of Computer Science, University of Exeter, North Park Road, EX4 4QF Exeter, UK
- Björn Grüning
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
  - ZBSA Centre for Biological Systems Analysis, University of Freiburg, Hauptstr. 1, 79104 Freiburg, Germany
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
  - ZBSA Centre for Biological Systems Analysis, University of Freiburg, Hauptstr. 1, 79104 Freiburg, Germany
  - Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany

16
Crum M, Ram-Mohan N, Meyer MM. Regulatory context drives conservation of glycine riboswitch aptamers. PLoS Comput Biol 2019; 15:e1007564. [PMID: 31860665 PMCID: PMC6944388 DOI: 10.1371/journal.pcbi.1007564] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/04/2019] [Revised: 01/06/2020] [Accepted: 11/25/2019] [Indexed: 12/13/2022]
Abstract
In comparison to protein coding sequences, the impact of mutation and natural selection on the sequence and function of non-coding (ncRNA) genes is not well understood. Many ncRNA genes are narrowly distributed to only a few organisms, and appear to be rapidly evolving. Compared to protein coding sequences, there are many challenges associated with assessment of ncRNAs that are not well addressed by conventional phylogenetic approaches, including: short sequence length, lack of primary sequence conservation, and the importance of secondary structure for biological function. Riboswitches are structured ncRNAs that directly interact with small molecules to regulate gene expression in bacteria. They typically consist of a ligand-binding domain (aptamer) whose folding changes drive changes in gene expression. The glycine riboswitch is among the most well-studied due to the widespread occurrence of a tandem aptamer arrangement (tandem), wherein two homologous aptamers interact with glycine and each other to regulate gene expression. However, a significant proportion of glycine riboswitches are comprised of single aptamers (singleton). Here we use graph clustering to circumvent the limitations of traditional phylogenetic analysis when studying the relationship between the tandem and singleton glycine aptamers. Graph clustering enables a broader range of pairwise comparison measures to be used to assess aptamer similarity. Using this approach, we show that one aptamer of the tandem glycine riboswitch pair is typically much more highly conserved, and that which aptamer is conserved depends on the regulated gene. Furthermore, our analysis also reveals that singleton aptamers are more similar to either the first or second tandem aptamer, again based on the regulated gene. Taken together, our findings suggest that tandem glycine riboswitches degrade into functional singletons, with the regulated gene(s) dictating which glycine-binding aptamer is conserved.
Affiliation(s)
- Matt Crum
  - Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Nikhil Ram-Mohan
  - Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Michelle M. Meyer
  - Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America

17
Mapping the RNA structural landscape of viral genomes. Methods 2019; 183:57-67. [PMID: 31711930 DOI: 10.1016/j.ymeth.2019.11.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/01/2019] [Revised: 10/13/2019] [Accepted: 11/05/2019] [Indexed: 12/26/2022]
Abstract
Functional RNA structures are prevalent in viral genomes, and have been shown to play roles in almost every aspect of their biology. However, the majority of viral RNA remains structurally uncharacterized. This is likely to remain true as the cost of sequencing decreases much faster than the cost of structural characterizations. Because of this, there is a need for rapid, inexpensive methods to highlight regions of viral RNA which are ideal candidates for structure-function analyses. The ScanFold method was developed as a single sequence alternative to traditional RNA structural motif discovery pipelines, which rely heavily on well curated sequence alignments to identify conserved RNA structures. ScanFold focuses on identifying (based on their more stable than expected folding energies) the most likely functional structures encoded within a single large RNA sequence, while allowing predicted motifs to be tested for evidence of structural conservation later. Decoupling these processes can be a benefit to researchers studying viruses lacking the ideal phylogenetic depth to yield evidence of structural conservation. Here, we demonstrate how the most significant ScanFold predicted structures correspond to higher base pairing probabilities, SHAPE reactivities, and predict known functional structures within the ZIKV and HIV-1 genomes with accuracy. Best practices and examples are also shown to aid users in utilizing ScanFold for their own systems of interest. ScanFold is available as a Webserver (https://mosslabtools.bb.iastate.edu/scanfold) or can be downloaded (https://github.com/moss-lab/ScanFold) and run locally.
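As a rough illustration of the "more stable than expected" criterion mentioned in the abstract, the sketch below computes a thermodynamic z-score for one sequence window against shuffled versions of itself. It is a minimal sketch under stated assumptions, not ScanFold's implementation: `toy_energy` is a hypothetical stand-in for a real minimum-free-energy predictor, and the function names, the plain mononucleotide shuffle, and the default of 100 shuffles are illustrative choices only.

```python
import random
import statistics
from typing import Callable


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Return a mononucleotide shuffle of seq (same composition, random order)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def stability_z_score(
    seq: str,
    energy_fn: Callable[[str], float],
    n_shuffles: int = 100,
    seed: int = 0,
) -> float:
    """z = (E_native - mean(E_shuffled)) / sd(E_shuffled).

    Negative values mean the native sequence scores as more stable
    (lower energy) than its shuffled background.
    """
    rng = random.Random(seed)
    native = energy_fn(seq)
    background = [energy_fn(shuffle_sequence(seq, rng)) for _ in range(n_shuffles)]
    sd = statistics.pstdev(background)
    if sd == 0:
        # No spread in the background: the z-score is undefined, report 0.
        return 0.0
    return (native - statistics.fmean(background)) / sd


def toy_energy(seq: str) -> float:
    """Hypothetical energy model (assumption, for illustration only).

    Scores -1 for every position i that can pair with its mirror
    position in a single idealized hairpin. A real analysis would call
    an actual RNA folding program here instead.
    """
    pairs = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
    n = len(seq)
    return -float(sum((seq[i], seq[n - 1 - i]) in pairs for i in range(n // 2)))


if __name__ == "__main__":
    window = "GGGGGGAAAACCCCCC"
    print(toy_energy(window), stability_z_score(window, toy_energy))
```

In practice `energy_fn` would wrap a real folding program, and the calculation would be repeated over sliding windows along the genome so that windows with strongly negative z-scores can be prioritized.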

18
Glouzon JPS, Ouangraoua A. aliFreeFold: an alignment-free approach to predict secondary structure from homologous RNA sequences. Bioinformatics 2019; 34:i70-i78. [PMID: 29949960 PMCID: PMC6022685 DOI: 10.1093/bioinformatics/bty234] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/25/2022]
Abstract
Motivation Predicting the conserved secondary structure of homologous ribonucleic acid (RNA) sequences is crucial for understanding RNA functions. However, fast and accurate RNA structure prediction is challenging, especially when the number and the divergence of homologous RNA increase. To address this challenge, we propose aliFreeFold, based on a novel alignment-free approach which computes a representative structure from a set of homologous RNA sequences using sub-optimal secondary structures generated for each sequence. It is based on a vector representation of sub-optimal structures capturing structure conservation signals by weighting structural motifs according to their conservation across the sub-optimal structures. Results We demonstrate that aliFreeFold provides a good balance between speed and accuracy regarding predictions of representative structures for sets of homologous RNA compared to traditional methods based on sequence and structure alignment. We show that aliFreeFold is capable of uncovering conserved structural features quickly and effectively thanks to its weighting scheme that gives more (resp. less) importance to common (resp. uncommon) structural motifs. The weighting scheme is also shown to be capable of capturing conservation signal as the number of homologous RNA increases. These results demonstrate the ability of aliFreeFold to efficiently and accurately provide interesting structural representatives of RNA families. Availability and implementation aliFreeFold was implemented in C++. Source code and Linux binary are freely available at https://github.com/UdeS-CoBIUS/aliFreeFold. Supplementary information Supplementary data are available at Bioinformatics online.
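The weighting idea described in this abstract can be sketched in a few lines. This is an illustrative toy under loud assumptions, not aliFreeFold's algorithm: the "motifs" here are simply runs of identical dot-bracket characters (a hypothetical stand-in for the tool's structural motif model), the weight of a motif is the fraction of input structures that contain it, and the representative is taken to be the structure nearest the centroid of the weighted vectors. All function names are made up for this example.

```python
import math
from collections import Counter
from itertools import groupby
from typing import Dict, List


def motif_counts(dot_bracket: str) -> Counter:
    """Count toy motifs: runs of identical characters, e.g. '((((' -> '(4'."""
    return Counter(f"{ch}{len(list(run))}" for ch, run in groupby(dot_bracket))


def conservation_weights(structures: List[str]) -> Dict[str, float]:
    """Weight each motif by the fraction of structures that contain it."""
    doc_freq: Counter = Counter()
    for s in structures:
        doc_freq.update(set(motif_counts(s)))
    return {motif: n / len(structures) for motif, n in doc_freq.items()}


def weighted_vector(dot_bracket: str, weights: Dict[str, float]) -> Dict[str, float]:
    """Motif counts scaled so that common motifs carry more importance."""
    return {m: c * weights.get(m, 0.0) for m, c in motif_counts(dot_bracket).items()}


def representative(structures: List[str]) -> str:
    """Return the structure whose weighted vector is closest to the centroid."""
    weights = conservation_weights(structures)
    vectors = [weighted_vector(s, weights) for s in structures]
    keys = sorted(weights)
    centroid = {k: sum(v.get(k, 0.0) for v in vectors) / len(vectors) for k in keys}

    def distance(v: Dict[str, float]) -> float:
        return math.sqrt(sum((v.get(k, 0.0) - centroid[k]) ** 2 for k in keys))

    best = min(range(len(structures)), key=lambda i: distance(vectors[i]))
    return structures[best]


if __name__ == "__main__":
    candidates = ["((((....))))", "((((....))))", "(((......)))"]
    print(conservation_weights(candidates))
    print(representative(candidates))
```

Because no alignment is computed, the cost of this toy grows with the number of structures and distinct motifs rather than with pairwise sequence comparisons, which is the general trade-off the abstract points to.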
Affiliation(s)
- Aïda Ouangraoua
  - Department of Computer Science, University of Sherbrooke, Sherbrooke, QC, Canada

19
Aoki G, Sakakibara Y. Convolutional neural networks for classification of alignments of non-coding RNA sequences. Bioinformatics 2019; 34:i237-i244. [PMID: 29949978 PMCID: PMC6022636 DOI: 10.1093/bioinformatics/bty228] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Indexed: 12/13/2022]
Abstract
Motivation The convolutional neural network (CNN) has been applied to the classification problem of DNA sequences, with the additional purpose of motif discovery. The training of CNNs with distributed representations of four nucleotides has successfully derived position weight matrices on the learned kernels that corresponded to sequence motifs such as protein-binding sites. Results We propose a novel application of CNNs to classification of pairwise alignments of sequences for accurate clustering of sequences and show the benefits of the CNN method of inputting pairwise alignments for clustering of non-coding RNA (ncRNA) sequences and for motif discovery. Classification of a pairwise alignment of two sequences into positive and negative classes corresponds to the clustering of the input sequences. After we combined the distributed representation of RNA nucleotides with the secondary-structure information specific to ncRNAs and furthermore with mapping profiles of next-generation sequence reads, the training of CNNs for classification of alignments of RNA sequences yielded accurate clustering in terms of ncRNA families and outperformed the existing clustering methods for ncRNA sequences. Several interesting sequence motifs and secondary-structure motifs known for the snoRNA family and specific to microRNA and tRNA families were identified. Availability and implementation The source code of our CNN software in the deep-learning framework Chainer is available at http://www.dna.bio.keio.ac.jp/cnn/, and the dataset used for performance evaluation in this work is available at the same URL.
Affiliation(s)
- Genta Aoki
  - Department of Biosciences and Informatics, Keio University, Yokohama, Japan

20
Andrews RJ, Moss WN. Computational approaches for the discovery of splicing regulatory RNA structures. Biochim Biophys Acta Gene Regul Mech 2019; 1862:194380. [PMID: 31048028 DOI: 10.1016/j.bbagrm.2019.04.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/26/2019] [Revised: 04/15/2019] [Accepted: 04/16/2019] [Indexed: 12/14/2022]
Abstract
Global RNA structure and local functional motifs mediate interactions important in determining the rates and patterns of mRNA splicing. In this review, we overview approaches for the computational prediction of RNA secondary structure with a special emphasis on the discovery of motifs important to RNA splicing. The process of identifying and modeling potential splicing regulatory structures is illustrated using a recently-developed approach for RNA structural motif discovery, the ScanFold pipeline, which is applied to the identification of a known splicing regulatory structure in influenza virus.
Affiliation(s)
- Ryan J Andrews
  - Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, 2437 Pammel Drive, Ames, IA 50011, USA
- Walter N Moss
  - Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, 2437 Pammel Drive, Ames, IA 50011, USA

21
Chen X, Castro SA, Liu Q, Hu W, Zhang S. Practical considerations on performing and analyzing CLIP-seq experiments to identify transcriptomic-wide RNA-protein interactions. Methods 2019; 155:49-57. [PMID: 30527764 PMCID: PMC6387833 DOI: 10.1016/j.ymeth.2018.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/30/2018] [Revised: 11/27/2018] [Accepted: 12/03/2018] [Indexed: 10/27/2022]
Abstract
RNA-binding proteins are important players in post-transcriptional regulation, such as modulating mRNA splicing, translation, and degradation under diverse biological settings. Identifying and characterizing the RNA substrates is a critical step in deciphering the function and molecular mechanisms of the target RNA-binding proteins. High-throughput sequencing of the RNA fragments isolated by crosslinking immunoprecipitation (CLIP-seq) is one of the standard techniques to identify the in vivo transcriptome-wide binding sites of the target RNA-binding protein. This method is widely used in functional and mechanistic characterizations of RNA-binding proteins. In this review, we provide several practical considerations on performing and analyzing CLIP-seq experiments. Particularly, we focus on how to perform CLIP-seq experiments on endogenous RNA-binding proteins. In addition, we provide a practical summary on how to choose and use computational pipelines from an increasing number of computational methods and packages that are available for analyzing the sequencing datasets from CLIP-seq experiments. We hope these practical considerations will help experimental biologists perform and analyze CLIP-seq experiments to obtain biologically relevant mechanistic insights.
Affiliation(s)
- Xiaoli Chen
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Sarah A Castro
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905, USA
- Qiuying Liu
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905, USA
- Wenqian Hu
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905, USA
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA

22
isoTar: Consensus Target Prediction with Enrichment Analysis for MicroRNAs Harboring Editing Sites and Other Variations. Methods Mol Biol 2019; 1970:211-235. [PMID: 30963495 DOI: 10.1007/978-1-4939-9207-2_12] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/23/2023]
Abstract
MicroRNAs (miRNAs) are small noncoding RNA molecules (sncRNAs) involved in gene expression regulation. Having been widely studied during the last two decades, they have been associated with several diseases, including cancer. Recent improvements in high-throughput sequencing technologies have revealed a more complex miRNAome, due to miRNA sequence modification phenomena, such as RNA editing and isomiRs. As a result, a new class of tools is necessary in order to appropriately investigate this emerging complexity. To address this need, we developed isoTar, a high-performance Web-based containerized application designed for miRNA consensus target prediction and functional enrichment analyses. In the present chapter, we provide an overview of isoTar (https://ncrnaome.osumc.edu/isotar/), as well as benchmarks and a guide to its usage.

23
Veneziano D, Marceca GP, Di Bella S, Nigita G, Distefano R, Croce CM. Investigating miRNA-lncRNA Interactions: Computational Tools and Resources. Methods Mol Biol 2019; 1970:251-277. [PMID: 30963497 DOI: 10.1007/978-1-4939-9207-2_14] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 01/01/2023]
Abstract
In the last two decades noncoding RNAs have been the recipients of increasing scientific interest. In particular, miRNAs, short (~22 nts) noncoding transcripts, have been thoroughly investigated since their essential role in posttranscriptional gene expression regulation had been established in the early 2000s. With the advent and the advancements of high-throughput sequencing technologies in recent years, long noncoding RNAs have also started to emerge as important actors in cellular functions and processes. Such transcripts, on average longer than 200 nt, whose functions have yet to be fully characterized, have recently been identified as regulatory elements of the RNAi pathway, harboring several miRNA response elements, uncovering the phenomena of competing endogenous RNAs (ceRNAs), or "sponge RNAs." The present chapter aims to provide a brief update on the actual biomedical relevance of ceRNAs, together with a summary of resources, tools, and practical examples of their application to aid researchers in the discovery and further elucidation of lncRNA-miRNA interactions.
Collapse
Affiliation(s)
- Dario Veneziano
- Department of Cancer Biology and Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA.
| | - Gioacchino P Marceca
- Department of Cancer Biology and Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
| | | | - Giovanni Nigita
- Department of Cancer Biology and Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
| | - Rosario Distefano
- Department of Cancer Biology and Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
| | - Carlo M Croce
- Department of Cancer Biology and Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
| |
|
24
|
Glouzon JPS, Perreault JP, Wang S. Structurexplor: a platform for the exploration of structural features of RNA secondary structures. Bioinformatics 2018; 33:3117-3120. [PMID: 28575203 DOI: 10.1093/bioinformatics/btx323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/30/2016] [Accepted: 05/26/2017] [Indexed: 11/14/2022] Open
Abstract
Summary: Discovering function-related structural features, such as the cloverleaf shape of transfer RNA secondary structures, is essential to understand RNA function. With this aim, we have developed a platform, named Structurexplor, to facilitate the exploration of structural features in populations of RNA secondary structures. It has been designed and developed to help biologists interactively search for, evaluate and select interesting structural features that can potentially explain RNA functions. Availability and implementation: Structurexplor is a web application available at http://structurexplor.dinf.usherbrooke.ca. The source code can be found at http://jpsglouzon.github.io/structurexplor/. Contact: shengrui.wang@usherbrooke.ca. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jean-Pierre Séhi Glouzon
- Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1 Canada; RNA Group, Department of Biochemistry, Faculty of Medicine and Health Sciences, Applied Cancer Research Pavilion, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1, Canada
| | - Jean-Pierre Perreault
- RNA Group, Department of Biochemistry, Faculty of Medicine and Health Sciences, Applied Cancer Research Pavilion, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1, Canada
| | - Shengrui Wang
- Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1 Canada
| |
|
25
|
Phylogenomic and comparative analysis of the distribution and regulatory patterns of TPP riboswitches in fungi. Sci Rep 2018; 8:5563. [PMID: 29615754 PMCID: PMC5882874 DOI: 10.1038/s41598-018-23900-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/25/2017] [Accepted: 03/20/2018] [Indexed: 01/01/2023] Open
Abstract
Riboswitches are metabolite or ion sensing cis-regulatory elements that regulate the expression of the associated genes involved in biosynthesis or transport of the corresponding metabolite. Among the nearly 40 different classes of riboswitches discovered in bacteria so far, only the TPP riboswitch has also been found in algae, plants, and in fungi where their presence has been experimentally validated in a few instances. We analyzed all the available complete fungal and related genomes and identified TPP riboswitch-based regulation systems in 138 fungi and 15 oomycetes. We find that TPP riboswitches are most abundant in Ascomycota and Basidiomycota where they regulate TPP biosynthesis and/or transporter genes. Many of these transporter genes were found to contain conserved domains consistent with nucleoside, urea and amino acid transporter gene families. The genomic location of TPP riboswitches when correlated with the intron structure of the regulated genes enabled prediction of the precise regulation mechanism employed by each riboswitch. Our comprehensive analysis of TPP riboswitches in fungi provides insights about the phylogenomic distribution, regulatory patterns and functioning mechanisms of TPP riboswitches across diverse fungal species and provides a useful resource that will enhance the understanding of RNA-based gene regulation in eukaryotes.
|
26
|
Dotu I, Adamson SI, Coleman B, Fournier C, Ricart-Altimiras E, Eyras E, Chuang JH. SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data. PLoS Comput Biol 2018; 14:e1006078. [PMID: 29596423 PMCID: PMC5892938 DOI: 10.1371/journal.pcbi.1006078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/28/2017] [Revised: 04/10/2018] [Accepted: 03/05/2018] [Indexed: 12/02/2022] Open
Abstract
RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. 
Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC.
Author summary: RNA-protein binding is critical to gene regulation, and aberrant RNA-protein interactions play a role in a wide variety of diseases. However, molecular understanding of these interactions remains limited because of the difficulty of ascertaining the motifs that bind each protein. To address this challenge, we have developed a novel algorithm, SARNAclust, to computationally identify combined structure/sequence motifs from immunoprecipitation data. SARNAclust can deconvolve multiple motifs simultaneously and determine the importance of specific features through a graph kernel and bulge graph formalism. We have verified SARNAclust to be effective on synthetic motif data and also tested it on ENCODE eCLIP datasets, identifying known motifs and novel predictions. We have experimentally validated SARNAclust for two proteins, SLBP and ILF3, using RNA Bind-n-Seq measurements. Applying SARNAclust to ENCODE data provides new evidence for previously unknown regulatory interactions, notably splicing co-regulation by ILF3 and the splicing factor hnRNPC.
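Note on the V-measure quoted in this abstract: it is the harmonic mean of homogeneity and completeness, both derived from conditional entropies between reference classes and predicted clusters. The Python sketch below is not taken from SARNAclust; the function name `v_measure` and the toy labels are illustrative assumptions, and it only shows how such a score could be computed.

```python
from collections import Counter
from math import log


def v_measure(labels_true, labels_pred, beta=1.0):
    """Return the V-measure of a clustering (hypothetical helper, not the authors' code)."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    class_counts = Counter(labels_true)            # size of each reference class
    cluster_counts = Counter(labels_pred)          # size of each predicted cluster
    joint = Counter(zip(labels_true, labels_pred)) # class/cluster co-occurrences

    def entropy(counts):
        return -sum(c / n * log(c / n) for c in counts.values())

    h_class = entropy(class_counts)
    h_cluster = entropy(cluster_counts)
    # Conditional entropies H(class | cluster) and H(cluster | class).
    h_class_given_cluster = -sum(
        c / n * log(c / cluster_counts[k]) for (_, k), c in joint.items()
    )
    h_cluster_given_class = -sum(
        c / n * log(c / class_counts[t]) for (t, _), c in joint.items()
    )

    # Homogeneity: each cluster contains members of a single class.
    homogeneity = 1.0 if h_class == 0 else 1.0 - h_class_given_cluster / h_class
    # Completeness: all members of a class end up in the same cluster.
    completeness = 1.0 if h_cluster == 0 else 1.0 - h_cluster_given_class / h_cluster

    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)


# Toy example: cluster names differ from class names, but the grouping is identical.
perfect = v_measure([0, 0, 1, 1], [1, 1, 0, 0])
# Toy example: everything lumped into one cluster (complete but not homogeneous).
lumped = v_measure([0, 0, 1, 1], [0, 0, 0, 0])
print(perfect, lumped)
```

A score near 1 therefore means that the predicted clusters match the reference motif classes up to renaming, which is how the reported values of 0.95 and 0.083 can be read.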
Affiliation(s)
- Ivan Dotu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM)–Pompeu Fabra University (UPF), Barcelona, Spain
| | - Scott I. Adamson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- UCONN Health, Department of Genetics and Genome Sciences, Farmington, CT, United States of America
| | - Benjamin Coleman
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
| | - Cyril Fournier
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
| | - Emma Ricart-Altimiras
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM)–Pompeu Fabra University (UPF), Barcelona, Spain
| | - Eduardo Eyras
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM)–Pompeu Fabra University (UPF), Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
| | - Jeffrey H. Chuang
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- UCONN Health, Department of Genetics and Genome Sciences, Farmington, CT, United States of America
| |
|
27
|
Ledda M, Aviran S. PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures. Genome Biol 2018; 19:28. [PMID: 29495968 PMCID: PMC5833111 DOI: 10.1186/s13059-018-1399-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 09/06/2017] [Accepted: 01/30/2018] [Indexed: 02/08/2023] Open
Abstract
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions.
Affiliation(s)
- Mirko Ledda
- Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, 95616 USA
- Integrative Genetics and Genomics Graduate Group, UC Davis, 1 Shields Ave, Davis, 95616 USA
| | - Sharon Aviran
- Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, 95616 USA
| |
|
28
|
Smith MA, Seemann SE, Quek XC, Mattick JS. DotAligner: identification and clustering of RNA structure motifs. Genome Biol 2017; 18:244. [PMID: 29284541 PMCID: PMC5747123 DOI: 10.1186/s13059-017-1371-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/01/2017] [Accepted: 12/05/2017] [Indexed: 01/01/2023] Open
Abstract
The diversity of processed transcripts in eukaryotic genomes poses a challenge for the classification of their biological functions. Sparse sequence conservation in non-coding sequences and the unreliable nature of RNA structure predictions further exacerbate this conundrum. Here, we describe a computational method, DotAligner, for the unsupervised discovery and classification of homologous RNA structure motifs from a set of sequences of interest. Our approach outperforms comparable algorithms at clustering known RNA structure families, both in speed and accuracy. It identifies clusters of known and novel structure motifs from ENCODE immunoprecipitation data for 44 RNA-binding proteins.
Affiliation(s)
- Martin A Smith
- RNA Biology and Plasticity Group, Garvan Institute of Medical Research, 384 Victoria Street, Sydney, NSW 2010, Australia; St Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW 2010, Australia
| | - Stefan E Seemann
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Groennegaardsvej 3, Frederiksberg, 1870, Denmark; Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, DK-1870, Frederiksberg, Denmark
| | - Xiu Cheng Quek
- RNA Biology and Plasticity Group, Garvan Institute of Medical Research, 384 Victoria Street, Sydney, NSW 2010, Australia; St Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW 2010, Australia
| | - John S Mattick
- RNA Biology and Plasticity Group, Garvan Institute of Medical Research, 384 Victoria Street, Sydney, NSW 2010, Australia; St Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW 2010, Australia
| |
|
29
|
Arslan AN, Anandan J, Fry E, Monschke K, Ganneboina N, Bowerman J. Efficient RNA structure comparison algorithms. J Bioinform Comput Biol 2017; 15:1740009. [DOI: 10.1142/s0219720017400091] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
The recently proposed relative addressing-based ([Formula: see text]) RNA secondary structure representation has important features by which an RNA structure database can be stored in a suffix array. A fast substructure search algorithm has been proposed based on binary search on this suffix array. Using this substructure search algorithm, we present a fast algorithm that finds the largest common substructure of given multiple RNA structures in [Formula: see text] format. The multiple RNA structure comparison problem is NP-hard in its general formulation. We introduce a new problem for comparing multiple RNA structures. This problem has a stricter similarity definition and objective, and we propose an algorithm that solves it efficiently. We also develop another comparison algorithm that iteratively calls this algorithm to locate nonoverlapping large common substructures in the compared RNAs. With the resulting new tools, we improved the RNASSAC website (linked from http://faculty.tamuc.edu/aarslan). This website now also includes two drawing tools: one specialized for preparing RNA substructures that can be used as input by the search tool, and another for automatically drawing the entire RNA structure from a given structure sequence.
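Note on the search technique named in this abstract: the substructure search rests on binary search over a suffix array. The Python sketch below illustrates that general idea on plain dot-bracket strings, which is an assumption made here for simplicity; the relative addressing-based representation used by the authors is not reproduced, and the names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive construction for clarity; it is fine for small inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return all start positions of `pattern` in `text` via two binary searches."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


# Toy "database": three dot-bracket structures joined by a separator character.
database = "((..))#(.(..).)#..((..))"
sa = build_suffix_array(database)
hits = find_occurrences(database, sa, "(..)")
print(hits)  # → [1, 9, 19]
```

Each query costs a logarithmic number of prefix comparisons once the array is built, which is the property that makes repeated substructure lookups cheap. A plain substring match is weaker than a true substructure match, so this shows only the indexing step.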
Affiliation(s)
- Abdullah N. Arslan
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
| | - Jithendar Anandan
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
| | - Eric Fry
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
| | - Keith Monschke
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
| | - Nitin Ganneboina
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
| | - Jason Bowerman
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
| |
|
30
|
Kato Y, Gorodkin J, Havgaard JH. Alignment-free comparative genomic screen for structured RNAs using coarse-grained secondary structure dot plots. BMC Genomics 2017; 18:935. [PMID: 29197323 PMCID: PMC5712110 DOI: 10.1186/s12864-017-4309-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/30/2017] [Accepted: 11/15/2017] [Indexed: 01/01/2023] Open
Abstract
Background: Structured non-coding RNAs play many different roles in cells, but the annotation of these RNAs is lacking even within the human genome. The currently available computational tools are either too computationally heavy for use in full genomic screens or rely on pre-aligned sequences. Methods: Here we present a fast and efficient method, DotcodeR, for detecting structurally similar RNAs in genomic sequences by comparing their corresponding coarse-grained secondary structure dot plots at string level. This allows us to perform an all-against-all scan of all window pairs from two genomes without alignment. Results: Our computational experiments with simulated data and real chromosomes demonstrate that the presented method has good sensitivity. Conclusions: DotcodeR can be useful as a pre-filter in a genomic comparative scan for structured RNAs.
Affiliation(s)
- Yuki Kato
- Department of RNA Biology and Neuroscience, Graduate School of Medicine, Osaka University, 2-2 Yamadaoka, Suita, 565-0871, Japan; Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Groennegaardsvej 3, Frederiksberg, 1870, Denmark
| | - Jan Gorodkin
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Groennegaardsvej 3, Frederiksberg, 1870, Denmark
| | - Jakob Hull Havgaard
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Groennegaardsvej 3, Frederiksberg, 1870, Denmark.
| |
|
31
|
Weinberg Z, Lünse CE, Corbino KA, Ames TD, Nelson JW, Roth A, Perkins KR, Sherlock ME, Breaker RR. Detection of 224 candidate structured RNAs by comparative analysis of specific subsets of intergenic regions. Nucleic Acids Res 2017; 45:10811-10823. [PMID: 28977401 PMCID: PMC5737381 DOI: 10.1093/nar/gkx699] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Received: 04/11/2017] [Accepted: 08/02/2017] [Indexed: 11/29/2022] Open
Abstract
The discovery of structured non-coding RNAs (ncRNAs) in bacteria can reveal new facets of biology and biochemistry. Comparative genomics analyses executed by powerful computer algorithms have successfully been used to uncover many novel bacterial ncRNA classes in recent years. However, this general search strategy favors the discovery of more common ncRNA classes, whereas progressively rarer classes are correspondingly more difficult to identify. In the current study, we confront this problem by devising several methods to select subsets of intergenic regions that can concentrate these rare RNA classes, thereby increasing the probability that comparative sequence analysis approaches will reveal their existence. By implementing these methods, we discovered 224 novel ncRNA classes, which include ROOL RNA, an RNA class averaging 581 nt and present in multiple phyla, several highly conserved and widespread ncRNA classes with properties that suggest sophisticated biochemical functions and a multitude of putative cis-regulatory RNA classes involved in a variety of biological processes. We expect that further research on these newly found RNA classes will reveal additional aspects of novel biology, and allow for greater insights into the biochemistry performed by ncRNAs.
Affiliation(s)
- Zasha Weinberg
- HHMI, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - Christina E Lünse
- Department of Molecular, Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - Keith A Corbino
- HHMI, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - Tyler D Ames
- Department of Molecular, Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - James W Nelson
- Department of Molecular, Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - Adam Roth
- HHMI, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - Kevin R Perkins
- Department of Molecular, Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - Madeline E Sherlock
- Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - Ronald R Breaker
- HHMI, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Molecular, Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| |
|
32
|
Fallmann J, Will S, Engelhardt J, Grüning B, Backofen R, Stadler PF. Recent advances in RNA folding. J Biotechnol 2017; 261:97-104. [DOI: 10.1016/j.jbiotec.2017.07.007] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 02/16/2017] [Revised: 07/02/2017] [Accepted: 07/04/2017] [Indexed: 12/23/2022]
|
33
|
Fakhry CT, Kulkarni P, Chen P, Kulkarni R, Zarringhalam K. Prediction of bacterial small RNAs in the RsmA (CsrA) and ToxT pathways: a machine learning approach. BMC Genomics 2017; 18:645. [PMID: 28830349 PMCID: PMC5568370 DOI: 10.1186/s12864-017-4057-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/04/2017] [Accepted: 08/14/2017] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: Small RNAs (sRNAs) constitute an important class of post-transcriptional regulators that control critical cellular processes in bacteria. Recent research using high-throughput transcriptomic approaches has led to a dramatic increase in the discovery of bacterial sRNAs. However, it is generally believed that the currently identified sRNAs constitute a limited subset of the bacterial sRNA repertoire. In several cases, sRNAs belonging to a specific class are already known and the challenge is to identify additional sRNAs belonging to the same class. In such cases, machine-learning approaches can be used to predict novel sRNAs in a given class. METHODS: In this work, we develop novel bioinformatics approaches that integrate sequence and structure-based features to train machine-learning models for the discovery of bacterial sRNAs. We show that features derived from recurrent structural motifs in the ensemble of low energy secondary structures can distinguish the RNA classes with high accuracy. RESULTS: We apply this approach to predict new members in two broad classes of bacterial small RNAs: 1) sRNAs that bind to the RNA-binding protein RsmA/CsrA in diverse bacterial species and 2) sRNAs regulated by the master regulator of virulence, ToxT, in Vibrio cholerae. CONCLUSION: The involvement of sRNAs in bacterial adaptation to changing environments is an increasingly recurring theme in current research in microbiology. It is likely that future research, combining experimental and computational approaches, will discover many more examples of sRNAs as components of critical regulatory pathways in bacteria. We have developed a novel approach for prediction of small RNA regulators in important bacterial pathways. This approach can be applied to specific classes of sRNAs for which several members have been identified and the challenge is to identify additional sRNAs.
Affiliation(s)
- Carl Tony Fakhry
- Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, 02125 MA USA
| | - Prajna Kulkarni
- Department of Physics, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, 02125 MA USA
| | - Ping Chen
- Department of Engineering, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, 02125 MA USA
| | - Rahul Kulkarni
- Department of Physics, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, 02125 MA USA
| | - Kourosh Zarringhalam
- Department of Mathematics, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, 02125 MA USA
| |
|
34
|
Miladi M, Junge A, Costa F, Seemann SE, Havgaard JH, Gorodkin J, Backofen R. RNAscClust: clustering RNA sequences using structure conservation and graph based motifs. Bioinformatics 2017; 33:2089-2096. [PMID: 28334186 PMCID: PMC5870858 DOI: 10.1093/bioinformatics/btx114] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 06/06/2016] [Revised: 12/22/2016] [Accepted: 02/23/2017] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION: Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take the compensatory base pair changes obtained from structure conservation in orthologous sequences into account. RESULTS: Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. For a set of multiple structural alignments of RNA sequences, each containing a paralog sequence included in a structural alignment of its orthologs, RNAscClust computes minimum free-energy structures for each sequence using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments. AVAILABILITY AND IMPLEMENTATION: RNAscClust is available at http://www.bioinf.uni-freiburg.de/Software/RNAscClust. CONTACT: gorodkin@rth.dk or backofen@informatik.uni-freiburg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Milad Miladi
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
| | - Alexander Junge
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Fabrizio Costa
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
| | - Stefan E Seemann
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Jakob Hull Havgaard
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Jan Gorodkin
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Center for Biological Signalling Studies (BIOSS), Cluster of Excellence, University of Freiburg, Freiburg im Breisgau, Germany
| |
|
35
|
Backofen R, Engelhardt J, Erxleben A, Fallmann J, Grüning B, Ohler U, Rajewsky N, Stadler PF. RNA-bioinformatics: Tools, services and databases for the analysis of RNA-based regulation. J Biotechnol 2017; 261:76-84. [PMID: 28554830 DOI: 10.1016/j.jbiotec.2017.05.019] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/21/2017] [Revised: 05/20/2017] [Accepted: 05/23/2017] [Indexed: 12/26/2022]
Abstract
The importance of RNA-based regulation is becoming more and more evident. Genome-wide sequencing efforts have shown that the majority of the DNA in eukaryotic genomes is transcribed. Advanced high-throughput techniques like CLIP for the genome-wide detection of RNA-protein interactions have shown that post-transcriptional regulation by RNA-binding proteins matches the complexity of transcriptional regulation. The need for a specialized and integrated analysis of RNA-based data has led to the foundation of the RNA Bioinformatics Center (RBC) within the German Network of Bioinformatics Infrastructure (de.NBI). This paper describes the tools, services and databases provided by the RBC, and shows example applications. Furthermore, we have set up an RNA workbench within the Galaxy framework. For easy dissemination, we offer a virtualized version of Galaxy (via Galaxy Docker), enabling other groups to use our RNA workbench in a very simple way.
Affiliation(s)
- Rolf Backofen
- Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany; BIOSS Centre for Biological Signaling Studies, University of Freiburg, Schänzlestr. 18, 79104 Freiburg, Germany.
| | - Jan Engelhardt
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| | - Anika Erxleben
- Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany
| | - Jörg Fallmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| | - Björn Grüning
- Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany
| | - Uwe Ohler
- Max-Delbrück-Centrum (MDC), Robert-Rössle-Str. 10, D-13092 Berlin, Germany
| | - Nikolaus Rajewsky
- Max-Delbrück-Centrum (MDC), Robert-Rössle-Str. 10, D-13092 Berlin, Germany
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria; RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany; Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
| |
|
36
|
Naghdi MR, Smail K, Wang JX, Wade F, Breaker RR, Perreault J. Search for 5'-leader regulatory RNA structures based on gene annotation aided by the RiboGap database. Methods 2017; 117:3-13. [PMID: 28279853 DOI: 10.1016/j.ymeth.2017.02.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/20/2016] [Revised: 02/16/2017] [Accepted: 02/27/2017] [Indexed: 01/20/2023] Open
Abstract
The discovery of noncoding RNAs (ncRNAs) and their importance for gene regulation led us to develop bioinformatics tools to pursue the discovery of novel ncRNAs. Finding ncRNAs de novo is challenging, first due to the difficulty of retrieving large numbers of sequences for given gene activities, and second due to exponential demands on calculation needed for comparative genomics on a large scale. Recently, several tools for the prediction of conserved RNA secondary structure were developed, but many of them are not designed to uncover new ncRNAs, or are too slow for conducting analyses on a large scale. Here we present various approaches using the database RiboGap as a primary tool for finding known ncRNAs and for uncovering simple sequence motifs with regulatory roles. This database also can be used to easily extract intergenic sequences of eubacteria and archaea to find conserved RNA structures upstream of given genes. We also show how to extend analysis further to choose the best candidate ncRNAs for experimental validation.
Affiliation(s)
- Mohammad Reza Naghdi
- INRS - Institut Armand-Frappier, 531 boul des Prairies, Laval (Québec) H7V1B7, Canada
- Katia Smail
- INRS - Institut Armand-Frappier, 531 boul des Prairies, Laval (Québec) H7V1B7, Canada
- Joy X Wang
- Department of Molecular, Cellular and Developmental Biology and the Howard Hughes Medical Institute, Yale University, P.O. Box 208103, New Haven, CT 06520-8103, United States
- Fallou Wade
- INRS - Institut Armand-Frappier, 531 boul des Prairies, Laval (Québec) H7V1B7, Canada
- Ronald R Breaker
- Department of Molecular, Cellular and Developmental Biology and the Howard Hughes Medical Institute, Yale University, P.O. Box 208103, New Haven, CT 06520-8103, United States
- Jonathan Perreault
- INRS - Institut Armand-Frappier, 531 boul des Prairies, Laval (Québec) H7V1B7, Canada
|
37
|
Li Y, Shi X, Liang Y, Xie J, Zhang Y, Ma Q. RNA-TVcurve: a Web server for RNA secondary structure comparison based on a multi-scale similarity of its triple vector curve representation. BMC Bioinformatics 2017; 18:51. [PMID: 28109252 PMCID: PMC5251234 DOI: 10.1186/s12859-017-1481-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/25/2016] [Accepted: 01/10/2017] [Indexed: 01/10/2023] Open
Abstract
Background: RNAs have been found to carry diverse functionalities in nature. Inferring the similarity between two given RNAs is a fundamental step in understanding and interpreting their functional relationship. The majority of functional RNAs show conserved secondary structures rather than sequence conservation, so algorithms relying on sequence-based features are usually limited in their prediction performance. Hence, integrating RNA structure features is critical for RNA analysis. Existing algorithms mainly fall into two categories: alignment-based and alignment-free. Alignment-free algorithms for RNA comparison usually have lower time complexity than alignment-based algorithms. Results: An alignment-free RNA comparison algorithm was proposed, in which novel numerical representations (RNA-TVcurve, a triple vector curve representation) of an RNA sequence and its corresponding secondary structure features are provided. A multi-scale similarity score of two given RNAs was then designed based on wavelet decomposition of their numerical representations. In support of RNA mutation and phylogenetic analysis, a web server (RNA-TVcurve) was designed based on this alignment-free RNA comparison algorithm. It provides three functional modules: 1) visualization of the numerical representation of RNA secondary structure; 2) detection of single-point mutations based on secondary structure; and 3) comparison of pairwise and multiple RNA secondary structures. The web server requires RNA primary sequences as input, while the corresponding secondary structures are optional. When only primary sequences are supplied, the web server computes the secondary structures by free energy minimization using the RNAfold tool from the Vienna RNA package. Conclusion: RNA-TVcurve is the first integrated web server, based on an alignment-free method, to deliver a suite of RNA analysis functions, including visualization, mutation analysis and comparison of multiple RNA structures. Comparisons with two popular RNA comparison tools, RNApdist and RNAdistance, showed that RNA-TVcurve can efficiently capture subtle relationships among RNAs for mutation detection and non-coding RNA classification. All relevant results are shown in an intuitive graphical manner and can be freely downloaded from the server. RNA-TVcurve, along with test examples and detailed documentation, is available at http://ml.jlu.edu.cn/tvcurve/. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1481-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ying Li
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun, 130012, China
- Xiaohu Shi
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun, 130012, China
- Yanchun Liang
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun, 130012, China; Zhuhai Laboratory of Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Jilin University, Zhuhai, 519041, China
- Juan Xie
- Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, 57007, USA; Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, 57007, USA; BioSNTR, Brookings, SD, USA
- Yu Zhang
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun, 130012, China
- Qin Ma
- Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, 57007, USA; Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, 57007, USA; BioSNTR, Brookings, SD, USA
|
38
|
Abstract
Protein-coding RNAs represent only a small fraction of the transcriptional output in higher eukaryotes. The remaining RNA species encompass a broad range of molecular functions and regulatory roles, a consequence of the structural polyvalence of RNA polymers. Although several classes of small noncoding RNAs are relatively well characterized, the accessibility of affordable high-throughput sequencing is generating a wealth of novel, unannotated transcripts, especially long noncoding RNAs (lncRNAs) that are derived from genomic regions that are antisense, intronic, intergenic, and overlapping protein-coding loci. Parsing and characterizing the functions of noncoding RNAs, lncRNAs in particular, is one of the great challenges of modern genome biology. Here we discuss concepts and computational methods for the identification of structural domains in lncRNAs from genomic and transcriptomic data. In the first part, we briefly review how to identify RNA structural motifs in individual lncRNAs. In the second part, we describe how to leverage the evolutionary dynamics of structured RNAs in a computationally efficient screen to detect putative functional lncRNA motifs using comparative genomics.
Affiliation(s)
- Martin A Smith
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, NSW, 2010, Australia; St-Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW, 2052, Australia
- John S Mattick
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, NSW, 2010, Australia; St-Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW, 2052, Australia
|
39
|
Accurate Classification of RNA Structures Using Topological Fingerprints. PLoS One 2016; 11:e0164726. [PMID: 27755571 PMCID: PMC5068708 DOI: 10.1371/journal.pone.0164726] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/01/2016] [Accepted: 09/29/2016] [Indexed: 12/26/2022] Open
Abstract
Although RNAs are well known to possess complex structures, functionally similar RNAs often have little sequence similarity. While the exact size and spacing of base-paired regions vary, functionally similar RNAs have pronounced similarity in the arrangement, or topology, of base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity), and are only partially correct, or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction, up to 30%, of the stems are omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families, with diverse sizes and functions, containing pseudoknots, and with little sequence similarity, which makes this an especially difficult test set. Despite this, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Because it includes pseudoknots, the RNA fingerprint approach covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint.
|
40
|
Corrado G, Tebaldi T, Costa F, Frasconi P, Passerini A. RNAcommender: genome-wide recommendation of RNA-protein interactions. Bioinformatics 2016; 32:3627-3634. [PMID: 27503225 DOI: 10.1093/bioinformatics/btw517] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 03/17/2016] [Revised: 07/29/2016] [Accepted: 08/02/2016] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION: Information about RNA-protein interactions is a vital prerequisite for dissecting RNA regulatory processes. Despite recent advances in experimental techniques, the currently available RNA interactome involves only a small portion of the known RNA binding proteins. The importance of determining RNA-protein interactions, coupled with the scarcity of the available information, calls for in silico prediction of such interactions. RESULTS: We present RNAcommender, a recommender system capable of suggesting RNA targets to unexplored RNA binding proteins, by propagating the available interaction information taking into account the protein domain composition and the RNA predicted secondary structure. Our results show that RNAcommender is able to successfully suggest RNA interactors for RNA binding proteins using little or no interaction evidence. RNAcommender was tested on a large dataset of human RBP-RNA interactions, showing a good ranking performance (average AUC ROC of 0.75) and significant enrichment of correct recommendations for 75% of the tested RBPs. RNAcommender can be a valid tool to assist researchers in identifying potential interacting candidates for the majority of RBPs with uncharacterized binding preferences. AVAILABILITY AND IMPLEMENTATION: The software is freely available at http://rnacommender.disi.unitn.it. CONTACT: gianluca.corrado@unitn.it or andrea.passerini@unitn.it. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gianluca Corrado
- Department of Information Engineering and Computer Science, University of Trento, Trento 38123, Italy
- Toma Tebaldi
- Centre for Integrative Biology, University of Trento, Trento 38123, Italy
- Fabrizio Costa
- Department of Computer Science, Albert-Ludwigs-Universitaet Freiburg, Freiburg 79110, Germany
- Paolo Frasconi
- Dipartimento di Ingegneria dell'Informazione, University of Florence, Florence 50139, Italy
- Andrea Passerini
- Department of Information Engineering and Computer Science, University of Trento, Trento 38123, Italy
|
41
|
Biswas AK, Gao JX. PR2S2Clust: Patched RNA-seq read segments' structure-oriented clustering. J Bioinform Comput Biol 2016; 14:1650027. [PMID: 27455882 DOI: 10.1142/s021972001650027x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
Abstract
RNA-seq, the next generation sequencing platform, enables researchers to explore the transcriptome of organisms in depth, for example by identifying functional non-coding RNAs (ncRNAs) and quantifying their expression in tissues. The functions of ncRNAs are mostly related to their secondary structures. Clustering read segments by their structural profiles is therefore essential, and this motivates the present research. In this manuscript we propose PR2S2Clust, Patched RNA-seq Read Segments' Structure-oriented Clustering, an analysis platform that extracts features to prepare the secondary structure profiles of RNA-seq read segments. It provides a strategy that uses these profiles to annotate the segments into ncRNA classes using several clustering strategies. While clustering the segments, the system considers seven pairwise structural distance metrics that take into account short-read mappings onto each structure, which we term the "patched structure". In this regard, we show applications of both classical and ensemble clustering in partitional and hierarchical variants. Extensive real-world experiments over three publicly available RNA-seq datasets and a comparative analysis over four competitive systems confirm the effectiveness and superiority of the proposed system. The source code and dataset of PR2S2Clust are available at http://biomecis.uta.edu/~ashis/res/PR2S2Clust-suppl/.
Affiliation(s)
- Ashis Kumer Biswas
- Department of Computer Science and Engineering, The University of Texas at Arlington, Texas 76019, USA
- Jean X Gao
- Department of Computer Science and Engineering, The University of Texas at Arlington, Texas 76019, USA
|
42
|
Hu X, Wu Y, Lu ZJ, Yip KY. Analysis of sequencing data for probing RNA secondary structures and protein–RNA binding in studying posttranscriptional regulations. Brief Bioinform 2015; 17:1032-1043. [DOI: 10.1093/bib/bbv106] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/10/2015] [Revised: 10/11/2015] [Indexed: 11/12/2022] Open
|
43
|
RC3H1 post-transcriptionally regulates A20 mRNA and modulates the activity of the IKK/NF-κB pathway. Nat Commun 2015; 6:7367. [PMID: 26170170 PMCID: PMC4510711 DOI: 10.1038/ncomms8367] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Received: 09/03/2014] [Accepted: 04/30/2015] [Indexed: 12/20/2022] Open
Abstract
The RNA-binding protein RC3H1 (also known as ROQUIN) promotes TNFα mRNA decay via a 3′UTR constitutive decay element (CDE). Here we applied PAR-CLIP to human RC3H1 to identify ∼3,800 mRNA targets with >16,000 binding sites. A large number of sites are distinct from the consensus CDE and revealed a structure-sequence motif with U-rich sequences embedded in hairpins. RC3H1 preferentially binds short-lived and DNA damage-induced mRNAs, indicating a role of this RNA-binding protein in the post-transcriptional regulation of the DNA damage response. Intriguingly, RC3H1 affects expression of NF-κB pathway regulators such as IκBα and A20. RC3H1 uses ROQ and Zn-finger domains to contact a binding site in the A20 3′UTR, demonstrating a previously unrecognized mode of RC3H1 binding. Knockdown of RC3H1 resulted in increased A20 protein expression, thereby interfering with IκB kinase and NF-κB activities, demonstrating that RC3H1 can modulate the activity of the IKK/NF-κB pathway. The RNA-binding protein RC3H1/ROQUIN1 promotes the degradation of mRNA by binding to a consensus CDE present in the 3′UTR. Here the authors expand the set of consensus sequences through which RC3H1 binds and regulates mRNA encoding members of the DNA damage response and IKK/NF-κB pathway.
|
44
|
Middleton SA, Kim J. NoFold: RNA structure clustering without folding or alignment. RNA 2014; 20:1671-1683. [PMID: 25234928 PMCID: PMC4201820 DOI: 10.1261/rna.041913.113] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 08/09/2013] [Accepted: 07/28/2014] [Indexed: 06/03/2023]
Abstract
Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function, for example recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, most methods for clustering structures are based on folding individual sequences or performing many pairwise alignments, which results in a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that does not require folding or pairwise alignment of the input sequences. Our method uses the idea of constructing a distance function between two objects from their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98 and generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures.
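The distance construction described in this abstract can be illustrated with a minimal sketch. It assumes a toy scoring function (`toy_score`, hypothetical) in place of real covariance-model scoring and three made-up reference strings in place of the Rfam models; it is not the NoFold implementation, only an instance of the general idea of comparing two objects through their scores against a shared collection of models.

```python
import math

def feature_vector(item, models, score):
    """Represent `item` by its score against every reference model."""
    return [score(item, m) for m in models]

def empirical_distance(a, b, models, score):
    """Euclidean distance between two items in the model-score space."""
    va = feature_vector(a, models, score)
    vb = feature_vector(b, models, score)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))

def toy_score(seq, ref):
    """Hypothetical stand-in for a covariance-model score:
    fraction of positions at which `seq` matches `ref`."""
    n = min(len(seq), len(ref))
    matches = sum(1 for i in range(n) if seq[i] == ref[i])
    return matches / max(len(seq), len(ref))

# Made-up reference "models"; the abstract uses 1973 Rfam covariance models.
models = ["GGGAAACCC", "AUAUAUAUA", "GCGCGCGCG"]

d_same = empirical_distance("GGGAAACCC", "GGGAAACCC", models, toy_score)
d_near = empirical_distance("GGGAAACCC", "GGGAAUCCC", models, toy_score)
d_far = empirical_distance("GGGAAACCC", "AUAUAUAUA", models, toy_score)
```

In this toy setting the two sequences are never aligned to each other or folded; each is only scored against the shared models, so the cost grows with the number of sequences times the number of models rather than with the number of sequence pairs.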
Affiliation(s)
- Sarah A Middleton
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Junhyong Kim
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
|
45
|
Videm P, Rose D, Costa F, Backofen R. BlockClust: efficient clustering and classification of non-coding RNAs from short read RNA-seq profiles. Bioinformatics 2014; 30:i274-82. [PMID: 24931994 PMCID: PMC4058930 DOI: 10.1093/bioinformatics/btu270] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 01/14/2023]
Abstract
Summary: Non-coding RNAs (ncRNAs) play a vital role in many cellular processes such as RNA splicing, translation and gene regulation. However, the vast majority of ncRNAs still have no functional annotation. One prominent approach for putative function assignment is clustering of transcripts according to sequence and secondary structure. However, sequence information is changed by post-transcriptional modifications, and secondary structure is only a proxy for the true 3D conformation of the RNA polymer. A different type of information that does not suffer from these issues, and that can be used for the detection of RNA classes, is the pattern of processing and its traces in small RNA-seq read data. Here we introduce BlockClust, an efficient approach to detect transcripts with similar processing patterns. We propose a novel way to encode expression profiles in compact discrete structures, which can then be processed using fast graph-kernel techniques. We perform unsupervised clustering and also develop family-specific discriminative models; finally, we show that the proposed approach is scalable, accurate and robust across different organisms, tissues and cell lines. Availability: The whole BlockClust galaxy workflow including all tool dependencies is available at http://toolshed.g2.bx.psu.edu/view/rnateam/blockclust_workflow. Contact: backofen@informatik.uni-freiburg.de; costa@informatik.uni-freiburg.de. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pavankumar Videm
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Munich Leukemia Laboratory (MLL), Munich, Centre for Biological Signalling Studies (BIOSS), Centre for Biological Systems Analysis (ZBSA), University of Freiburg, Germany and Centre for Non-coding RNA in Technology and Health, Bagsvaerd, Denmark
- Dominic Rose
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Munich Leukemia Laboratory (MLL), Munich, Centre for Biological Signalling Studies (BIOSS), Centre for Biological Systems Analysis (ZBSA), University of Freiburg, Germany and Centre for Non-coding RNA in Technology and Health, Bagsvaerd, Denmark
- Fabrizio Costa
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Munich Leukemia Laboratory (MLL), Munich, Centre for Biological Signalling Studies (BIOSS), Centre for Biological Systems Analysis (ZBSA), University of Freiburg, Germany and Centre for Non-coding RNA in Technology and Health, Bagsvaerd, Denmark
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Munich Leukemia Laboratory (MLL), Munich, Centre for Biological Signalling Studies (BIOSS), Centre for Biological Systems Analysis (ZBSA), University of Freiburg, Germany and Centre for Non-coding RNA in Technology and Health, Bagsvaerd, Denmark
|
46
|
Saberi Fathi SM, White DT, Tuszynski JA. Geometrical comparison of two protein structures using Wigner-D functions. Proteins 2014; 82:2756-69. [PMID: 25043646 DOI: 10.1002/prot.24640] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/24/2013] [Revised: 05/20/2014] [Accepted: 06/18/2014] [Indexed: 12/13/2022]
Abstract
In this article, we develop a quantitative comparison method for two arbitrary protein structures. This method uses a root-mean-square deviation characterization and employs a series expansion of the protein's shape function in terms of the Wigner-D functions to define a new criterion, which is called a "similarity value." We further demonstrate that the expansion coefficients for the shape function obtained with the help of the Wigner-D functions correspond to structure factors. Our method addresses the common problem of comparing two proteins with different numbers of atoms. We illustrate it with a worked example.
Affiliation(s)
- S M Saberi Fathi
- Department of Physics, Ferdowsi University of Mashhad, Mashhad, Iran
|
47
|
Backofen R, Vogel T. Biological and bioinformatical approaches to study crosstalk of long-non-coding RNAs and chromatin-modifying proteins. Cell Tissue Res 2014; 356:507-26. [PMID: 24820400 DOI: 10.1007/s00441-014-1885-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/16/2014] [Accepted: 03/27/2014] [Indexed: 02/04/2023]
Abstract
Long-non-coding RNA (lncRNA) regulates gene expression through transcriptional and epigenetic regulation as well as alternative splicing in the nucleus. In addition, regulation is achieved at the levels of mRNA translation, storage and degradation in the cytoplasm. During recent years, several studies have described the interaction of lncRNAs with enzymes that confer so-called epigenetic modifications, such as DNA methylation, histone modifications and chromatin structure or remodelling. LncRNA interaction with chromatin-modifying enzymes (CME) is an emerging field that confers another layer of complexity in transcriptional regulation. Given that CME-lncRNA interactions have been identified in many biological processes, ranging from development to disease, comprehensive understanding of underlying mechanisms is important to inspire basic and translational research in the future. In this review, we highlight recent findings to extend our understanding about the functional interdependencies between lncRNAs and CMEs that activate or repress gene expression. We focus on recent highlights of molecular and functional roles for CME-lncRNAs and provide an interdisciplinary overview of recent technical and methodological developments that have improved biological and bioinformatical approaches for detection and functional studies of CME-lncRNA interaction.
Affiliation(s)
- Rolf Backofen
- Institute of Computer Science, Albert-Ludwigs-University, Freiburg, Germany
|
48
|
Corrado G, Tebaldi T, Bertamini G, Costa F, Quattrone A, Viero G, Passerini A. PTRcombiner: mining combinatorial regulation of gene expression from post-transcriptional interaction maps. BMC Genomics 2014; 15:304. [PMID: 24758252 PMCID: PMC4234518 DOI: 10.1186/1471-2164-15-304] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/21/2014] [Accepted: 04/02/2014] [Indexed: 02/07/2023] Open
Abstract
Background: Progress in mapping RNA-protein and RNA-RNA interactions at the transcriptome-wide level paves the way to deciphering possible combinatorial patterns embedded in post-transcriptional regulation of gene expression. Results: Here we propose an innovative computational tool to extract clusters of mRNA trans-acting co-regulators (RNA binding proteins and non-coding RNAs) from pairwise interaction annotations. In addition, the tool allows analysis of the binding site similarity of co-regulators belonging to the same cluster, given their positional binding information. The tool has been tested on experimental collections of human and yeast interactions, identifying modules that coordinate functionally related messages. Conclusions: This tool is an original attempt to uncover combinatorial patterns using all the post-transcriptional interaction data available so far. PTRcombiner is available at http://disi.unitn.it/~passerini/software/PTRcombiner/.
Affiliation(s)
- Gabriella Viero
- Department of Information Engineering and Computer Science (DISI), University of Trento, 38123 Trento, Italy
|
49
|
Backofen R, Amman F, Costa F, Findeiß S, Richter AS, Stadler PF. Bioinformatics of prokaryotic RNAs. RNA Biol 2014; 11:470-83. [PMID: 24755880 PMCID: PMC4152356 DOI: 10.4161/rna.28647] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/31/2014] [Revised: 03/17/2014] [Accepted: 03/25/2014] [Indexed: 02/02/2023] Open
Abstract
The genomes of most prokaryotes give rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the characterization of their non-coding components depend heavily on computational methods and workflows, many of which have been developed or at least adapted specifically for use with bacterial and archaeal data. This review provides an overview of the state of the art in RNA bioinformatics, focusing on applications to prokaryotes.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Center for non-coding RNA in Technology and Health; University of Copenhagen; Grønnegårdsvej 3; DK-1870 Frederiksberg C, Denmark
- Fabian Amman
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics Group; Department of Computer Science, and Interdisciplinary Center for Bioinformatics; University of Leipzig; Härtelstraße 16-18; D-04107 Leipzig, Germany
- Fabrizio Costa
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Sven Findeiß
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics and Computational Biology Research Group; University of Vienna; Währingerstraße 29; A-1090 Wien, Austria
- Andreas S Richter
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Max Planck Institute of Immunobiology and Epigenetics; Stübeweg 51; D-79108 Freiburg, Germany
- Peter F Stadler
- Center for non-coding RNA in Technology and Health; University of Copenhagen; Grønnegårdsvej 3; DK-1870 Frederiksberg C, Denmark
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics Group; Department of Computer Science, and Interdisciplinary Center for Bioinformatics; University of Leipzig; Härtelstraße 16-18; D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences; Inselstraße 22; D-04103 Leipzig, Germany
- Fraunhofer Institute for Cell Therapy and Immunology – IZI; Perlickstraße 1; D-04103 Leipzig, Germany
- Santa Fe Institute; Santa Fe, NM USA
|
50
|
Maticzka D, Lange SJ, Costa F, Backofen R. GraphProt: modeling binding preferences of RNA-binding proteins. Genome Biol 2014; 15:R17. [PMID: 24451197 PMCID: PMC4053806 DOI: 10.1186/gb-2014-15-1-r17] [Citation(s) in RCA: 182] [Impact Index Per Article: 18.2] [Received: 08/07/2013] [Accepted: 01/22/2014] [Indexed: 12/01/2022] Open
Abstract
We present GraphProt, a computational framework for learning sequence- and structure-binding preferences of RNA-binding proteins (RBPs) from high-throughput experimental data. We benchmark GraphProt, demonstrating that the modeled binding preferences conform to the literature, and showcase the biological relevance and two applications of GraphProt models. First, estimated binding affinities correlate with experimental measurements. Second, predicted Ago2 targets display higher levels of expression upon Ago2 knockdown, whereas control targets do not. Computational binding models, such as those provided by GraphProt, are essential for predicting RBP binding sites and affinities in all tissues. GraphProt is freely available at http://www.bioinf.uni-freiburg.de/Software/GraphProt.
|