1
Lima DDS, Amichi LJA, Fernandez MA, Constantino AA, Seixas FAV. NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:557-565. [PMID: 34826297 DOI: 10.1109/tcbb.2021.3131136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among these classes, Y RNAs have been gaining attention as essential factors for the initiation of DNA replication in vertebrates and as potential tumor biomarkers. Homologs have also been described in nematodes and insects, and related sequences have been found in bacteria. Methods capable of accurately predicting Y RNA transcripts are lacking. In this work, we developed an attention-based LSTM network and built a model able to classify sncRNAs (including Y RNA) directly from nucleotide sequences. A dataset of 45,447 sncRNA sequences from a wide range of organisms was built from Rfam 14.3. Performance evaluation demonstrated that our proposed method, NCYPred (Non-Coding/Y RNA Prediction), can accurately predict Y RNA sequences and their homologs, as well as 11 additional classes, achieving results comparable to state-of-the-art methods. We also demonstrate that applying t-SNE to the learned sequence representations can be useful for sequence analysis. Our model is freely available as a web server (https://www.gpea.uem.br/ncypred/).
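To illustrate the general idea of classifying sequences "directly from nucleotide sequences" with an attention layer, the sketch below one-hot encodes an RNA sequence and applies dot-product attention pooling over per-position vectors. It is a toy under stated assumptions, not NCYPred's architecture: the `ACGU` alphabet, the function names, and the scoring vector `w` are hypothetical, and the one-hot rows stand in for the BiLSTM hidden states that a trained model would produce.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical alphabet; T is mapped to U, and any other symbol (e.g. N)
# yields an all-zero row.
ALPHABET = "ACGU"


def one_hot(seq: str) -> List[List[float]]:
    """Encode an RNA sequence as one one-hot row per nucleotide."""
    rows = []
    for base in seq.upper().replace("T", "U"):
        rows.append([1.0 if base == a else 0.0 for a in ALPHABET])
    return rows


def attention_pool(
    hidden: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Dot-product attention pooling over time steps.

    `hidden` stands in for per-position BiLSTM outputs (assumption);
    `w` is a scoring vector that a real model would learn.
    Returns (attention weights, pooled context vector).
    """
    if not hidden:
        raise ValueError("need at least one time step")
    # One scalar score per position.
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden]
    # Softmax, subtracting the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the per-position vectors gives a fixed-size context.
    dim = len(hidden[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]
    return alphas, context
```

For example, `attention_pool(one_hot("GGCU"), w=[0.0, 0.0, 1.0, 0.0])` places more weight on the two G positions than on C and U. The fixed-size context vector is what a downstream classification layer (and, in the paper's analysis, t-SNE) would consume.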
2
Geles K, Palumbo D, Sellitto A, Giurato G, Cianflone E, Marino F, Torella D, Mirici Cappa V, Nassa G, Tarallo R, Weisz A, Rizzo F. WIND (Workflow for pIRNAs aNd beyonD): a strategy for in-depth analysis of small RNA-seq data. F1000Res 2021; 10:1. [PMID: 34316353 PMCID: PMC8276195 DOI: 10.12688/f1000research.27868.3] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Accepted: 07/02/2021] [Indexed: 12/15/2022]
Abstract
Current bioinformatics workflows for PIWI-interacting RNA (piRNA) analysis focus primarily on germline-derived piRNAs and piRNA clusters. They frequently rely on outdated piRNA databases and questionable quantification methods, and they lack reproducibility. Pipelines designed for miRNA analysis are often used for piRNA research in silico. Furthermore, the absence of a well-established database for piRNA annotation, comparable to those available for miRNA, leads to uniformity issues between studies and generates confusion for data analysts and biologists. For these reasons, we have developed WIND (Workflow for pIRNAs aNd beyonD), a bioinformatics workflow that addresses the crucial issue of piRNA annotation, thereby allowing a reliable analysis of small RNA sequencing data for the identification of piRNAs and other small non-coding RNAs (sncRNAs) that have in the past been incorrectly classified as piRNAs. WIND creates a comprehensive annotation track of sncRNAs by combining information available in RNAcentral with piRNA sequences from piRNABank, the first database dedicated to piRNA annotation. WIND was built with Docker containers for reproducibility and integrates widely used bioinformatics tools for sequence alignment and quantification. It also includes Bioconductor packages for exploratory data analysis and differential expression analysis. Moreover, WIND implements a "dual" approach for evaluating sncRNA expression levels: it quantifies the reads aligned to the annotated genome and also performs an alignment-free transcript quantification using reads mapped to the transcriptome. As a result, a broader range of piRNAs can be annotated, improving their quantification and easing subsequent downstream analysis. WIND's performance has been tested on several small RNA-seq datasets, demonstrating that the approach can be a useful and comprehensive resource for analysing piRNAs and other classes of sncRNAs.
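One way to picture why merging RNAcentral with a piRNA database exposes sequences "incorrectly classified as piRNAs" is to key every annotation record by its sequence and report sequences that carry more than one RNA class. The sketch below does exactly that and nothing more; the record layout, the source labels, and the function names are hypothetical illustrations, not WIND's implementation or the format of either database.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (source, identifier, rna_class, sequence).
Record = Tuple[str, str, str, str]


def merge_annotations(records: Iterable[Record]) -> Dict[str, Set[Tuple[str, str]]]:
    """Group (source, rna_class) labels by normalised RNA sequence."""
    merged: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for source, _ident, rna_class, seq in records:
        # Normalise DNA-style T to U so identical RNAs share one key.
        merged[seq.upper().replace("T", "U")].add((source, rna_class))
    return dict(merged)


def class_conflicts(
    merged: Dict[str, Set[Tuple[str, str]]]
) -> Dict[str, Set[Tuple[str, str]]]:
    """Return sequences annotated with more than one RNA class across sources."""
    return {
        seq: labels
        for seq, labels in merged.items()
        if len({cls for _src, cls in labels}) > 1
    }
```

With toy input such as `[("piRNA_db", "ex1", "piRNA", "ACGTACGT"), ("RNAcentral", "ex2", "tRNA", "ACGUACGU")]`, the single shared sequence is reported as a conflict. How such conflicts should be resolved is a separate policy decision that this sketch deliberately leaves open.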
Affiliation(s)
- Konstantinos Geles
  - Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy; Genomix4Life, via S. Allende 43/L, Baronissi, Salerno (SA), 84081, Italy
- Domenico Palumbo
  - Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy; Clinical Research and Innovation, Clinica Montevergine S.p.A., Mercogliano, 83013, Italy
- Assunta Sellitto
  - Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy
- Giorgio Giurato
  - Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy; Genomix4Life, via S. Allende 43/L, Baronissi, Salerno (SA), 84081, Italy; CRGS (Genome Research Center for Health), University of Salerno Campus of Medicine, Baronissi, Salerno (SA), 84081, Italy
- Eleonora Cianflone
  - Department of Medical and Surgical Sciences, Magna Graecia University, Viale Europa, Catanzaro, 88100, Italy
- Fabiola Marino
  - Department of Experimental and Clinical Medicine, Molecular and Cellular Cardiology, Magna Graecia University, Viale Europa, Catanzaro, 88100, Italy
- Daniele Torella
  - Department of Experimental and Clinical Medicine, Molecular and Cellular Cardiology, Magna Graecia University, Viale Europa, Catanzaro, 88100, Italy
- Valeria Mirici Cappa
  - Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy; Genomix4Life, via S. Allende 43/L, Baronissi, Salerno (SA), 84081, Italy
- Giovanni Nassa
  - Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy; Genomix4Life, via S. Allende 43/L, Baronissi, Salerno (SA), 84081, Italy; CRGS (Genome Research Center for Health), University of Salerno Campus of Medicine, Baronissi, Salerno (SA), 84081, Italy
- Roberta Tarallo
  - Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy; CRGS (Genome Research Center for Health), University of Salerno Campus of Medicine, Baronissi, Salerno (SA), 84081, Italy
- Alessandro Weisz
  - Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy; CRGS (Genome Research Center for Health), University of Salerno Campus of Medicine, Baronissi, Salerno (SA), 84081, Italy
- Francesca Rizzo
  - Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy; Genomix4Life, via S. Allende 43/L, Baronissi, Salerno (SA), 84081, Italy; CRGS (Genome Research Center for Health), University of Salerno Campus of Medicine, Baronissi, Salerno (SA), 84081, Italy