1
Chiang TW, Jhong SE, Chen YC, Chen CY, Wu WS, Chuang TJ. FL-circAS: an integrative resource and analysis for full-length sequences and alternative splicing of circular RNAs with nanopore sequencing. Nucleic Acids Res 2024; 52:D115-D123. [PMID: 37823705 PMCID: PMC10767854 DOI: 10.1093/nar/gkad829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 08/26/2023] [Accepted: 10/02/2023] [Indexed: 10/13/2023] Open
Abstract
Circular RNAs (circRNAs) are RNA molecules with a continuous loop structure characterized by back-splice junctions (BSJs). While analyses of short-read RNA sequencing have identified millions of BSJ events, it is inherently challenging to determine exact full-length sequences and alternatively spliced (AS) isoforms of circRNAs. Recent advances in nanopore long-read sequencing with circRNA enrichment bring an unprecedented opportunity for investigating the issues. Here, we developed FL-circAS (https://cosbi.ee.ncku.edu.tw/FL-circAS/), which collected such long-read sequencing data of 20 cell lines/tissues and thereby identified 884 636 BSJs with 1 853 692 full-length circRNA isoforms in human and 115 173 BSJs with 135 617 full-length circRNA isoforms in mouse. FL-circAS also provides multiple circRNA features. For circRNA expression, FL-circAS calculates expression levels for each circRNA isoform, cell line/tissue specificity at both the BSJ and isoform levels, and AS entropy for each BSJ across samples. For circRNA biogenesis, FL-circAS identifies reverse complementary sequences and RNA binding protein (RBP) binding sites residing in flanking sequences of BSJs. For functional patterns, FL-circAS identifies potential microRNA/RBP binding sites and several types of evidence for circRNA translation on each full-length circRNA isoform. FL-circAS provides user-friendly interfaces for browsing, searching, analyzing, and downloading data, serving as the first resource for discovering full-length circRNAs at the isoform level.
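The abstract does not state how AS entropy is computed. As a minimal sketch, assuming it is the Shannon entropy of isoform usage among the full-length isoforms sharing one BSJ (an assumption for illustration, not the confirmed FL-circAS definition), the quantity could be calculated as follows. The function name `as_entropy` and the read counts are hypothetical.

```python
import math


def as_entropy(isoform_counts):
    """Shannon entropy (in bits) of isoform usage for a single BSJ.

    isoform_counts: read counts of each full-length isoform sharing the BSJ.
    Assumed definition for illustration only; the resource itself may
    normalize or aggregate across samples differently.
    """
    total = sum(isoform_counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in isoform_counts:
        if count > 0:
            p = count / total  # fraction of reads supporting this isoform
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts: two equally used isoforms vs. a single dominant isoform.
balanced = as_entropy([10, 10])
single = as_entropy([20, 0])
```

Under this assumed definition, a BSJ with one isoform has zero entropy, and the value grows as reads are spread more evenly across more isoforms.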
Affiliation(s)
- Tai-Wei Chiang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Song-En Jhong
  - Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan
- Yu-Chen Chen
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Chia-Ying Chen
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Wei-Sheng Wu
  - Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan
2
Chen YC, Chen CY, Chiang TW, Chan MH, Hsiao M, Ke HM, Tsai I, Chuang TJ. Detecting intragenic trans-splicing events from non-co-linearly spliced junctions by hybrid sequencing. Nucleic Acids Res 2023; 51:7777-7797. [PMID: 37497782 PMCID: PMC10450196 DOI: 10.1093/nar/gkad623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 07/14/2023] [Indexed: 07/28/2023] Open
Abstract
Trans-spliced RNAs (ts-RNAs) are a type of non-co-linear (NCL) transcript consisting of exons in an order topologically inconsistent with the corresponding DNA template. Detection of ts-RNAs is often confounded by experimental artifacts, circular RNAs (circRNAs), and genetic rearrangements. In particular, intragenic ts-RNAs, which are derived from separate precursor mRNA molecules of the same gene, are often mistaken for circRNAs in analyses of RNA-seq data. Here we developed a bioinformatics pipeline (NCLscan-hybrid) that integrates short and long RNA-seq reads to minimize false positives, and proposed using out-of-circle and rolling-circle long reads to distinguish between intragenic ts-RNAs and circRNAs. Combining NCLscan-hybrid screening with multiple experimental validation steps confirmed that four NCL events, previously regarded as circRNAs in databases, originated from trans-splicing. CRISPR-based endogenous genome modification experiments further showed that flanking intronic complementary sequences can contribute significantly to ts-RNA formation, providing an efficient and specific method to deplete ts-RNAs. We also experimentally validated that one ts-RNA (ts-ARFGEF1) played an important role in p53-mediated apoptosis by affecting the PERK/eIF2a/ATF4/CHOP signaling pathway in breast cancer cells. This study thus described both bioinformatics procedures and experimental validation steps for rigorous characterization of ts-RNAs, supporting future studies of the identification, biogenesis, and function of these important but understudied transcripts.
Affiliation(s)
- Yu-Chen Chen
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Chia-Ying Chen
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Tai-Wei Chiang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Ming-Hsien Chan
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Michael Hsiao
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Huei-Mien Ke
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
  - Department of Microbiology, Soochow University, Taipei, Taiwan
3
Chuang TJ, Chiang TW, Chen CY. Assessing the impacts of various factors on circular RNA reliability. Life Sci Alliance 2023; 6:e202201793. [PMID: 36849251 PMCID: PMC9971162 DOI: 10.26508/lsa.202201793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 02/15/2023] [Accepted: 02/15/2023] [Indexed: 03/01/2023] Open
Abstract
Circular RNAs (circRNAs) are non-polyadenylated RNAs with a continuous loop structure characterized by a non-colinear back-splice junction (BSJ). Although millions of circRNA candidates have been identified, determining circRNA reliability remains a major challenge because of various types of false positives. Here, we systematically assess the impacts of numerous factors related to circRNA identification, conservation, biogenesis, and function on circRNA reliability by comparing circRNA expression between mock and the corresponding colinear/polyadenylated RNA-depleted datasets based on three different RNA treatment approaches. Eight important indicators of circRNA reliability are determined. Analyses of the relative contribution to variability explained reveal that the relative importance of these factors in affecting circRNA reliability, in descending order, is: the conservation level of circRNA, full-length circular sequences, supporting BSJ read count, both BSJ donor and acceptor splice sites at the same colinear transcript isoforms, both BSJ donor and acceptor splice sites at the annotated exon boundaries, BSJs detected by multiple tools, supporting functional features, and both BSJ donor and acceptor splice sites undergoing alternative splicing. This study thus provides a useful guideline and an important resource for selecting high-confidence circRNAs for further investigations.
Affiliation(s)
- Tai-Wei Chiang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Chia-Ying Chen
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
4
Chiang TW, Mai TL, Chuang TJ. CircMiMi: a stand-alone software for constructing circular RNA-microRNA-mRNA interactions across species. BMC Bioinformatics 2022; 23:164. [PMID: 35524165 PMCID: PMC9074202 DOI: 10.1186/s12859-022-04692-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/26/2021] [Accepted: 04/17/2022] [Indexed: 01/22/2023] Open
Abstract
Background: Circular RNAs (circRNAs) are a class of non-coding RNAs formed by pre-mRNA back-splicing, which are widely expressed in animal/plant cells and often play an important role in regulating microRNA (miRNA) activities. While numerous databases have collected a large number of predicted circRNA candidates and provided the corresponding circRNA-regulated interactions, a stand-alone package for constructing circRNA-miRNA-mRNA interactions based on user-identified circRNAs across species is lacking.
Results: We present CircMiMi (circRNA-miRNA-mRNA interactions), a modular, Python-based software to identify circRNA-miRNA-mRNA interactions across 18 species (including 16 animals and 2 plants) from the given coordinates of circRNA junctions. The CircMiMi-constructed circRNA-miRNA-mRNA interactions are derived from circRNA-miRNA and miRNA-mRNA axes with the support of computational predictions and/or experimental data. CircMiMi also allows users to examine the alignment ambiguity of back-splice junctions to check circRNA reliability, and to examine reverse complementary sequences residing in the sequences flanking the circularized exons to investigate circRNA formation. We further employ CircMiMi to identify circRNA-miRNA-mRNA interactions based on the circRNAs collected in NeuroCirc, a large-scale database of circRNAs in the human brain. We construct circRNA-miRNA-mRNA interactions comprising circRNAs and miRNAs that are differentially expressed in autism spectrum disorder (ASD), and analyze the relevance of the targets to ASD across species. We thus provide a rich set of ASD-associated circRNA-miRNA-mRNA axes and a useful starting point for investigation of regulatory mechanisms in ASD pathophysiology.
Conclusions: CircMiMi allows users to identify circRNA-mediated interactions in multiple species, shedding light on regulatory roles of circRNAs. The software package and web interface are freely available at https://github.com/TreesLab/CircMiMi and http://circmimi.genomics.sinica.edu.tw/, respectively.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04692-0.
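The check for reverse complementary sequences in the regions flanking circularized exons can be illustrated with a short sketch. This is not CircMiMi's implementation or API: the function names, the exact-match k-mer approach, and the default `k` are all hypothetical choices made for illustration, and a real tool would more likely use an alignment-based search.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_rc_kmers(upstream, downstream, k=8):
    """Find k-mers in `upstream` whose reverse complement occurs in `downstream`.

    upstream / downstream: hypothetical flanking sequences on either side of
    the circularized exons. Returns a list of (position, k-mer) tuples for
    the upstream flank. Exact matching only; illustration, not a real aligner.
    """
    downstream_kmers = {
        downstream[i:i + k] for i in range(len(downstream) - k + 1)
    }
    hits = []
    for i in range(len(upstream) - k + 1):
        kmer = upstream[i:i + k]
        if reverse_complement(kmer) in downstream_kmers:
            hits.append((i, kmer))
    return hits


# Hypothetical flanks containing a short reverse complementary stretch.
hits = shared_rc_kmers("TTTAACGGATTT", "CCCATCCGTTCCC", k=6)
```

Exact k-mer matching keeps the sketch short; it misses imperfect pairings, which is why it should be read as a simplification of the idea rather than a substitute for the tool.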
Affiliation(s)
- Tai-Wei Chiang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Te-Lun Mai
  - Department of Life Science, National Taiwan University, Taipei, Taiwan
5
Robic A, Cerutti C, Kühn C, Faraut T. Comparative Analysis of the Circular Transcriptome in Muscle, Liver, and Testis in Three Livestock Species. Front Genet 2021; 12:665153. [PMID: 34040640 PMCID: PMC8141914 DOI: 10.3389/fgene.2021.665153] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/07/2021] [Accepted: 04/07/2021] [Indexed: 12/13/2022] Open
Abstract
Circular RNAs have been observed in a large number of species and tissues and are now recognized as a clear component of the transcriptome. Our study takes advantage of functional datasets produced within the FAANG consortium to investigate the pervasiveness of circular RNA transcription in farm animals. We describe here the circular transcriptional landscape in pig, sheep and bovine testicular, muscular and liver tissues using a total of 66 RNA-seq datasets. After an exhaustive detection of circular RNAs, we propose an annotation of exonic, intronic and sub-exonic circRNAs and comparative analyses of circRNA content to evaluate the variability between individuals, tissues and species. Despite technical bias due to the various origins of the datasets, we were able to characterize some features: (i) (ruminant) liver contains more exonic circRNAs than muscle; (ii) in testis, the number of exonic circRNAs seems associated with the sexual maturity of the animal; (iii) a particular class of circRNAs, sub-exonic circRNAs, is produced by a large variety of multi-exonic genes (protein-coding genes, long non-coding RNAs and pseudogenes) and mono-exonic genes (protein-coding genes from the mitochondrial genome and small non-coding genes). Moreover, for multi-exonic genes there seems to be a relationship between the sub-exonic circRNA transcription level and the linear transcription level. Finally, sub-exonic circRNAs produced by mono-exonic genes (mitochondrial protein-coding genes, ribozyme, and sno) exhibit a particular behavior. Caution has to be taken regarding the interpretation of the unannotated circRNA proportion in a given tissue/species: clusters of circRNAs without annotation were characterized in genomic regions with annotation and/or assembly problems of the respective animal genomes. This study highlights the importance of improving genome annotation to better consider candidate circRNAs and to better understand the circular transcriptome. Furthermore, it emphasizes the need to consider the relative "weight" of circRNAs/parent genes in comparative analyses of several circular transcriptomes. Although there are points of agreement in the circular transcriptome of the same tissue in two species, characterizing it in both species remains necessary.
Affiliation(s)
- Annie Robic
  - INRAE, ENVT, GenPhySE, Université de Toulouse, Castanet-Tolosan, France
- Chloé Cerutti
  - INRAE, ENVT, GenPhySE, Université de Toulouse, Castanet-Tolosan, France
- Christa Kühn
  - Institute Genome Biology, Leibniz Institute for Farm Animal Biology (FBN), Dummerstorf, Germany
  - Faculty of Agricultural and Environmental Sciences, University of Rostock, Rostock, Germany
- Thomas Faraut
  - INRAE, ENVT, GenPhySE, Université de Toulouse, Castanet-Tolosan, France
6
Kui L, Tang M. Overview of Computational Methods and Resources for Circular RNAs. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11638-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022] Open
7
Wu W, Ji P, Zhao F. CircAtlas: an integrated resource of one million highly accurate circular RNAs from 1070 vertebrate transcriptomes. Genome Biol 2020; 21:101. [PMID: 32345360 PMCID: PMC7187532 DOI: 10.1186/s13059-020-02018-y] [Citation(s) in RCA: 234] [Impact Index Per Article: 58.5] [Received: 10/28/2019] [Accepted: 04/14/2020] [Indexed: 12/19/2022] Open
Abstract
Existing circular RNA (circRNA) databases have become essential for transcriptomics. However, most are unsuitable for mining in-depth information for candidate circRNA prioritization. To address this, we integrate circular transcript collections to develop the circAtlas database based on 1070 RNA-seq samples collected from 19 normal tissues across six vertebrate species. This database contains 1,007,087 highly reliable circRNAs, of which over 81.3% have been assembled into full-length sequences. We profile their expression pattern, conservation, and functional annotation. We describe a novel multiple conservation score, co-expression, and regulatory networks for circRNA annotation and prioritization. CircAtlas can be accessed at http://circatlas.biols.ac.cn/.
Affiliation(s)
- Wanying Wu
  - Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Peifeng Ji
  - Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China
- Fangqing Zhao
  - Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
8
Dou L, Li X, Ding H, Xu L, Xiang H. Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem? Molecular Therapy. Nucleic Acids 2020; 19:293-303. [PMID: 31865116 PMCID: PMC6931122 DOI: 10.1016/j.omtn.2019.11.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/08/2019] [Revised: 10/29/2019] [Accepted: 11/11/2019] [Indexed: 01/01/2023]
Abstract
Pseudouridine (Ψ) is the most abundant RNA modification and has been found in many kinds of RNAs, including snRNA, rRNA, tRNA, mRNA, and snoRNA. Thus, Ψ sites play a significant role in basic research and drug development. Although some experimental techniques have been developed to identify Ψ sites, they are expensive and time consuming, especially in the post-genomic era with the explosive growth of known RNA sequences. Thus, highly accurate computational methods are urgently required to quickly detect the Ψ sites on uncharacterized RNA sequences. Several predictors have been proposed using multifarious features, but their evaluated performances are still unsatisfactory. In this study, we first identified Ψ sites for H. sapiens, S. cerevisiae, and M. musculus using the sequence features from the bi-profile Bayes (BPB) method based on the random forest (RF) and support vector machine (SVM) algorithms, where the performances were evaluated using 5-fold cross-validation and independent tests. It was found that the SVM-based accuracies were 3.55% and 5.09% lower than those of the iPseU-CUU predictor for the H_990 and S_628 datasets, respectively. Results at almost the same level were obtained for M_994 and an independent H_200 dataset, with a 5.0% improvement for S_200. Then, three different kinds of features, including basic Kmer, general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-General), and nucleotide chemical property (NCP) and nucleotide density (ND) from the iRNA-PseU method, were combined with BPB to show their comprehensive performances, where the effective features were selected by the max-relevance-max-distance (MRMD) method. The best evaluated accuracies of the combined features for the S_628 and M_994 datasets were 70.54% and 72.45%, which were 2.39% and 0.65% higher than iPseU-CUU. For the S_200 dataset, accuracy also improved by 8 percentage points, from 69% to 77%. However, there was no obvious improvement for H. sapiens, which was evaluated as approximately 63.23% and 72.0% for the H_990 and H_200 datasets, respectively. The overall performances for Ψ identification using BPB features as well as the combined features were not obviously improved. Although some kinds of feature extraction methods based on the RNA sequence information have been applied to construct the predictors in previous studies, the corresponding accuracies are generally in the range of 60%-70%. Thus, researchers need to reconsider whether there is any sequence feature in the RNA Ψ modification prediction problem.
Affiliation(s)
- Lijun Dou
  - School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Xiaoling Li
  - Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Hui Ding
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Huaikun Xiang
  - School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China
9
Chen YJ, Chen CY, Mai TL, Chuang CF, Chen YC, Gupta SK, Yen L, Wang YD, Chuang TJ. Genome-wide, integrative analysis of circular RNA dysregulation and the corresponding circular RNA-microRNA-mRNA regulatory axes in autism. Genome Res 2020; 30:375-391. [PMID: 32127416 PMCID: PMC7111521 DOI: 10.1101/gr.255463.119] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 08/01/2019] [Accepted: 02/24/2020] [Indexed: 02/07/2023]
Abstract
Circular RNAs (circRNAs), a class of long noncoding RNAs, are known to be enriched in mammalian neural tissues. Although a wide range of gene expression dysregulation in autism spectrum disorder (ASD) has been reported, the role of circRNAs in ASD remains largely unknown. Here, we performed genome-wide circRNA expression profiling in postmortem brains from individuals with ASD and controls and identified 60 circRNAs and three coregulated modules that were perturbed in ASD. By integrating circRNA, microRNA, and mRNA dysregulation data derived from the same cortex samples, we identified 8170 ASD-associated circRNA-microRNA-mRNA interactions. Putative targets of the axes were enriched for ASD risk genes and genes encoding inhibitory postsynaptic density (PSD) proteins, but not for genes implicated in monogenetic forms of other brain disorders or genes encoding excitatory PSD proteins. This reflects the previous observation that ASD-derived organoids show overproduction of inhibitory neurons. We further confirmed that some ASD risk genes (NLGN1, STAG1, HSD11B1, VIP, and UBA6) were regulated by an up-regulated circRNA (circARID1A) via sponging a down-regulated microRNA (miR-204-3p) in human neuronal cells. In particular, alteration of NLGN1 expression is known to affect the dynamic processes of memory consolidation and strengthening. To the best of our knowledge, this is the first systems-level view of circRNA regulatory networks in ASD cortex samples. We provided a rich set of ASD-associated circRNA candidates and the corresponding circRNA-microRNA-mRNA axes, particularly those involving ASD risk genes. Our findings thus support a role for circRNA dysregulation and the corresponding circRNA-microRNA-mRNA axes in ASD pathophysiology.
Affiliation(s)
- Yen-Ju Chen
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
  - Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei 10617, Taiwan
- Chia-Ying Chen
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Te-Lun Mai
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Chih-Fan Chuang
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Yu-Chen Chen
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Sachin Kumar Gupta
  - Department of Pathology and Immunology
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
- Laising Yen
  - Department of Pathology and Immunology
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
- Yi-Da Wang
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Trees-Juen Chuang
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
  - Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei 10617, Taiwan
10
Non-coding RNA regulatory networks. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2019; 1863:194417. [PMID: 31493559 DOI: 10.1016/j.bbagrm.2019.194417] [Citation(s) in RCA: 245] [Impact Index Per Article: 49.0] [Received: 05/28/2019] [Revised: 08/13/2019] [Accepted: 08/13/2019] [Indexed: 02/06/2023]
Abstract
It is well established that the vast majority of human RNA transcripts do not encode proteins and that non-coding RNAs regulate cell physiology and shape cellular functions. A subset of them is involved in gene regulation at different levels, from epigenetic gene silencing to post-transcriptional regulation of mRNA stability. Notably, the aberrant expression of many non-coding RNAs has been associated with aggressive pathologies. Rapid advances in network biology indicate that the robustness of cellular processes is the result of specific properties of biological networks, such as scale-free degree distribution and hierarchical modularity, suggesting that regulatory network analyses could provide new insights into gene regulation and dysfunction mechanisms. In this study we present an overview of public repositories where non-coding RNA regulatory interactions are collected and annotated, discuss unresolved questions for data integration, and review existing resources for building and analysing networks.
11
Zeng X, Lin W, Guo M, Zou Q. Details in the evaluation of circular RNA detection tools: Reply to Chen and Chuang. PLoS Comput Biol 2019; 15:e1006916. [PMID: 31022173 PMCID: PMC6527241 DOI: 10.1371/journal.pcbi.1006916] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/25/2018] [Revised: 05/20/2019] [Accepted: 03/01/2019] [Indexed: 01/21/2023] Open
Affiliation(s)
- Xiangxiang Zeng
  - Shenzhen Research Institute of Xiamen University, Shenzhen, China
  - School of Information Science and Engineering, Xiamen University, Xiamen, China
- Wei Lin
  - School of Information Science and Engineering, Xiamen University, Xiamen, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
12
Chen CY, Chuang TJ. NCLcomparator: systematically post-screening non-co-linear transcripts (circular, trans-spliced, or fusion RNAs) identified from various detectors. BMC Bioinformatics 2019; 20:3. [PMID: 30606103 PMCID: PMC6318855 DOI: 10.1186/s12859-018-2589-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/22/2017] [Accepted: 12/21/2018] [Indexed: 01/05/2023] Open
Abstract
Background: Non-co-linear (NCL) transcripts consist of exonic sequences that are topologically inconsistent with the reference genome in an intragenic fashion (circular or intragenic trans-spliced RNAs) or in an intergenic fashion (fusion or intergenic trans-spliced RNAs). On the basis of RNA-seq data, numerous NCL event detectors have been developed and have detected thousands of NCL events in diverse species. However, there are great discrepancies in the identification results among detectors, indicating a considerable proportion of false positives in the detected NCL events. Although several helpful guidelines for evaluating the performance of NCL event detectors have been provided, a systematic guideline for measurement of NCL events identified by existing tools has not been available.
Results: We develop a software tool, NCLcomparator, for systematically post-screening the intragenic or intergenic NCL events identified by various NCL detectors. NCLcomparator first examines whether the input NCL events are potentially false positives derived from ambiguous alignments (i.e., the NCL events have an alternative co-linear explanation or multiple matches against the reference genome). To evaluate the reliability of the identified NCL events, we define the NCL score (NCLscore) based on the variation in the number of supporting NCL junction reads identified by the tools examined. Of the input NCL events, we show that the ambiguous alignment-derived events have relatively lower NCLscore values than the other events, indicating that an NCL event with a higher NCLscore has a higher level of reliability. To help select highly expressed NCL events, NCLcomparator also provides a series of useful measurements, such as the expression levels of the detected NCL events and their corresponding host genes and the junction usage of the co-linear splice junctions at both NCL donor and acceptor sites.
Conclusion: NCLcomparator provides useful guidelines, requiring as input only the NCL events identified by various detectors and the corresponding paired-end RNA-seq data, to help users select potentially high-confidence NCL events for further functional investigation. The software thus helps to facilitate future studies into NCL events, shedding light on the fundamental biology of this important but understudied class of transcripts. NCLcomparator is freely accessible at https://github.com/TreesLab/NCLcomparator.
Affiliation(s)
- Chia-Ying Chen
  - Genomics Research Center, Academia Sinica, Taipei, 11529 Taiwan