1
Zhou Y, Cui H, Liu D, Wang W. MSTCRB: Predicting circRNA-RBP interaction by extracting multi-scale features based on transformer and attention mechanism. Int J Biol Macromol 2024; 278:134805. [PMID: 39153682 DOI: 10.1016/j.ijbiomac.2024.134805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/14/2024] [Accepted: 08/14/2024] [Indexed: 08/19/2024]
Abstract
CircRNAs play vital roles in biological systems, mainly by binding RNA-binding proteins (RBPs), which is essential for regulating physiological processes in vivo and for identifying causal disease variants. Therefore, predicting interactions between circRNAs and RBPs is a critical step in the discovery of new therapeutic agents. The application of various deep-learning models in bioinformatics has significantly improved prediction and classification performance. However, most existing prediction models are applicable only to specific types of RNA or to RNA with simple characteristics. In this study, we propose a deep learning model, MSTCRB, based on the transformer and attention mechanism, that extracts multi-scale features to predict circRNA-RBP interactions. K-mer and KNF encoding are employed to capture the global sequence features of circRNA, NCP and DPCP encoding are used to extract local sequence features, and the CDPfold method is applied to extract structural features. To improve prediction performance, an optimized transformer framework and attention mechanism are used to integrate these multi-scale features. We compared our model's performance with five other state-of-the-art methods on 37 circRNA datasets and 31 linear RNA datasets. The results show that the average AUC of MSTCRB reaches 98.45%, which is better than the other compared methods. All of the above datasets are deposited at https://github.com/chy001228/MSTCRB_database.git and the source code is available from https://github.com/chy001228/MSTCRB.git.
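As a rough illustration of two of the sequence encodings named in the abstract above, the following sketch computes a k-nucleotide frequency (KNF) vector and a nucleotide chemical property (NCP) encoding. It is not taken from the MSTCRB source code: the function names `knf` and `ncp` are hypothetical, and the NCP triplets follow a commonly used convention (ring structure, functional group, hydrogen bonding) that is assumed here rather than confirmed against the paper.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"

# Assumed NCP convention: (purine ring, amino group, weak hydrogen bonding).
NCP_TABLE = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def knf(seq: str, k: int = 2) -> list[float]:
    """Return the frequency of every possible k-mer, in a fixed lexicographic order."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    # Windows containing symbols outside ACGU count toward the total only.
    return [counts[km] / total if total else 0.0 for km in kmers]


def ncp(seq: str) -> list[tuple[int, int, int]]:
    """Encode each nucleotide as its chemical-property triplet (unknown bases map to zeros)."""
    seq = seq.upper().replace("T", "U")
    return [NCP_TABLE.get(base, (0, 0, 0)) for base in seq]


if __name__ == "__main__":
    fragment = "ACGUACGU"
    print(len(knf(fragment, k=3)))  # 4**3 entries, one per possible 3-mer
    print(ncp("AU"))
```

KNF summarizes composition over the whole fragment, which is why the abstract groups it with the global features, whereas NCP keeps one vector per position and therefore preserves local order.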
Affiliation(s)
- Yun Zhou
- College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China; Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China.
- Haoyu Cui
- College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Dong Liu
- College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China; Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China.
- Wei Wang
- College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China; Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China.
2
Sanadgol N, Amini J, Khalseh R, Bakhshi M, Nikbin A, Beyer C, Zendehdel A. Mitochondrial genome-derived circRNAs: Orphan epigenetic regulators in molecular biology. Mitochondrion 2024; 79:101968. [PMID: 39321951 DOI: 10.1016/j.mito.2024.101968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 09/02/2024] [Accepted: 09/18/2024] [Indexed: 09/27/2024]
Abstract
Mitochondria are vital for cellular activities, influencing ATP production, Ca2+ signaling, and reactive oxygen species generation. It has been proposed that nuclear genome-derived circular RNAs (circRNAs) play a role in biological processes. For the first time, this study aims to comprehensively explore experimentally confirmed human mitochondrial genome-derived circRNAs (mt-circRNAs) via in-silico analysis. We utilized wide-ranging bioinformatics tools to anticipate their roles in molecular biology, involving miRNA sponging, protein antagonism, and peptide translation. Among five well-characterized mt-circRNAs, SCAR/mc-COX2 stands out as particularly significant, with the potential to sponge around 41 different miRNAs, which target several genes mostly involved in endocytosis, MAP kinase, and PI3K-Akt pathways. Interestingly, circMNTND5 and mecciND1 specifically interact with miRNAs through their unique back-splice junction sequence. These exclusively targeted miRNAs (hsa-miR-5186, 6888-5p, 8081, 924, 672-5p) are predominantly associated with insulin secretion, proteoglycans in cancer, and MAPK signaling pathways. Moreover, all mt-circRNAs intricately affect the P53 pathway through miRNA sequestration. Remarkably, mc-COX2 and circMNTND5 appear to be involved in RNA biogenesis by antagonizing AGO1/2, EIF4A3, and DGCR8. All mt-circRNAs engaged with IGF2BP proteins crucial in redox signaling, and except mecciND1, they all potentially generate at least one protein resembling the immunoglobulin heavy chain protein. Given P53's function as a redox-sensitive transcription factor, and insulin's role as a crucial regulator of energy metabolism, their indirect interplay with mt-circRNAs could influence cellular outcomes. However, because mt-circRNAs have received limited attention and data are scarce, more thorough investigations are advisable to gain a deeper understanding of their functions.
Affiliation(s)
- Nima Sanadgol
- Institute of Neuroanatomy, RWTH University Hospital Aachen, 52074 Aachen, Germany.
- Javad Amini
- Department of Physiology and Pharmacology, School of Medicine, North Khorasan University of Medical Sciences, 94149-75516 Bojnurd, Iran
- Roghayeh Khalseh
- Institute of Neuroanatomy, RWTH University Hospital Aachen, 52074 Aachen, Germany
- Mostafa Bakhshi
- Department of Electrical and Computer Engineering, Kharazmi University, 15719-14911 Tehran, Iran
- Arezoo Nikbin
- Department of Oral and Maxillofacial Radiology, School of Dentistry, Golestan University of Medical Sciences, Gorgan, Iran
- Cordian Beyer
- Institute of Neuroanatomy, RWTH University Hospital Aachen, 52074 Aachen, Germany
- Adib Zendehdel
- Institute of Anatomy, Department of Biomedicine, University of Basel, 4031 Basel, Switzerland
3
Liu L, Wei Y, Tan Z, Zhang Q, Sun J, Zhao Q. Predicting circRNA-RBP Binding Sites Using a Hybrid Deep Neural Network. Interdiscip Sci 2024; 16:635-648. [PMID: 38381315 DOI: 10.1007/s12539-024-00616-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/26/2024] [Accepted: 01/29/2024] [Indexed: 02/22/2024]
Abstract
Circular RNAs (circRNAs) are non-coding RNAs generated by reverse splicing. They are involved in biological processes and human diseases through interactions with specific RNA-binding proteins (RBPs). Because traditional biological experiments are costly, computational methods have been proposed to predict circRNA-RBP interactions. However, these methods are limited by single-feature extraction. Therefore, we propose a novel model called circ-FHN, which uses only circRNA sequences to predict circRNA-RBP interactions. The circ-FHN approach involves feature coding and a hybrid deep learning model. Feature coding takes into account the physicochemical properties of circRNA sequences and employs four coding methods to extract sequence features. The hybrid deep structure comprises a convolutional neural network (CNN) and a bidirectional gated recurrent unit (BiGRU). The CNN learns high-level abstract features, while the BiGRU captures long-term dependencies in the sequence. To assess the effectiveness of circ-FHN, we compared it to other computational methods on 16 datasets and conducted ablation experiments. Additionally, we conducted motif analysis. The results demonstrate that circ-FHN exhibits exceptional performance and surpasses other methods. circ-FHN is freely available at https://github.com/zhaoqi106/circ-FHN.
Affiliation(s)
- Liwei Liu
- College of Science, Dalian Jiaotong University, Dalian, 116028, China
- Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou, 571158, China
- Yixin Wei
- College of Science, Dalian Jiaotong University, Dalian, 116028, China
- Zhebin Tan
- College of Software, Dalian Jiaotong University, Dalian, 116028, China
- Qi Zhang
- College of Science, Dalian Jiaotong University, Dalian, 116028, China
- Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi, 276000, China.
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China.
4
Zuo Y, Chen H, Yang L, Chen R, Zhang X, Deng Z. Research progress on prediction of RNA-protein binding sites in the past five years. Anal Biochem 2024; 691:115535. [PMID: 38643894 DOI: 10.1016/j.ab.2024.115535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 04/23/2024]
Abstract
Accurately predicting RNA-protein binding sites is essential to gain a deeper understanding of protein-RNA interactions and their regulatory mechanisms, which are fundamental to gene expression and regulation. However, conventional biological approaches to detecting these sites are often costly and time-consuming. In contrast, computational methods for predicting RNA-protein binding sites are both cost-effective and fast. This review synthesizes existing computational methods and summarizes commonly used databases for predicting RNA-protein binding sites. In addition, applications and innovations of computational methods using traditional machine learning and deep learning for RNA-protein binding site prediction during 2018-2023 are presented. These methods cover a wide range of aspects such as effective database utilization, feature selection and encoding, innovative classification algorithms, and evaluation strategies. After exploring the limitations of existing computational methods, this paper discusses potential directions for future development. DeepRKE, RDense, and DeepDW all employ convolutional neural networks and long short-term memory networks to construct prediction models, yet their algorithm design and feature encoding differ, resulting in different prediction performance.
Affiliation(s)
- Yun Zuo
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Huixian Chen
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Lele Yang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Ruoyan Chen
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Xiaoyao Zhang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Zhaohong Deng
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China.
5
Yuan L, Zhao L, Lai J, Jiang Y, Zhang Q, Shen Z, Zheng CH, Huang DS. iCRBP-LKHA: Large convolutional kernel and hybrid channel-spatial attention for identifying circRNA-RBP interaction sites. PLoS Comput Biol 2024; 20:e1012399. [PMID: 39173070 PMCID: PMC11373821 DOI: 10.1371/journal.pcbi.1012399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 09/04/2024] [Accepted: 08/08/2024] [Indexed: 08/24/2024] [Open Access]
Abstract
Circular RNAs (circRNAs) play vital roles in transcription and translation. Identification of circRNA-RBP (RNA-binding protein) interaction sites has become a fundamental step in molecular and cell biology. Deep learning (DL)-based methods have been proposed to predict circRNA-RBP interaction sites and have achieved impressive identification performance. However, those methods cannot effectively capture long-distance dependencies or make effective use of the interaction information among multiple features. To overcome those limitations, we propose a DL-based model, iCRBP-LKHA, which uses deep hybrid networks to identify circRNA-RBP interaction sites. iCRBP-LKHA adopts five encoding schemes. Its neural network architecture, which consists of a large kernel convolutional neural network (LKCNN), a convolutional block attention module with one-dimensional convolution (CBAM-1D), and a bidirectional gated recurrent unit (BiGRU), can automatically explore local information, global context information, and interaction information among multiple features. To verify the effectiveness of iCRBP-LKHA, we compared its performance with shallow learning algorithms on 37 circRNA datasets and 37 stringent circRNA datasets. We also compared its performance with state-of-the-art DL-based methods on 37 circRNA datasets, 37 stringent circRNA datasets, and 31 linear RNA datasets. The experimental results not only show that iCRBP-LKHA outperforms other competing methods, but also demonstrate the potential of this model for identifying other RNA-RBP interaction sites.
Affiliation(s)
- Lin Yuan
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Ling Zhao
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Jinling Lai
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Yufeng Jiang
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Qinhu Zhang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China
- Zhen Shen
- School of Computer and Software, Nanyang Institute of Technology, Nanyang, China
- Chun-Hou Zheng
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
- De-Shuang Huang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China
6
Li F, Ma C, Lei S, Pan Y, Lin L, Pan C, Li Q, Geng F, Min D, Tang X. Gingipains may be one of the key virulence factors of Porphyromonas gingivalis to impair cognition and enhance blood-brain barrier permeability: An animal study. J Clin Periodontol 2024; 51:818-839. [PMID: 38414291 DOI: 10.1111/jcpe.13966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 01/24/2024] [Accepted: 02/08/2024] [Indexed: 02/29/2024]
Abstract
AIM: Blood-brain barrier (BBB) disorder is one of the early findings in cognitive impairment. We have recently found that Porphyromonas gingivalis bacteraemia can cause cognitive impairment and increased BBB permeability. This study aimed to identify the possible key virulence factors of P. gingivalis contributing to the pathological process.
MATERIALS AND METHODS: C57/BL6 mice were infected with P. gingivalis, gingipains, or P. gingivalis lipopolysaccharide (P. gingivalis LPS group) by tail vein injection for 8 weeks. The changes in cognitive behaviour in mice, the histopathological changes in the hippocampus and cerebral cortex, the alterations in BBB permeability, and the changes in Mfsd2a and Cav-1 levels were measured. The mechanisms of Ddx3x-induced regulation of Mfsd2a by arginine-specific gingipain A (RgpA) in BMECs were explored.
RESULTS: P. gingivalis and gingipains significantly promoted cognitive impairment in mice, caused pathological changes in the hippocampus and cerebral cortex, increased BBB permeability, inhibited Mfsd2a expression, and up-regulated Cav-1 expression. After RgpA stimulation, the permeability of the in vitro BBB model increased, and the Ddx3x/Mfsd2a/Cav-1 regulatory axis was activated.
CONCLUSIONS: Gingipains may be one of the key virulence factors of P. gingivalis that impair cognition and enhance BBB permeability through the Ddx3x/Mfsd2a/Cav-1 axis.
Affiliation(s)
- Fulong Li
- Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Center of Implantology, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Chunliang Ma
- Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Shuang Lei
- Department of Pediatric Dentistry, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Yaping Pan
- Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Li Lin
- Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Chunling Pan
- Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Qian Li
- Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Fengxue Geng
- Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Dongyu Min
- Traditional Chinese Medicine Experimental Center, Affiliated Hospital of Liaoning University of Traditional Chinese Medicine, Shenyang, China
- Key Laboratory of Ministry of Education for TCM Viscera State Theory and Applications, Liaoning University of Traditional Chinese Medicine, Shenyang, China
- Xiaolin Tang
- Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
7
Mou Y, Lv K. Extracellular vesicle-delivered hsa_circ_0090081 regulated by EIF4A3 enhances gastric cancer tumorigenesis. Cell Div 2024; 19:19. [PMID: 38862985 PMCID: PMC11165812 DOI: 10.1186/s13008-024-00123-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 06/03/2024] [Indexed: 06/13/2024] [Open Access]
Abstract
BACKGROUND: Circular RNA (circRNA) and extracellular vesicles (EVs) in tumors are crucial for the malignant phenotype of tumor cells. Nevertheless, the mechanisms and clinical effects of EV-delivered hsa_circ_0090081 in gastric cancer (GC) are unclear. This study aimed to reveal the effect of eukaryotic translation initiation factor 4A3 (EIF4A3)-mediated hsa_circ_0090081 expression and EV-delivered hsa_circ_0090081 on GC progression.
METHODS: qRT-PCR was conducted to determine hsa_circ_0090081 and EIF4A3 levels in GC tissues. Transmission electron microscopy (TEM), nanoparticle tracking analysis (NTA), and Western blotting were used to identify the EVs isolated from GC cells by ultracentrifugation. The roles of hsa_circ_0090081, EIF4A3, and EV-delivered hsa_circ_0090081 in GC cells were analyzed using Transwell, EdU, and CCK-8 assays. The regulatory relationship between EIF4A3 and hsa_circ_0090081 was investigated using RIP, qRT-PCR, and Pearson's analysis.
RESULTS: Our study showed that hsa_circ_0090081 and EIF4A3 were highly expressed in GC, and hsa_circ_0090081 was associated with poor prognosis. Data revealed that hsa_circ_0090081 inhibition restrained GC cell proliferation, invasion, and migration. Additionally, EIF4A3 could bind to the pre-mRNA of PHEX (linear form of hsa_circ_0090081) to enhance hsa_circ_0090081 expression in GC cells. Moreover, EIF4A3 overexpression nullified the malignant phenotypic suppression caused by hsa_circ_0090081 silencing in GC cells. Furthermore, EVs secreted by GC cells delivered hsa_circ_0090081 to facilitate the malignant progression of targeted GC cells.
CONCLUSION: This study showed that hsa_circ_0090081 was enhanced by EIF4A3 to play a promotive role in GC development. The results may help in understanding the mechanism of EIF4A3 and EV-delivered hsa_circ_0090081 and offer a valuable GC therapeutic target.
Affiliation(s)
- Yanjie Mou
- Department of Traditional Chinese Medicine, Wuhan Third Hospital (Tongren Hospital of Wuhan University), No. 241, Pengliuyang Road, Wuchang District, Wuhan, 430060, Hubei, China
- Kun Lv
- Department of Traditional Chinese Medicine, Wuhan Third Hospital (Tongren Hospital of Wuhan University), No. 241, Pengliuyang Road, Wuchang District, Wuhan, 430060, Hubei, China.
8
Digby B, Finn S, Ó Broin P. Computational approaches and challenges in the analysis of circRNA data. BMC Genomics 2024; 25:527. [PMID: 38807085 PMCID: PMC11134749 DOI: 10.1186/s12864-024-10420-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 05/15/2024] [Indexed: 05/30/2024] [Open Access]
Abstract
Circular RNAs (circRNA) are a class of non-coding RNA, forming a single-stranded covalently closed loop structure generated via back-splicing. Advancements in sequencing methods and technologies in conjunction with algorithmic developments of bioinformatics tools have enabled researchers to characterise the origin and function of circRNAs, with practical applications as a biomarker of diseases becoming increasingly relevant. Computational methods developed for circRNA analysis are predicated on detecting the chimeric back-splice junction of circRNAs whilst mitigating false-positive sequencing artefacts. In this review, we discuss in detail the computational strategies developed for circRNA identification, highlighting a selection of tool strengths, weaknesses and assumptions. In addition to circRNA identification tools, we describe methods for characterising the role of circRNAs within the competing endogenous RNA (ceRNA) network, their interactions with RNA-binding proteins, and publicly available databases for rich circRNA annotation.
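To make the idea of a chimeric back-splice junction concrete, the sketch below builds the junction-spanning sequence of a circRNA (the 3' end joined back to the 5' start) and checks whether a read crosses that junction by exact string matching. It is only a toy illustration under simplifying assumptions: the function names and the `min_overhang` parameter are hypothetical, and the tools discussed in the review rely on read aligners and artefact filtering rather than exact matching.

```python
def back_splice_junction(circ_seq: str, flank: int = 20) -> str:
    """Return the sequence spanning the junction: last `flank` bases followed by first `flank` bases."""
    if flank <= 0 or flank > len(circ_seq):
        raise ValueError("flank must be between 1 and the circRNA length")
    return circ_seq[-flank:] + circ_seq[:flank]


def spans_junction(read: str, circ_seq: str, min_overhang: int = 3) -> bool:
    """True if `read` matches the circular sequence with at least
    `min_overhang` bases on each side of the back-splice junction."""
    n, m = len(circ_seq), len(read)
    if m < 2 * min_overhang or m > n:
        return False
    # Append the start of the sequence so that matches can wrap around once.
    doubled = circ_seq + circ_seq[:m - 1]
    start = doubled.find(read)
    while start != -1:
        left = n - start        # read bases before the junction
        right = start + m - n   # read bases after the junction (negative if not crossing)
        if left >= min_overhang and right >= min_overhang:
            return True
        start = doubled.find(read, start + 1)
    return False


if __name__ == "__main__":
    circ = "AAACCCGGGUUU"
    print(back_splice_junction(circ, flank=3))
    print(spans_junction("GUUUAAAC", circ))
    print(spans_junction("CCCGGG", circ))
```

Requiring a minimum overhang on both sides is one simple way to reduce spurious junction calls from reads that barely touch the junction; real pipelines apply considerably stricter evidence rules.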
Affiliation(s)
- Barry Digby
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland.
- Stephen Finn
- Discipline of Histopathology, School of Medicine, Trinity College Dublin and Cancer Molecular Diagnostic Laboratory, Dublin, Ireland
- Pilib Ó Broin
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
9
Yuan Y, Tang X, Li H, Lang X, Song Y, Yang Y, Zhou Z. BiLSTM- and CNN-Based m6A Modification Prediction Model for circRNAs. Molecules 2024; 29:2429. [PMID: 38893304 PMCID: PMC11173551 DOI: 10.3390/molecules29112429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 05/13/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024] [Open Access]
Abstract
m6A methylation, a ubiquitous modification on circRNAs, exerts a profound influence on RNA function, intracellular behavior, and diverse biological processes, including disease development. While prediction algorithms exist for mRNA m6A modifications, a critical gap remains in the prediction of circRNA m6A modifications. Therefore, accurate identification and prediction of m6A sites are imperative for understanding RNA function and regulation. This study presents a novel hybrid model combining a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM) for precise m6A methylation site prediction in circular RNAs (circRNAs) based on data from HEK293 cells. This model exploits the synergy between CNN's ability to extract intricate sequence features and BiLSTM's strength in capturing long-range dependencies. Furthermore, the integrated attention mechanism empowers the model to pinpoint critical biological information for studying circRNA m6A methylation. Our model, exhibiting over 78% prediction accuracy on independent datasets, offers not only a valuable tool for scientific research but also a strong foundation for future biomedical applications. This work not only furthers our understanding of gene expression regulation but also opens new avenues for the exploration of circRNA methylation in biological research.
Affiliation(s)
- Yuqian Yuan
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Xiaozhu Tang
- School of Medicine & Holistic Integrative Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Hongyan Li
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Xufeng Lang
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Yihua Song
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Ye Yang
- School of Medicine & Holistic Integrative Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Zuojian Zhou
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
10
Lasantha D, Vidanagamachchi S, Nallaperuma S. CRIECNN: Ensemble convolutional neural network and advanced feature extraction methods for the precise forecasting of circRNA-RBP binding sites. Comput Biol Med 2024; 174:108466. [PMID: 38615462 DOI: 10.1016/j.compbiomed.2024.108466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 03/29/2024] [Accepted: 04/08/2024] [Indexed: 04/16/2024]
Abstract
Circular RNAs (circRNAs) have surfaced as important non-coding RNA molecules in biology. Understanding interactions between circRNAs and RNA-binding proteins (RBPs) is crucial in circRNA research. Existing prediction models suffer from limited availability and accuracy, necessitating advanced approaches. In this study, we propose CRIECNN (Circular RNA-RBP Interaction predictor using an Ensemble Convolutional Neural Network), a novel ensemble deep learning model that enhances circRNA-RBP binding site prediction accuracy. CRIECNN employs advanced feature extraction methods and evaluates four distinct sequence datasets and encoding techniques (BERT, Doc2Vec, KNF, EIIP). The model consists of an ensemble convolutional neural network, a BiLSTM, and a self-attention mechanism for feature refinement. Our results demonstrate that CRIECNN outperforms state-of-the-art methods in accuracy and performance, effectively predicting circRNA-RBP interactions from both full-length sequences and fragments. This novel strategy makes an enormous advancement in the prediction of circRNA-RBP interactions, improving our understanding of circRNAs and their regulatory roles.
Affiliation(s)
- Dilan Lasantha
- Department of Computer Science, University of Ruhuna, Sri Lanka.
- Sam Nallaperuma
- Department of Engineering, University of Cambridge, United Kingdom.
11
Wu H, Liu X, Fang Y, Yang Y, Huang Y, Pan X, Shen HB. Decoding protein binding landscape on circular RNAs with base-resolution transformer models. Comput Biol Med 2024; 171:108175. [PMID: 38402841 DOI: 10.1016/j.compbiomed.2024.108175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 01/16/2024] [Accepted: 02/18/2024] [Indexed: 02/27/2024]
Abstract
Circular RNAs (circRNAs), a class of endogenous RNA with a covalent loop structure, can regulate gene expression by serving as sponges for microRNAs and RNA-binding proteins (RBPs). To date, most computational methods for predicting RBP binding sites on circRNAs focus on circRNA fragments instead of full circRNA transcripts. These methods detect whether a circRNA fragment contains binding sites, but cannot determine where the binding sites are or how many binding sites are on the circRNA transcript. We report a hybrid deep learning-based tool, CircSite, to predict RBP binding sites at single-nucleotide resolution and detect key contributing nucleotides on circRNA transcripts. CircSite takes advantage of convolutional neural networks (CNNs) and the Transformer for learning local and global representations, respectively, of circRNAs binding to RBPs. We construct 37 datasets of circRNAs interacting with proteins for benchmarking, and the experimental results show that CircSite offers accurate predictions of RBP binding nucleotides and detects key subsequences that align well with known binding motifs. CircSite is an easy-to-use online webserver for predicting RBP binding sites on circRNA transcripts and is freely available at http://www.csbio.sjtu.edu.cn/bioinf/CircSite/.
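The difference between fragment-level and base-resolution prediction can be illustrated with a small post-processing sketch: given one binding score per nucleotide, contiguous runs above a threshold are grouped into sites, which answers both where the sites are and how many there are. This is an assumed, generic post-processing step and not CircSite's published procedure; the function name `call_binding_sites` and the `threshold` and `min_len` parameters are hypothetical.

```python
def call_binding_sites(scores, threshold=0.5, min_len=1):
    """Group consecutive positions scoring >= threshold into half-open (start, end) intervals."""
    sites = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # open a new candidate site
        elif start is not None:
            if i - start >= min_len:
                sites.append((start, i))
            start = None
    # Close a site that runs to the end of the transcript.
    if start is not None and len(scores) - start >= min_len:
        sites.append((start, len(scores)))
    return sites


if __name__ == "__main__":
    per_base = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1]
    sites = call_binding_sites(per_base)
    print(sites, len(sites))
```

Because the transcript is treated as linear here, a site that wraps around the back-splice junction would be reported as two pieces, one at each end; handling that case would need an extra merge step.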
Affiliation(s)
- Hehe Wu
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, And Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Xiaojian Liu
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, And Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Yi Fang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, And Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Yang Yang
- Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Yan Huang
- State Key Laboratory of Infrared Physics, Shanghai Institute of Technical Physics Chinese Academy of Sciences, 500 Yutian Road, Shanghai, 200083, China
- Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, And Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, And Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
12
Cao C, Wang C, Yang S, Zou Q. CircSI-SSL: circRNA-binding site identification based on self-supervised learning. Bioinformatics 2024; 40:btae004. [PMID: 38180876 PMCID: PMC10789309 DOI: 10.1093/bioinformatics/btae004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 11/13/2023] [Accepted: 01/03/2024] [Indexed: 01/07/2024] Open
Abstract
MOTIVATION In recent years, circular RNAs (circRNAs), a particular form of RNA with a closed-loop structure, have attracted widespread attention due to their physiological significance (they can directly bind proteins), leading to the development of numerous protein site identification algorithms. Unfortunately, these methods are supervised and require a large proportion of labeled samples during training to perform well, and sample labels are difficult to obtain because they require a large number of biological experiments. RESULTS To address the need for large numbers of labels in the circRNA-binding site prediction task, a self-supervised learning binding site identification algorithm named CircSI-SSL is proposed in this article. According to the authors' survey, this is unprecedented in the research field. Specifically, CircSI-SSL first combines multiple feature coding schemes and employs RNA_Transformer for cross-view sequence prediction (the self-supervised task) to learn mutual information from the multi-view data, and is then fine-tuned with only a few sample labels. Comprehensive experiments on six widely used circRNA datasets indicate that the CircSI-SSL algorithm achieves excellent performance in comparison to previous algorithms, even in the extreme case where the ratio of training data to test data is 1:9. In addition, a transfer experiment on six linRNA datasets, without network modification or hyperparameter adjustment, shows that CircSI-SSL has good scalability. In summary, the prediction algorithm based on self-supervised learning proposed in this article is expected to replace previous supervised algorithms and has more extensive application value. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/cc646201081/CircSI-SSL.
Affiliation(s)
- Chao Cao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
- Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Shuhong Yang
- Faculty of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang, Guangdong 524088, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
13
Shi M, Fang Y, Liang Y, Hu Y, Huang J, Xia W, Bian H, Zhuo Q, Wu L, Zhao C. Identification and characterization of differentially expressed circular RNAs in extraocular muscle of oculomotor nerve palsy. BMC Genomics 2023; 24:617. [PMID: 37848864 PMCID: PMC10583365 DOI: 10.1186/s12864-023-09733-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 10/11/2023] [Indexed: 10/19/2023] Open
Abstract
BACKGROUND Oculomotor nerve palsy (ONP) is a neuroparalytic disorder resulting in dysfunction of the innervated extraocular muscles (EOMs), of which the pathological characteristics remain underexplored. METHODS In this study, medial rectus muscle tissue samples from four ONP patients and four constant exotropia (CXT) patients were collected for RNA sequencing. Differentially expressed circular RNAs (circRNAs) were identified and included in functional enrichment analysis, followed by interaction analysis with microRNAs and mRNAs as well as RNA binding proteins. Furthermore, RT-qPCR was used to validate the expression level of the differentially expressed circRNAs. RESULTS A total of 84 differentially expressed circRNAs were identified from 10,504 predicted circRNAs. Functional enrichment analysis indicated that the differentially expressed circRNAs were significantly correlated with skeletal muscle contraction. In addition, interaction analyses showed that up-regulated circRNA_03628 interacted significantly with the RNA binding proteins AGO2 and EIF4A3 as well as the microRNAs hsa-miR-188-5p and hsa-miR-4529-5p. The up-regulation of circRNA_03628 was validated by RT-qPCR, followed by further elaboration of the expression, location and clinical significance of circRNA_03628 in EOMs of ONP. CONCLUSIONS Our study may shed light on the role of differentially expressed circRNAs, especially circRNA_03628, in the pathological changes of EOMs in ONP.
Affiliation(s)
- Mingsu Shi
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, 83 Fenyang Road, Shanghai, 200031, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, 83 Fenyang Road, Shanghai, 200031, China
- Yanxi Fang
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, 83 Fenyang Road, Shanghai, 200031, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, 83 Fenyang Road, Shanghai, 200031, China
- Yu Liang
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, 83 Fenyang Road, Shanghai, 200031, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, 83 Fenyang Road, Shanghai, 200031, China
- Yuxiang Hu
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, 83 Fenyang Road, Shanghai, 200031, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, 83 Fenyang Road, Shanghai, 200031, China
- Jiaqiu Huang
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, 83 Fenyang Road, Shanghai, 200031, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, 83 Fenyang Road, Shanghai, 200031, China
- Weiyi Xia
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, 83 Fenyang Road, Shanghai, 200031, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, 83 Fenyang Road, Shanghai, 200031, China
- Hewei Bian
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, 83 Fenyang Road, Shanghai, 200031, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, 83 Fenyang Road, Shanghai, 200031, China
- Qiao Zhuo
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, 83 Fenyang Road, Shanghai, 200031, China
- Shanghai Key Laboratory of Visual Impairment and Restoration, 83 Fenyang Road, Shanghai, 200031, China
- Lianqun Wu
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, China.
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, 83 Fenyang Road, Shanghai, 200031, China.
- Shanghai Key Laboratory of Visual Impairment and Restoration, 83 Fenyang Road, Shanghai, 200031, China.
- Chen Zhao
- Eye Institute, Department of Ophthalmology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, China.
- NHC Key Laboratory of Myopia (Fudan University), Key Laboratory of Myopia, Chinese Academy of Medical Sciences, 83 Fenyang Road, Shanghai, 200031, China.
- Shanghai Key Laboratory of Visual Impairment and Restoration, 83 Fenyang Road, Shanghai, 200031, China.
14
Shen Z, Liu W, Zhao S, Zhang Q, Wang S, Yuan L. Nucleotide-level prediction of CircRNA-protein binding based on fully convolutional neural network. Front Genet 2023; 14:1283404. [PMID: 37867600 PMCID: PMC10587422 DOI: 10.3389/fgene.2023.1283404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Accepted: 09/21/2023] [Indexed: 10/24/2023] Open
Abstract
Introduction: CircRNA-protein binding plays a critical role in complex biological activity and disease. Various deep learning-based algorithms have been proposed to identify CircRNA-protein binding sites. These methods predict at the sequence level whether a CircRNA sequence includes protein binding sites, and primarily concentrate on analysing the sequence specificity of CircRNA-protein binding. In terms of model performance, these methods are unsatisfactory at accurately predicting motif sites that have special functions in gene expression. Methods: In this study, based on the deep learning models that implement pixel-level binary classification prediction in computer vision, we viewed CircRNA-protein binding site prediction as a nucleotide-level binary classification task, and used a fully convolutional neural network (CPBFCN) to identify CircRNA-protein binding motif sites. Results: CPBFCN provides a new path to predict CircRNA motifs. Using the MEME tool and existing CircRNA-related and protein-related databases, we analysed the functions of the motifs discovered by CPBFCN. We also investigated the correlation between CircRNA sponge and motif distribution. Furthermore, by comparing the motif distribution with different input sequence lengths, we found that some motifs in the flanking sequences of the CircRNA-protein binding region may contribute to CircRNA-protein binding. Conclusion: This study contributes to the identification of circRNA-protein binding and helps in understanding the role of circRNA-protein binding in gene expression regulation.
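The nucleotide-level framing in the abstract above can be made concrete with a small sketch. The Python below is illustrative only and is not taken from CPBFCN: the function name `nucleotide_labels` and the half-open, 0-based interval convention are assumptions. It shows how annotated binding regions could be turned into the per-base 0/1 targets that a nucleotide-level binary classifier would be trained against.

```python
def nucleotide_labels(seq_len, binding_sites):
    """Build one binary label per nucleotide.

    binding_sites is a list of (start, end) intervals, assumed here to be
    0-based and half-open; this convention is hypothetical.
    """
    labels = [0] * seq_len
    for start, end in binding_sites:
        if not 0 <= start < end <= seq_len:
            raise ValueError(f"interval ({start}, {end}) is outside the sequence")
        # Mark every base covered by this binding region as positive.
        for i in range(start, end):
            labels[i] = 1
    return labels


# Two binding regions on a 10-nt sequence.
print(nucleotide_labels(10, [(2, 5), (7, 9)]))  # [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
```

A sequence-level method would collapse this mask to a single label (whether any base is bound), which is the distinction the abstract draws.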
Affiliation(s)
- Zhen Shen
- School of Computer and Software, Nanyang Institute of Technology, Nanyang, Henan, China
- Wei Liu
- School of Computer and Software, Nanyang Institute of Technology, Nanyang, Henan, China
- ShuJun Zhao
- School of Computer and Software, Nanyang Institute of Technology, Nanyang, Henan, China
- QinHu Zhang
- EIT Institute for Advanced Study, Ningbo, Zhejiang, China
- SiGuo Wang
- EIT Institute for Advanced Study, Ningbo, Zhejiang, China
- Lin Yuan
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
15
Liu N, Zhang Z, Wu Y, Wang Y, Liang Y. CRBSP: Prediction of CircRNA-RBP Binding Sites Based on Multimodal Intermediate Fusion. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2898-2906. [PMID: 37130249 DOI: 10.1109/tcbb.2023.3272400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Circular RNA (CircRNA) is widely expressed and has physiological and pathological significance, regulating post-transcriptional processes via its protein-binding activity. However, whereas much work has been done on linear RNA and RNA binding protein (RBP), little is known about the binding sites of CircRNA. This report describes the development of an intermediate multimodal data fusion strategy, CRBSP, to predict CircRNA-RBP binding sites. CRBSP represents the CircRNA trinucleotide semantic, location, composition and frequency information using the coding methods Word to vector (Word2vec), Position-specific trinucleotide propensity (PSTNP), Pseudo trinucleotide composition (PseTNC) and Trinucleotide nucleotide composition (TNC), respectively. A CNN (Convolution Neural Network) was used to extract global information, and a BiLSTM (bidirectional Long- and Short-Term Memory network) encoder and an LSTM (Long- and Short-Term Memory network) decoder were used for local sequence information. Enhancement of the contributions of key features by the self-attention mechanism was followed by intermediate fusion of the four enhanced features. With a Logistic Regression (LR) classifier, CRBSP gives a mean AUC value of 0.9362 under 5-fold cross validation across all 37 datasets, a performance which is superior to five current state-of-the-art models. Similar evaluation of linear RNA-RBP binding sites gave an AUC value of 0.7615, which is also higher than other prediction methods, demonstrating the robustness of CRBSP.
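Of the four encodings named in the abstract above, the trinucleotide frequency encoding is the simplest to illustrate. The sketch below is not the CRBSP implementation: it assumes that TNC means the normalized frequency of each of the 64 overlapping trinucleotides, and the function name `trinucleotide_composition`, the `ACGU` alphabet ordering, and the handling of `T` and ambiguous bases are all choices made for this example.

```python
from itertools import product


def trinucleotide_composition(seq, alphabet="ACGU"):
    """Return a 64-dimensional vector of overlapping trinucleotide frequencies.

    Assumption: DNA-style 'T' is mapped to 'U', and windows containing any
    other symbol (for example 'N') are skipped rather than counted.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[k] / total for k in kmers]


vec = trinucleotide_composition("AUGAUG")
print(len(vec))  # 64
```

For `AUGAUG` the four overlapping windows are AUG, UGA, GAU and AUG, so the AUG entry is 0.5 and the vector sums to 1.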
16
Li L, Xue Z, Du X. ASCRB: Multi-view based attentional feature selection for CircRNA-binding site prediction. Comput Biol Med 2023; 162:107077. [PMID: 37290390 DOI: 10.1016/j.compbiomed.2023.107077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 05/15/2023] [Accepted: 05/27/2023] [Indexed: 06/10/2023]
Abstract
CircRNA is a non-coding RNA with a special circular structure, which plays a key role in a variety of life activities by interacting with RNA-binding proteins through CircRNA binding sites. Therefore, accurately identifying CircRNA binding sites is of great importance for gene regulation. Previous methods are based on either single-view or multi-view features. Because single-view methods provide less effective information, the current mainstream methods focus on extracting rich relevant features by constructing multiple views. However, the increasing number of views leads to a large amount of redundant information, which is detrimental to the detection of CircRNA binding sites. To solve this problem, we propose to use the channel attention mechanism to obtain useful multi-view features by filtering out invalid information in each view. First, we use five feature encoding schemes to construct multiple views. Then, we calibrate the features by generating the global representation of each view, filtering out redundant information to retain important feature information. Finally, features obtained from the multiple views are fused to detect RNA binding sites. To validate the effectiveness of the method, we compared its performance on 37 CircRNA-RBP datasets with existing methods. Experimental results show that the average AUC of our method is 93.85%, which is better than the current state-of-the-art methods. We also provide the source code at https://github.com/dxqllp/ASCRB.
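The calibration step described above (a global representation per view, then filtering) resembles squeeze-and-excitation style channel attention. The sketch below assumes that design; it is not the ASCRB code, and the function name `channel_attention`, the one-unit bottleneck, and the weight values are hypothetical stand-ins for learned parameters.

```python
import math


def channel_attention(views, w1, w2):
    """Reweight each view by a gate computed from all views' global summaries.

    views: list of C feature vectors, one per view (treated as channels).
    w1:    hidden x C weight matrix (hypothetical learned parameters).
    w2:    C x hidden weight matrix (hypothetical learned parameters).
    """
    # Squeeze: summarize each view by the mean of its features.
    squeezed = [sum(v) / len(v) for v in views]
    # Excitation: bottleneck layer with ReLU, then one sigmoid gate per view.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Recalibrate: scale every feature in a view by that view's gate.
    recalibrated = [[g * x for x in v] for g, v in zip(gates, views)]
    return gates, recalibrated


views = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]
w1 = [[0.5, 0.5, 0.0]]          # hidden size 1
w2 = [[2.0], [0.0], [-2.0]]     # one gate per view
gates, recalibrated = channel_attention(views, w1, w2)
print([round(g, 3) for g in gates])
```

With these toy weights the first view is emphasized and the third is suppressed; in a trained model the gates would be learned so that redundant views receive small weights before fusion.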
Affiliation(s)
- Lei Li
- Department of Neurology, Shuyang Hospital Affiliated to Yangzhou University School of Medicine (Shuyang Hospital of Traditional Chinese Medicine), Suqian, China
- Zhigang Xue
- School of Computer Science and Technology, Anhui University, Hefei, China
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, China; School of Computer Science and Technology, Anhui University, Hefei, China.
17
Cao C, Yang S, Li M, Li C. CircSSNN: circRNA-binding site prediction via sequence self-attention neural networks with pre-normalization. BMC Bioinformatics 2023; 24:220. [PMID: 37254080 DOI: 10.1186/s12859-023-05352-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/24/2023] [Accepted: 05/25/2023] [Indexed: 06/01/2023] Open
Abstract
BACKGROUND Circular RNAs (circRNAs) play a significant role in some diseases by acting as transcription templates. Therefore, analyzing the interaction mechanism between circRNA and RNA-binding proteins (RBPs) has far-reaching implications for the prevention and treatment of diseases. Existing models for circRNA-RBP identification usually adopt convolution neural network (CNN), recurrent neural network (RNN), or their variants as feature extractors. Most of them have drawbacks such as poor parallelism, insufficient stability, and inability to capture long-term dependencies. METHODS In this paper, we propose a new method that relies entirely on the self-attention mechanism to capture deep semantic features of RNA sequences. On this basis, we construct a CircSSNN model for circRNA-RBP identification. The proposed model constructs a feature scheme by fusing circRNA sequence representations with statistical distributions, static local contexts, and dynamic global contexts. With a stable and efficient network architecture, the distance between any two positions in a sequence is reduced to a constant, so CircSSNN can quickly capture the long-term dependencies and extract the deep semantic features. RESULTS Experiments on 37 circRNA datasets show that the proposed model has overall advantages in stability, parallelism, and prediction performance. Keeping the network structure and hyperparameters unchanged, we directly apply the CircSSNN to linRNA datasets. The favorable results show that CircSSNN can be transformed simply and efficiently without task-oriented tuning. CONCLUSIONS In conclusion, CircSSNN can serve as an appealing circRNA-RBP identification tool with good identification performance, excellent scalability, and wide application scope without the need for task-oriented fine-tuning of parameters, which is expected to reduce the professional threshold required for hyperparameter tuning in bioinformatics analysis.
Affiliation(s)
- Chao Cao
- School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, China
- Shuhong Yang
- Key Laboratory of Guangxi Universities on Intelligent Computing and Distributed Information Processing, Guangxi University of Science and Technology, Liuzhou, China.
- Mengli Li
- School of Technology, Guilin University, Guilin, China
- Chungui Li
- School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, China.
18
Ma Z, Sun ZL, Liu M. CRBP-HFEF: Prediction of RBP-Binding Sites on circRNAs Based on Hierarchical Feature Expansion and Fusion. Interdiscip Sci 2023:10.1007/s12539-023-00572-0. [PMID: 37233959 DOI: 10.1007/s12539-023-00572-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 04/20/2023] [Accepted: 04/21/2023] [Indexed: 05/27/2023]
Abstract
Circular RNAs (circRNAs) participate in the regulation of biological processes by binding to specific proteins and thus influence transcriptional processes. In recent years, circRNAs have become an emerging hotspot in RNA research. Owing to their powerful learning ability, various deep learning frameworks have been used to predict the binding sites of RNA-binding proteins (RBPs) on circRNAs. These methods usually perform only single-level feature extraction of sequence information. However, single-level extraction may not capture adequate features. Generally, the features of the deep and shallow layers of a neural network can complement each other and are both important for binding site prediction tasks. Based on this concept, we propose a method that combines deep and shallow features, namely CRBP-HFEF. Specifically, features are first extracted and expanded for different levels of the network. Then, the expanded deep and shallow features are fused and fed into the classification network, which finally determines whether they are binding sites. Compared to several existing methods, the experimental results on multiple datasets show that the proposed method achieves significant improvements in a number of metrics (with an average AUC of 0.9855). Moreover, extensive ablation experiments are also performed to verify the effectiveness of the hierarchical feature expansion strategy.
Affiliation(s)
- Zheng Ma
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, and School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, Anhui, China
- Zhan-Li Sun
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, and School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, Anhui, China.
- Mengya Liu
- School of Computer Science and Technology, Anhui University, Hefei, 230601, Anhui, China
19
Rebolledo C, Silva JP, Saavedra N, Maracaja-Coutinho V. Computational approaches for circRNAs prediction and in silico characterization. Brief Bioinform 2023; 24:7150741. [PMID: 37139555 DOI: 10.1093/bib/bbad154] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/22/2022] [Revised: 03/20/2023] [Accepted: 03/30/2023] [Indexed: 05/05/2023] Open
Abstract
Circular RNAs (circRNAs) are single-stranded and covalently closed non-coding RNA molecules originating from RNA splicing. Their functions include regulatory potential over other molecules, such as microRNAs, messenger RNAs and RNA binding proteins. For circRNA identification, several algorithms are available and can be classified into two major types: pseudo-reference-based and split-alignment-based approaches. In general, the data generated from circRNA transcriptome initiatives are deposited in dedicated public databases, which provide a large amount of information on different species and functional annotations. In this review, we describe the main computational resources for the identification and characterization of circRNAs, covering the algorithms and predictive tools to evaluate their potential role in a particular transcriptomics project, including the public repositories containing relevant data and information for circRNAs, recapitulating their characteristics, reliability and amount of data reported.
Affiliation(s)
- Camilo Rebolledo
- Center of Molecular Biology & Pharmacogenetics, Department of Basic Sciences, Scientific and Technological Resources, Universidad de La Frontera, Temuco, Chile
- Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Centro de Modelamiento Molecular, Biofísica y Bioinformática - CM2B2, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Juan Pablo Silva
- Centro de Modelamiento Molecular, Biofísica y Bioinformática - CM2B2, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- ANID Anillo ACT210004 SYSTEMIX, Rancagua, Chile
- Nicolás Saavedra
- Center of Molecular Biology & Pharmacogenetics, Department of Basic Sciences, Scientific and Technological Resources, Universidad de La Frontera, Temuco, Chile
- Vinicius Maracaja-Coutinho
- Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Centro de Modelamiento Molecular, Biofísica y Bioinformática - CM2B2, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- ANID Anillo ACT210004 SYSTEMIX, Rancagua, Chile
- Anillo Inflammation in HIV/AIDS - InflammAIDS, Santiago, Chile
20
Zhang L, Lu C, Zeng M, Li Y, Wang J. CRMSS: predicting circRNA-RBP binding sites based on multi-scale characterizing sequence and structure features. Brief Bioinform 2023; 24:6889442. [PMID: 36511222 DOI: 10.1093/bib/bbac530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 11/01/2022] [Accepted: 11/07/2022] [Indexed: 12/14/2022] Open
Abstract
Circular RNAs (circRNAs) are reverse-spliced and covalently closed RNAs. Their interactions with RNA-binding proteins (RBPs) have multiple effects on the progress of many diseases. Some computational methods are proposed to identify RBP binding sites on circRNAs but suffer from insufficient accuracy, robustness and explanation. In this study, we first take the characteristics of both RNA and RBP into consideration. We propose a method for discriminating circRNA-RBP binding sites based on multi-scale characterizing sequence and structure features, called CRMSS. For circRNAs, we use sequence k-mer embedding and the forming probabilities of local secondary structures as features. For RBPs, we combine sequence and structure frequencies of RNA-binding domain regions to generate features. We capture binding patterns with multi-scale residual blocks. With BiLSTM and attention mechanism, we obtain the contextual information of high-level representation for circRNA-RBP binding. To validate the effectiveness of CRMSS, we compare its predictive performance with other methods on 37 RBPs. Taking the properties of both circRNAs and RBPs into account, CRMSS achieves superior performance over state-of-the-art methods. In the case study, our model provides reliable predictions and correctly identifies experimentally verified circRNA-RBP pairs. The code of CRMSS is freely available at https://github.com/BioinformaticsCSU/CRMSS.
Affiliation(s)
- Lishen Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, China
- Chengqian Lu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, China
- Min Zeng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, China
- Yaohang Li
- Department of Computer Science at Old Dominion University, USA
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, China
21
Ruan H, Wang PC, Han L. Characterization of circular RNAs with advanced sequencing technologies in human complex diseases. Wiley Interdiscip Rev RNA 2023; 14:e1759. [PMID: 36164985 DOI: 10.1002/wrna.1759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 07/09/2022] [Accepted: 08/02/2022] [Indexed: 01/31/2023]
Abstract
Circular RNAs (circRNAs) are one category of non-coding RNAs that do not possess 5' caps and 3' free ends. Instead, they are derived in closed circle forms from pre-mRNAs by a non-canonical splicing mechanism named "back-splicing." CircRNAs were discovered four decades ago and were initially called "scrambled exons." Compared to linear RNAs, the expression levels of circRNAs are considerably lower, and it is challenging to identify circRNAs specifically. Thus, the biological relevance of circRNAs was underappreciated until the advancement of next generation sequencing (NGS) technology. The biological insights of circRNAs, such as their tissue-specific expression patterns, biogenesis factors, and functional effects in complex diseases, namely human cancers, have been extensively explored in the last decade. With the invention of third generation sequencing (TGS), with longer sequencing reads and newly designed strategies to characterize full-length circRNAs, the panorama of circRNAs in human complex diseases could be further unveiled. In this review, we first introduce the history of circular RNA detection. Next, we describe widely adopted NGS-based methods and the recently established TGS-based approaches capable of characterizing circRNAs at full length. We then summarize data resources and representative circRNA functional studies related to human complex diseases. In the last section, we review computational tools and discuss the potential advantages of utilizing advanced sequencing approaches for the functional interpretation of full-length circRNAs in complex diseases. This article is categorized under: RNA Evolution and Genomics > Computational Analyses of RNA; RNA in Disease and Development > RNA in Disease.
Affiliation(s)
- Hang Ruan
- Institutes of Biology and Medical Sciences, Soochow University, Suzhou, China
- Peng-Cheng Wang
- Institutes of Biology and Medical Sciences, Soochow University, Suzhou, China
- Leng Han
- Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, Texas, USA; Department of Translational Medical Sciences, College of Medicine, Texas A&M University, Houston, Texas, USA
| |
|
22
|
Wei Q, Zhang Q, Gao H, Song T, Salhi A, Yu B. DEEPStack-RBP: Accurate identification of RNA-binding proteins based on autoencoder feature selection and deep stacking ensemble classifier. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
|
23
|
Pepe G, Appierdo R, Carrino C, Ballesio F, Helmer-Citterich M, Gherardini PF. Artificial intelligence methods enhance the discovery of RNA interactions. Front Mol Biosci 2022; 9:1000205. [PMID: 36275611 PMCID: PMC9585310 DOI: 10.3389/fmolb.2022.1000205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022]
Abstract
Understanding how RNAs interact with proteins, RNAs, or other molecules remains a challenge of main interest in biology, given the importance of these complexes in both normal and pathological cellular processes. Since experimental datasets are starting to be available for hundreds of functional interactions between RNAs and other biomolecules, several machine learning and deep learning algorithms have been proposed for predicting RNA-RNA or RNA-protein interactions. However, most of these approaches were evaluated on a single dataset, making performance comparisons difficult. With this review, we aim to summarize recent computational methods, developed in this broad research area, highlighting feature encoding and machine learning strategies adopted. Given the magnitude of the effect that dataset size and quality have on performance, we explored the characteristics of these datasets. Additionally, we discuss multiple approaches to generate datasets of negative examples for training. Finally, we describe the best-performing methods to predict interactions between proteins and specific classes of RNA molecules, such as circular RNAs (circRNAs) and long non-coding RNAs (lncRNAs), and methods to predict RNA-RNA or RNA-RBP interactions independently of the RNA type.
Affiliation(s)
- G Pepe
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - R Appierdo
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - C Carrino
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - F Ballesio
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - M Helmer-Citterich
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - PF Gherardini
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| |
|
24
|
JLCRB: A unified multi-view-based joint representation learning for CircRNA binding sites prediction. J Biomed Inform 2022; 136:104231. [DOI: 10.1016/j.jbi.2022.104231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 10/14/2022] [Accepted: 10/14/2022] [Indexed: 11/07/2022]
|
25
|
A pseudo-Siamese framework for circRNA-RBP binding sites prediction integrating BiLSTM and soft attention mechanism. Methods 2022; 207:57-64. [PMID: 36113743 DOI: 10.1016/j.ymeth.2022.09.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/24/2022] [Revised: 08/24/2022] [Accepted: 09/09/2022] [Indexed: 11/20/2022]
Abstract
Circular RNAs (circRNAs) are widely expressed in tissues and play a key role in diseases through interacting with RNA binding proteins (RBPs). Because of the high cost of traditional technology, computational methods have been developed to identify the binding sites between circRNAs and RBPs. Unfortunately, these methods suffer from insufficient feature learning and from relying on a single classification output. To address these limitations, we propose a novel method named circ-pSBLA, which constructs a pseudo-Siamese framework integrating a bi-directional long short-term memory (BiLSTM) network and a soft attention mechanism for circRNA-RBP binding sites prediction. A softmax function and CatBoost are adopted as separate classifiers, from which the pseudo-Siamese framework is constructed; circ-pSBLA combines their outputs to obtain the final output. To validate the effectiveness of circ-pSBLA, we compare it with other state-of-the-art methods and carry out an ablation experiment on 17 sub-datasets. Moreover, we perform motif analysis on 3 sub-datasets. The results show that circ-pSBLA outperforms the other methods. All supporting source codes can be downloaded from https://github.com/gyj9811/circ-pSBLA.
|
26
|
Wang Z, Lei X. A web server for identifying circRNA-RBP variable-length binding sites based on stacked generalization ensemble deep learning network. Methods 2022; 205:179-190. [PMID: 35810958 DOI: 10.1016/j.ymeth.2022.06.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2022] [Revised: 06/23/2022] [Accepted: 06/28/2022] [Indexed: 11/28/2022]
Abstract
Circular RNA (circRNA) can exert biological functions by interacting with RNA-binding protein (RBP), and some deep learning-based methods have been developed to predict RBP binding sites on circRNA. However, most of these methods are based on only a single data resource and cannot provide exact binding sites, providing only the probability value for a sequence fragment. To solve these problems, we propose a binding sites localization algorithm that fuses binding sites from multiple databases, and further design a stacked generalization ensemble deep learning model named CirRBP to identify RBP binding sites on circRNA. CirRBP is trained by combining the binding sites from multiple databases and makes predictions by weighted aggregation of the predictions of each sub-model. The results show that CirRBP outperforms any sub-model and the existing online prediction model. For better access to our research results, we develop an open-source web application called CRWS (CircRNA-RBP Web Server). The back-end learning model of CRWS is the stacked generalization ensemble learning model CirRBP, which is based on different deep learning frameworks. Given a full-length circRNA or fragment sequence and a target RBP, CRWS can analyze and provide the exact potential binding sites of the target RBP on the given sequence through the binding sites localization algorithm, and visualize them. In addition, CRWS can discover the most widely distributed motif in each RBP dataset. To date, CRWS is the first significant online tool that uses multi-source data to train models and predict exact binding sites. CRWS is now publicly and freely available without login requirement at: http://www.bioinformatics.team.
Affiliation(s)
- Zhengfeng Wang
- School of Computer Science, Shaanxi Normal University, Xi'an 710119, China; College of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China; Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541004, China
| | - Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an 710119, China.
| |
|
27
|
Dong X, Chen K, Chen W, Wang J, Chang L, Deng J, Wei L, Han L, Huang C, He C. circRIP: an accurate tool for identifying circRNA-RBP interactions. Brief Bioinform 2022; 23:6596315. [PMID: 35641157 DOI: 10.1093/bib/bbac186] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/09/2022] [Revised: 04/07/2022] [Accepted: 04/23/2022] [Indexed: 12/25/2022]
Abstract
Circular ribonucleic acids (RNAs) (circRNAs) are formed by covalently linking the downstream splice donor and the upstream splice acceptor. One of the most important functions of circRNAs is exerted through binding RNA-binding proteins (RBPs). However, there is no efficient algorithm for identifying genome-wide circRNA-RBP interactions. Here, we developed a unique algorithm, circRIP, for identifying circRNA-RBP interactions from RNA immunoprecipitation sequencing (RIP-Seq) data. A simulation test demonstrated the sensitivity and specificity of circRIP. By applying circRIP, we identified 95 IGF2BP3-binding circRNAs based on the IGF2BP3 RIP-Seq dataset. We further identified 2823 and 1333 circRNAs binding to >100 RBPs in K562 and HepG2 cell lines, respectively, based on enhanced cross-linking immunoprecipitation (eCLIP) data, demonstrating the value of surveying the potential interactions between circRNAs and RBPs. In this study, we provide an accurate and sensitive tool, circRIP (https://github.com/bioinfolabwhu/circRIP), to systematically identify RBP and circRNA interactions from RIP-Seq and eCLIP data, which can significantly benefit the research community for the functional exploration of circRNAs.
Affiliation(s)
- Xin Dong
- School of Basic Medical Sciences, Wuhan University, Wuhan 430071, China
| | - Ke Chen
- Department of Urology,Tongji Hospital, Tongji Medical College,Huazhong University of Science and Technology, 430030, Wuhan, China
| | - Wenbo Chen
- School of Basic Medical Sciences, Wuhan University, Wuhan 430071, China
| | - Jun Wang
- School of Basic Medical Sciences, Wuhan University, Wuhan 430071, China
| | - Liuping Chang
- College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
| | - Jin Deng
- School of Basic Medical Sciences, Wuhan University, Wuhan 430071, China
| | - Lei Wei
- School of Basic Medical Sciences, Wuhan University, Wuhan 430071, China
| | - Leng Han
- Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX, 77030, USA
| | - Chunhua Huang
- College of Basic Medicine, Guizhou University of Traditional Chinese Medicine, Guiyang, Key Laboratory of Traditional Chinese Medicine Toxicology in Forensic Medicine, Guizhou Education Department, Guiyang 550025, China
| | - Chunjiang He
- School of Basic Medical Sciences, Wuhan University, Wuhan 430071, China.,College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
| |
|
28
|
Du X, Zhao X, Zhang Y. DeepBtoD: Improved RNA-binding proteins prediction via integrated deep learning. J Bioinform Comput Biol 2022; 20:2250006. [PMID: 35451938 DOI: 10.1142/s0219720022500068] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
RNA-binding proteins (RBPs) have crucial roles in various cellular processes such as alternative splicing and gene regulation. Therefore, the analysis and identification of RBPs is an essential issue. However, although many computational methods have been developed for predicting RBPs, few studies simultaneously consider local and global information from the perspective of the RNA sequence. To address this challenge, we present a novel method called DeepBtoD, which predicts RBPs directly from RNA sequences. First, a [Formula: see text]-BtoD encoding is designed, which takes into account the composition of [Formula: see text]-nucleotides and their relative positions and forms a local module. Second, we designed a multi-scale convolutional module embedded with a self-attentive mechanism, the ms-focusCNN, which is used to learn more effective, diverse, and discriminative high-level features. Finally, global information is considered to supplement local modules with ensemble learning to predict whether the target RNA binds to RBPs. Preliminary results on 24 independent test datasets show that our proposed method can classify RBPs with an area under the curve of 0.933. Remarkably, DeepBtoD shows competitive results against seven state-of-the-art methods, suggesting that RBPs can be recognized well by integrating local [Formula: see text]-BtoD and global information from RNA sequences alone. Hence, our integrative method may be useful to improve the power of RBP prediction, which might be particularly useful for modeling protein-nucleic acid interactions in systems biology studies. Our DeepBtoD server can be accessed at http://175.27.228.227/DeepBtoD/.
Affiliation(s)
- XiuQuan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, Anhui, P. R. China.,School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, P. R. China
| | - XiuJuan Zhao
- School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, P. R. China
| | - YanPing Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, Anhui, P. R. China
| |
|
29
|
Chalupová E, Vaculík O, Poláček J, Jozefov F, Majtner T, Alexiou P. ENNGene: an Easy Neural Network model building tool for Genomics. BMC Genomics 2022; 23:248. [PMID: 35361122 PMCID: PMC8973509 DOI: 10.1186/s12864-022-08414-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/05/2021] [Accepted: 02/23/2022] [Indexed: 11/17/2022]
Abstract
Background: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field. Results: Here we present ENNGene—Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein. Conclusions: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08414-x.
Affiliation(s)
- Eliška Chalupová
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia.,Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
| | - Ondřej Vaculík
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia.,Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
| | - Jakub Poláček
- Faculty of Informatics, Masaryk University, Brno, Czechia
| | - Filip Jozefov
- Faculty of Informatics, Masaryk University, Brno, Czechia
| | - Tomáš Majtner
- Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
| | - Panagiotis Alexiou
- Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia.
| |
|
30
|
Liu Q, Yu J, Cai Y, Zhang G, Dai X. SAAED: Embedding and Deep Learning Enhance Accurate Prediction of Association Between circRNA and Disease. Front Genet 2022; 13:832244. [PMID: 35273640 PMCID: PMC8902643 DOI: 10.3389/fgene.2022.832244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Accepted: 01/17/2022] [Indexed: 11/13/2022]
Abstract
Emerging evidence indicates that circRNA can regulate various diseases. However, the mechanisms of circRNA in these diseases have not been fully understood. Therefore, detecting potential circRNA–disease associations has far-reaching significance for understanding the pathological development and treatment of these diseases. In recent years, deep learning models have been used in circRNA–disease association analysis, but a lack of circRNA–disease association data limits further improvement. Therefore, there is an urgent need to mine more semantic information from data. In this paper, we propose a novel method called Semantic Association Analysis by Embedding and Deep learning (SAAED), which consists of two parts: a neural network embedding model called Entity Relation Network (ERN) and a Pseudo-Siamese network (PSN) for analysis. ERN can fuse multiple sources of data and express the information with low-dimensional embedding vectors. PSN can extract features relating circRNA and disease for the association analysis. CircRNA–disease, circRNA–miRNA, disease–gene, disease–miRNA, disease–lncRNA, and disease–drug association information are used in this paper. More association data can be introduced for analysis without restriction. Based on the CircR2Disease benchmark dataset for evaluation, a fivefold cross-validation experiment showed an AUC of 98.92%, an accuracy of 95.39%, and a sensitivity of 93.06%. Compared with other state-of-the-art models, SAAED achieves the best overall performance. SAAED can expand the expression of biologically related information and is an efficient method for predicting potential circRNA–disease associations.
Affiliation(s)
- Qingyu Liu
- School of Electronics and Information Technology, Sun Yat-Sen University, Guangzhou, China
| | - Junjie Yu
- Macquarie Business School, Macquarie University, Sydney, NSW, Australia
| | - Yanning Cai
- College of Information Science and Technology, Jinan University, Guangzhou, China
| | - Guishan Zhang
- College of Engineering, Shantou University, Shantou, China
| | - Xianhua Dai
- School of Electronics and Information Technology, Sun Yat-Sen University, Guangzhou, China
| |
|
31
|
Yu B, Wang X, Zhang Y, Gao H, Wang Y, Liu Y, Gao X. RPI-MDLStack: Predicting RNA-protein interactions through deep learning with stacking strategy and LASSO. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108676] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/05/2023]
|
32
|
Yang Y, Hou Z, Wang Y, Ma H, Sun P, Ma Z, Wong KC, Li X. HCRNet: high-throughput circRNA-binding event identification from CLIP-seq data using deep temporal convolutional network. Brief Bioinform 2022; 23:6533504. [PMID: 35189638 DOI: 10.1093/bib/bbac027] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/09/2021] [Revised: 01/03/2022] [Accepted: 01/17/2022] [Indexed: 01/11/2023]
Abstract
Identifying genome-wide binding events between circular RNAs (circRNAs) and RNA-binding proteins (RBPs) can greatly facilitate our understanding of functional mechanisms within circRNAs. Thanks to the development of cross-linked immunoprecipitation sequencing technology, large amounts of genome-wide circRNA binding event data have accumulated, providing opportunities for designing high-performance computational models to discriminate RBP interaction sites and thus to interpret the biological significance of circRNAs. Unfortunately, there are still no computational models sufficiently flexible to accommodate circRNAs from different data scales and with various degrees of feature representation. Here, we present HCRNet, a novel end-to-end framework for identification of circRNA-RBP binding events. To capture the hierarchical relationships, multi-source biological information is fused to represent circRNAs, including various natural language sequence features. Furthermore, a deep temporal convolutional network incorporating global expectation pooling was developed to exploit the latent nucleotide dependencies in an exhaustive manner. We benchmarked HCRNet on 37 circRNA datasets and 31 linear RNA datasets to demonstrate the effectiveness of our proposed method. To further evaluate the model's robustness, we ran HCRNet on a full-length dataset containing 740 circRNAs. Results indicate that HCRNet generally outperforms existing methods. In addition, motif analyses were conducted to exhibit the interpretability of HCRNet on circRNAs. All supporting source code and data can be downloaded from https://github.com/yangyn533/HCRNet and https://doi.org/10.6084/m9.figshare.16943722.v1. The web server of HCRNet is publicly accessible at http://39.104.118.143:5001/.
Affiliation(s)
- Yuning Yang
- School of Information Science and Technology, Northeast Normal University, Changchun, Jilin, China
| | - Zilong Hou
- School of Artificial Intelligence, Jilin University, Changchun, Jilin, China
| | - Yansong Wang
- School of Artificial Intelligence, Jilin University, Changchun, Jilin, China
| | - Hongli Ma
- School of Mathematics, Shandong University, Jinan, Shandong, China
| | - Pingping Sun
- School of Information Science and Technology, Northeast Normal University, Changchun, Jilin, China
| | - Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, Changchun, Jilin, China
| | - Ka-Chun Wong
- School of Computer Science, City University of Hong Kong, Hong Kong SAR
| | - Xiangtao Li
- School of Artificial Intelligence, Jilin University, Changchun, Jilin, China
| |
|
33
|
Wang Y, Yang Y, Ma Z, Wong KC, Li X. EDCNN: identification of genome-wide RNA-binding proteins using evolutionary deep convolutional neural network. Bioinformatics 2022; 38:678-686. [PMID: 34694393 DOI: 10.1093/bioinformatics/btab739] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/12/2021] [Revised: 10/14/2021] [Accepted: 10/20/2021] [Indexed: 02/03/2023]
Abstract
Motivation: RNA-binding proteins (RBPs) are a group of proteins associated with RNA regulation and metabolism, and play an essential role in mediating the maturation, transport, localization and translation of RNA. Recently, genome-wide RNA-binding event detection methods have been developed to predict RBPs. Unfortunately, the existing computational methods usually suffer from limitations such as high dimensionality, data sparsity and low model performance. Results: Deep convolutional neural networks are well suited to high-dimensional and sparse data. To further improve their performance, we propose the evolutionary deep convolutional neural network (EDCNN) to identify protein-RNA interactions by synergizing evolutionary optimization with gradient descent to enhance the deep convolutional neural network. In particular, EDCNN combines evolutionary algorithms and different gradient descent models in a complementary algorithm, where the gradient descent and evolution steps alternately optimize the RNA-binding event search. To validate the performance of EDCNN, an experiment is conducted on two large-scale CLIP-seq datasets, and the results reveal that EDCNN provides superior performance to other state-of-the-art methods. Furthermore, time complexity analysis, parameter analysis and motif analysis are conducted to demonstrate the effectiveness of our proposed algorithm from several perspectives. Availability and implementation: The EDCNN algorithm is available at GitHub: https://github.com/yaweiwang1232/EDCNN. Both the software and the supporting data can be downloaded from: https://figshare.com/articles/software/EDCNN/16803217. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yawei Wang
- School of Artificial Intelligence, Jilin University, Changchun, Jilin, China
| | - Yuning Yang
- School of Information Science and Technology, Northeast Normal University, Changchun, Jilin, China
| | - Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, Changchun, Jilin, China
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
| | - Xiangtao Li
- School of Artificial Intelligence, Jilin University, Changchun, Jilin, China
| |
|
34
|
DFpin: Deep learning-based protein-binding site prediction with feature-based non-redundancy from RNA level. Comput Biol Med 2022; 142:105216. [PMID: 35030497 DOI: 10.1016/j.compbiomed.2022.105216] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/01/2021] [Revised: 12/19/2021] [Accepted: 01/02/2022] [Indexed: 11/20/2022]
Abstract
The interaction between proteins and RNA is closely related to various human diseases. Computer-aided drug design can be facilitated by detecting the RNA sites that bind proteins. However, because binding sites cluster within RNA sequences, extracting RNA fragments with a sliding window yields highly similar samples. To address this problem, we present a method, DFpin, to predict protein-interacting nucleotides in RNA. To retain more key nucleotide sites, we used a redundancy-removal method based on feature similarity: redundancy is removed based on the RNA mono-nucleotide composition to maintain the diversity of RNA samples and avoid retaining redundant data. In addition, to extract key abstract features and avoid over-fitting, we used the cascade structure of a deep forest model to predict protein-interacting nucleotides. Overall, DFpin demonstrated excellent classification with 85.4% accuracy and 93.3% area under the curve. Compared with other methods, the accuracy of DFpin was better, suggesting that feature-based redundancy removal and deep forest can help predict protein-interacting nucleotides. The source code and all datasets are available at: https://github.com/zhaoxj-tech/DFpin.git.
|
35
|
Niu M, Zou Q, Lin C. CRBPDL: Identification of circRNA-RBP interaction sites using an ensemble neural network approach. PLoS Comput Biol 2022; 18:e1009798. [PMID: 35051187 PMCID: PMC8806072 DOI: 10.1371/journal.pcbi.1009798] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 09/23/2021] [Revised: 02/01/2022] [Accepted: 01/02/2022] [Indexed: 02/06/2023]
Abstract
Circular RNAs (circRNAs) are non-coding RNAs with a special circular structure formed by a reverse splicing mechanism. Increasing evidence shows that circular RNAs can directly bind to RNA-binding proteins (RBPs) and play an important role in a variety of biological activities. The interactions between circRNAs and RBPs are key to comprehending the mechanism of posttranscriptional regulation. Accurately identifying binding sites is very useful for analyzing interactions. In past research, some predictors based on machine learning (ML) have been presented, but prediction accuracy still needs to be improved. Therefore, we present a novel computational model, CRBPDL, which uses an Adaboost integrated deep hierarchical network to identify circular RNA-RBP binding sites. CRBPDL combines five different feature encoding schemes to encode the original RNA sequence and uses deep multiscale residual networks (MSRN) and bidirectional gated recurrent units (BiGRUs) to effectively learn high-level feature representations, which suffices to extract local and global context information simultaneously. Additionally, a self-attention mechanism is employed to improve the robustness of CRBPDL. Ultimately, the Adaboost algorithm is applied to integrate the deep learning (DL) models to improve the prediction performance and reliability of the model. To verify the usefulness of CRBPDL, we compared its performance with state-of-the-art methods on 37 circular RNA data sets and 31 linear RNA data sets. The results show that CRBPDL is general, reliable, and robust. The code and data sets are obtainable at https://github.com/nmt315320/CRBPDL.git.
Growing evidence shows that circular RNA can directly bind to proteins and participate in countless different biological processes. Computational methods can quickly and accurately predict the binding sites of circular RNA and RBPs. In order to identify the interaction of circRNA with 37 different types of circRNA binding proteins, we developed an integrated deep learning network based on a hierarchical network, called CRBPDL. It can effectively learn high-level feature representations. The performance of the model was verified through comparative experiments with different feature extraction algorithms, different deep learning models and classifier models. Moreover, the CRBPDL model was applied to 31 linear RNA data sets, and the effectiveness of our method was demonstrated by comparison with the results of current leading algorithms. It is expected that the CRBPDL model can effectively predict circular RNA-RBP binding sites and provide reliable candidates for further biological experiments.
Affiliation(s)
- Mengting Niu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Chen Lin
- School of Informatics, Xiamen University, Xiamen, China
- * E-mail:
| |
|
36
|
Wang Z, Lei X. Prediction of RBP binding sites on circRNAs using an LSTM-based deep sequence learning architecture. Brief Bioinform 2021; 22:6355419. [PMID: 34415289 DOI: 10.1093/bib/bbab342] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/03/2021] [Revised: 07/14/2021] [Accepted: 08/02/2021] [Indexed: 01/22/2023]
Abstract
Circular RNAs (circRNAs) are widely expressed in highly diverged eukaryotes. Although circRNAs have been known for many years, their function remains unclear. Interaction with RNA-binding protein (RBP) to influence post-transcriptional regulation is considered to be an important pathway for circRNA function, such as acting as an oncogenic RBP sponge to inhibit cancer. In this study, we design a deep learning framework, CRPBsites, to predict the binding sites of RBPs on circRNAs. In this model, the sequences of variable-length binding sites are transformed into embedding vectors by word2vec model. Bidirectional LSTM is used to encode the embedding vectors of binding sites, and then they are fed into another LSTM decoder for decoding and classification tasks. To train and test the model, we construct four datasets that contain sequences of variable-length binding sites on circRNAs, and each set corresponds to an RBP, which is overexpressed in bladder cancer tissues. Experimental results on four datasets and comparison with other existing models show that CRPBsites has superior performance. Afterwards, we found that there were highly similar binding motifs in the four binding site datasets. Finally, we applied well-trained CRPBsites to identify the binding sites of IGF2BP1 on circCDYL, and the results proved the effectiveness of this method. In conclusion, CRPBsites is an effective prediction model for circRNA-RBP interaction site identification. We hope that CRPBsites can provide valuable guidance for experimental studies on the influence of circRNA on post-transcriptional regulation.
Affiliation(s)
- Zhengfeng Wang
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
  - College of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
|
37
|
Tayara H, Chong KT. Improved Predicting of The Sequence Specificities of RNA Binding Proteins by Deep Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2526-2534. [PMID: 32191896 DOI: 10.1109/tcbb.2020.2981335] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 06/10/2023]
Abstract
RNA-binding proteins (RBPs) have a significant role in various regulatory tasks. However, the mechanism by which RBPs identify the subsequence target RNAs is still not clear. In recent years, several machine and deep learning-based computational models have been proposed for understanding the binding preferences of RBPs. These methods required integrating multiple features, such as secondary structure, with raw RNA sequences, and their performance can be further improved. In this paper, we propose an efficient and simple convolutional neural network, RBPCNN, that relies on the combination of the raw RNA sequence and evolutionary information. We show that conservation scores (evolutionary information) for the RNA sequences can significantly improve the overall performance of the proposed predictor. In addition, the automatic extraction of the binding sequence motifs can enhance our understanding of the binding specificities of RBPs. The experimental results show that RBPCNN significantly outperforms the current state-of-the-art methods. More specifically, the average area under the receiver operating characteristic curve was improved by 2.67 percent and the mean average precision was improved by 8.03 percent. The datasets and results can be downloaded from https://home.jbnu.ac.kr/NSCL/RBPCNN.htm.
|
38
|
Li H, Deng Z, Yang H, Pan X, Wei Z, Shen HB, Choi KS, Wang L, Wang S, Wu J. circRNA-binding protein site prediction based on multi-view deep learning, subspace learning and multi-view classifier. Brief Bioinform 2021; 23:6375057. [PMID: 34571539 DOI: 10.1093/bib/bbab394] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/28/2021] [Revised: 08/08/2021] [Accepted: 08/30/2021] [Indexed: 12/22/2022]
Abstract
Circular RNAs (circRNAs) generally bind to RNA-binding proteins (RBPs) to play an important role in the regulation of autoimmune diseases. Thus, it is crucial to study the binding sites of RBPs on circRNAs. Although many methods, including traditional machine learning and deep learning, have been developed to predict the interactions between RNAs and RBPs, most of them focus on linear RNAs. At present, few studies have been done on the binding relationships between circRNAs and RBPs. Thus, in-depth research is urgently needed. In the existing circRNA-RBP binding site prediction methods, circRNA sequences are the main research subjects, but the relevant characteristics of circRNAs, such as the structure and composition information of circRNA sequences, have not been fully exploited. Some methods have extracted different views to construct recognition models, but how to efficiently use the multi-view data to construct recognition models is still not well studied. Considering the above problems, this paper proposes a multi-view classification method called DMSK based on multi-view deep learning, subspace learning and multi-view classifier for the identification of circRNA-RBP interaction sites. In the DMSK method, first, we converted circRNA sequences into pseudo-amino acid sequences and pseudo-dipeptide components for extracting high-dimensional sequence features and component features of circRNAs, respectively. Then, the structure prediction method RNAfold was used to predict the secondary structure of the RNA sequences, and the sequence embedding model was used to extract the context-dependent features. Next, we fed the raw features of the above four views to a hybrid network, which is composed of a convolutional neural network and a long short-term memory network, to obtain the deep features of circRNAs. Furthermore, we used view-weighted generalized canonical correlation analysis to extract the common features of the four views by subspace learning.
Finally, the learned subspace common features and multi-view deep features were fed to train the downstream multi-view TSK fuzzy system to construct a fuzzy rule and fuzzy inference-based multi-view classifier. The trained classifier was used to predict the specific positions of the RBP binding sites on the circRNAs. The experiments show that the prediction performance of the proposed method DMSK has been improved compared with the existing methods. The code and dataset of this study are available at https://github.com/Rebecca3150/DMSK.
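The first DMSK step, converting a circRNA sequence into a pseudo-amino acid sequence, can be illustrated with an ordinary codon translation. The sketch below is only an illustration: it assumes the standard genetic code and reading frame 0, and the helper name `to_pseudo_amino_acids` is hypothetical. The abstract does not state the exact mapping DMSK uses.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in U, C, A, G order;
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def to_pseudo_amino_acids(rna, frame=0):
    """Translate an RNA (or DNA) string codon by codon.

    Assumption: unknown codons (for example those containing N) map to 'X',
    and trailing bases that do not fill a codon are dropped.
    """
    rna = rna.upper().replace("T", "U")
    codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(to_pseudo_amino_acids("AUGGAUGAAUAG"))
```

Translating in all three frames, or wrapping around the back-splice junction, would be natural extensions, but whether DMSK does either is not stated in the abstract.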
Affiliation(s)
- Hui Li
  - Jiangnan University, Wuxi, Jiangsu 214012, China
- Zhaohong Deng
  - School of Artificial Intelligence and Computer Science of Jiangnan University, Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (LCNBI) and ZJLab, Wuxi, Jiangsu 214012, China
- Haitao Yang
  - Jiangnan University, Wuxi, Jiangsu 214012, China
- Xiaoyong Pan
  - Department of Automation of Shanghai Jiao Tong University, Wuxi, Jiangsu 214012, China
- Zhisheng Wei
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology Ministry in Jiangnan University, Wuxi, Jiangsu 214012, China
- Hong-Bin Shen
  - Shanghai Jiao Tong University, Wuxi, Jiangsu 214012, China
- Kup-Sze Choi
  - Hong Kong Polytechnic University, Wuxi, Jiangsu 214012, China
- Lei Wang
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology Ministry in Jiangnan University, Wuxi, Jiangsu 214012, China
- Shitong Wang
  - School of Artificial Intelligence and Computer Science of Jiangnan University, Wuxi, Jiangsu 214012, China
- Jing Wu
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology Ministry in Jiangnan University, Wuxi, Jiangsu 214012, China
|
39
|
Das A, Sinha T, Shyamal S, Panda AC. Emerging Role of Circular RNA-Protein Interactions. Noncoding RNA 2021; 7:48. [PMID: 34449657 PMCID: PMC8395946 DOI: 10.3390/ncrna7030048] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 07/01/2021] [Revised: 07/26/2021] [Accepted: 07/29/2021] [Indexed: 12/17/2022]
Abstract
Circular RNAs (circRNAs) are emerging as novel regulators of gene expression in various biological processes. CircRNAs regulate gene expression by interacting with cellular regulators such as microRNAs and RNA binding proteins (RBPs) to regulate downstream gene expression. The accumulation of high-throughput RNA-protein interaction data revealed the interaction of RBPs with the coding and noncoding RNAs, including recently discovered circRNAs. RBPs are a large family of proteins known to play a critical role in gene expression by modulating RNA splicing, nuclear export, mRNA stability, localization, and translation. However, the interaction of RBPs with circRNAs and their implications on circRNA biogenesis and function has been emerging in the last few years. Recent studies suggest that circRNA interaction with target proteins modulates the interaction of the protein with downstream target mRNAs or proteins. This review outlines the emerging mechanisms of circRNA-protein interactions and their functional role in cell physiology.
Affiliation(s)
- Arundhati Das
  - Institute of Life Sciences, Nalco Square, Bhubaneswar 751023, India; (A.D.); (T.S.); (S.S.)
  - School of Biotechnology, KIIT University, Bhubaneswar 751024, India
- Tanvi Sinha
  - Institute of Life Sciences, Nalco Square, Bhubaneswar 751023, India; (A.D.); (T.S.); (S.S.)
- Sharmishtha Shyamal
  - Institute of Life Sciences, Nalco Square, Bhubaneswar 751023, India; (A.D.); (T.S.); (S.S.)
- Amaresh Chandra Panda
  - Institute of Life Sciences, Nalco Square, Bhubaneswar 751023, India; (A.D.); (T.S.); (S.S.)
|
40
|
Wu H, Pan X, Yang Y, Shen HB. Recognizing binding sites of poorly characterized RNA-binding proteins on circular RNAs using attention Siamese network. Brief Bioinform 2021; 22:6326526. [PMID: 34297803 DOI: 10.1093/bib/bbab279] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/03/2021] [Revised: 06/04/2021] [Accepted: 07/01/2021] [Indexed: 12/24/2022]
Abstract
Circular RNAs (circRNAs) interact with RNA-binding proteins (RBPs) to play crucial roles in gene regulation and disease development. Computational approaches that quickly predict high-potential RBP binding sites on circRNAs from statistical knowledge of sequence or structure binding have attracted much attention. Deep learning is one of the popular learning models in this area but usually requires a lot of labeled training data. It would perform unsatisfactorily for the less characterized RBPs with a limited number of known target circRNAs. How to improve the prediction performance for such poorly characterized RBPs with few labeled samples is a challenging task for deep learning-based models. In this study, we propose an RBP-specific method, iDeepC, for predicting RBP binding sites on circRNAs from sequences. It adopts a Siamese neural network consisting of a lightweight attention module and a metric module. We have found that the Siamese neural network effectively enhances the network's capability of capturing mutual information between circRNAs with pairwise metric learning. To further deal with the small-sample size problem, we have performed pretraining using available labeled data from other RBPs and also demonstrate the efficacy of this transfer-learning pipeline. We comprehensively evaluated iDeepC on the benchmark datasets of RBP-binding circRNAs, and the results suggest that iDeepC achieves promising results on the poorly characterized RBPs. The source code is available at https://github.com/hehew321/iDeepC.
Affiliation(s)
- Hehe Wu
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Yang Yang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
|
41
|
Qaid TS, Mazaar H, Alqahtani MS, Raweh AA, Alakwaa W. Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation. PeerJ Comput Sci 2021; 7:e597. [PMID: 34239977 PMCID: PMC8237341 DOI: 10.7717/peerj-cs.597] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/19/2021] [Accepted: 05/26/2021] [Indexed: 06/13/2023]
Abstract
The worldwide coronavirus (COVID-19) pandemic made dramatic and rapid progress in the year 2020 and requires urgent global effort to accelerate the development of a vaccine to stop the daily infections and deaths. Several types of vaccine have been designed to teach the immune system how to fight off certain kinds of pathogens. mRNA vaccines are the most important candidate vaccines because of their capacity for rapid development, high potency, safe administration and potential for low-cost manufacture. An mRNA vaccine acts by training the body to recognize and respond to the proteins produced by disease-causing organisms such as viruses or bacteria. This type of vaccine is the fastest candidate to treat COVID-19, but it currently faces several limitations. In particular, it is a challenge to design stable mRNA molecules because of the inefficient in vivo delivery of mRNA, its tendency for spontaneous degradation and low protein expression levels. This work designed and implemented a deep sequence model based on bidirectional GRU and LSTM models applied to the Stanford COVID-19 mRNA vaccine dataset to predict the mRNA sequences responsible for degradation by predicting five reactivity values for every position in the sequence. Four of these values determine the likelihood of degradation with/without magnesium at high pH (pH 10) and high temperature (50 degrees Celsius), and the fifth reactivity value is used to determine the likely secondary structure of the RNA sample. The model relies on two types of features, namely numerical and categorical features, where the categorical features are extracted from the mRNA sequences, structure and predicted loop. These features are represented and encoded by numbers, and then the features are extracted using embedding layer learning. There are five numerical features depending on the likelihood for each pair of nucleotides in the RNA.
The model gives promising results because it predicts the five reactivity values with a validation mean columnwise root mean square error (MCRMSE) of 0.125 using the LSTM model with augmentation and the codon encoding method. Codon encoding outperforms base encoding in validation MCRMSE using the LSTM model, whereas base encoding outperforms codon encoding in terms of over-fitting, with a difference between the training and validation loss of 0.008.
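The MCRMSE metric quoted above is the root mean square error computed separately for each target column and then averaged over the columns. A minimal sketch in plain Python follows; the function name `mcrmse` and the row-per-position list layout are assumptions made for illustration, not the authors' code.

```python
import math


def mcrmse(y_true, y_pred):
    """Mean columnwise RMSE.

    y_true and y_pred are lists of rows; each row holds one value per
    target column (five reactivity values in the study above).
    """
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    column_rmse = []
    for j in range(n_cols):
        squared_error = sum(
            (y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows)
        )
        column_rmse.append(math.sqrt(squared_error / n_rows))
    # Average the per-column RMSE values so every target counts equally.
    return sum(column_rmse) / n_cols


truth = [[0.0, 0.0], [0.0, 0.0]]
guess = [[3.0, 0.0], [3.0, 4.0]]
print(mcrmse(truth, guess))
```

Because the square root is taken per column before averaging, a single poorly predicted target cannot be hidden by pooling all errors together.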
Affiliation(s)
- Talal S. Qaid
  - Computer Science Department, College of Computer Science, King Khalid University, Abha, Saudi Arabia
  - Faculty of Computer Science, Hodeidah University, Hodeidah, Yemen
- Hussein Mazaar
  - Computer Science Department, College of Science & Arts in Tanumah, King Khalid University, Abha, Saudi Arabia
- Mohammed S. Alqahtani
  - Radiological Sciences Department, College of Applied Medical Sciences, King Khalid University, Abha, Saudi Arabia
- Abeer A. Raweh
  - Computer Science Department, College of Computer Science, King Khalid University, Abha, Saudi Arabia
  - Faculty of Computer Science, Hodeidah University, Hodeidah, Yemen
- Wafaa Alakwaa
  - Computer Science Department, College of Science & Arts in Tanumah, King Khalid University, Abha, Saudi Arabia
|
42
|
Yu J, Sun S, Mao W, Xu B, Chen M. Identification of Enzalutamide Resistance-Related circRNA-miRNA-mRNA Regulatory Networks in Patients with Prostate Cancer. Onco Targets Ther 2021; 14:3833-3848. [PMID: 34188491 PMCID: PMC8232970 DOI: 10.2147/ott.s309917] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/10/2021] [Accepted: 05/25/2021] [Indexed: 12/20/2022]
Abstract
Purpose: This study aimed to identify enzalutamide resistance-related (EnzR-related) circRNAs and to characterize and validate a circRNA-miRNA-mRNA ceRNA regulatory network and the corresponding prognostic signature of prostate cancer patients. Methods: We obtained a circRNA expression microarray from the Gene Expression Omnibus (GEO) database and performed differential expression analysis to identify EnzR-related circRNAs using the limma package. The miRNA and mRNA expression profiles were downloaded and subjected to differential expression analysis, then overlapped with predicted candidates. Next, we established a circRNA-miRNA-mRNA ceRNA network and a PPI network using Cytoscape software and the STRING database, respectively. In addition, univariate and Lasso Cox regression analyses were applied to generate a prognostic signature. Receiver operating characteristic (ROC) curves and Kaplan–Meier analysis were used to evaluate the reliability and sensitivity of the signature. Ultimately, we chose hsa_circ_0047641 to validate the feasibility of the EnzR-related ceRNA regulatory pathway using qRT-PCR, CCK8 and Transwell assays. Results: We identified 13 EnzR-related circRNAs and constructed a ceRNA regulatory network that contained two downregulated circRNAs (hsa_circ_00000919 and hsa_circ_0000036) and two upregulated circRNAs (hsa_circ_0047641 and hsa_circ_0068697), together with their 6 sponged miRNAs and 167 targeted mRNAs. Subsequently, these targeted mRNAs were subjected to PPI analysis to identify 10 hub genes. Functional enrichment analysis provided new ways to seek potential biological functions. Besides, we established a prognostic signature of PCa patients based on 8 prognosis-associated mRNAs. We confirmed that the survival rates of PCa patients in the high-risk subgroup were slightly lower than those in the low-risk subgroup in the TCGA dataset (p<0.001), and ROC curves revealed that the AUC value for the prognostic signature was 0.816.
Finally, the functional analysis suggested that knockdown of hsa_circ_0047641 could inhibit the progression of PCa and could reverse Enz-resistance in vitro. Conclusion: We identified 13 EnzR-related circRNAs, constructed an EnzR-related circRNA-miRNA-mRNA ceRNA network, and confirmed that this network and the corresponding prognostic signature could serve as a useful prognostic biomarker and therapeutic target.
Affiliation(s)
- JunJie Yu
  - Surgical Research Center, Institute of Urology, School of Medicine, Southeast University, Nanjing, People's Republic of China
  - Department of Medical College, Southeast University, Nanjing, Jiangsu, People's Republic of China
- Si Sun
  - Surgical Research Center, Institute of Urology, School of Medicine, Southeast University, Nanjing, People's Republic of China
  - Department of Medical College, Southeast University, Nanjing, Jiangsu, People's Republic of China
- WeiPu Mao
  - Surgical Research Center, Institute of Urology, School of Medicine, Southeast University, Nanjing, People's Republic of China
  - Department of Medical College, Southeast University, Nanjing, Jiangsu, People's Republic of China
- Bin Xu
  - Department of Urology, Affiliated Zhongda Hospital of Southeast University, Nanjing, People's Republic of China
- Ming Chen
  - Department of Urology, Affiliated Zhongda Hospital of Southeast University, Nanjing, People's Republic of China
  - Institute of Urology, Southeastern University, Nanjing, People's Republic of China
  - Department of Urology, Affiliated Lishui People's Hospital of Southeast University, Nanjing, People's Republic of China
|
43
|
Ulshöfer CJ, Pfafenrot C, Bindereif A, Schneider T. Methods to study circRNA-protein interactions. Methods 2021; 196:36-46. [PMID: 33894379 DOI: 10.1016/j.ymeth.2021.04.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/15/2021] [Revised: 04/15/2021] [Accepted: 04/18/2021] [Indexed: 02/07/2023]
Abstract
Circular RNAs (circRNAs) have been studied extensively in the last few years, uncovering functional roles in a diverse range of cell types and organisms. As shown for a few cases, these functions may be mediated by trans-acting factors, in particular RNA-binding proteins (RBPs). However, the specific interaction partners for most circRNAs remain unknown. This is mainly due to technical difficulties in their identification and in differentiating between interactors of circRNAs and their linear counterparts. Here we review the currently used methodology to systematically study circRNA-protein complexes (circRNPs), focusing either on a specific RNA or protein, both on the gene-specific or global level, and discuss advantages and challenges of the available approaches.
Affiliation(s)
- Corinna J Ulshöfer
  - Institute of Biochemistry, Justus-Liebig-University of Giessen, 35392 Giessen, Germany
- Christina Pfafenrot
  - Institute of Biochemistry, Justus-Liebig-University of Giessen, 35392 Giessen, Germany
- Albrecht Bindereif
  - Institute of Biochemistry, Justus-Liebig-University of Giessen, 35392 Giessen, Germany
- Tim Schneider
  - Institute of Biochemistry, Justus-Liebig-University of Giessen, 35392 Giessen, Germany
|
44
|
CircRNA-Protein Interactions in Muscle Development and Diseases. Int J Mol Sci 2021; 22:ijms22063262. [PMID: 33806945 PMCID: PMC8005172 DOI: 10.3390/ijms22063262] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 02/23/2021] [Revised: 03/17/2021] [Accepted: 03/19/2021] [Indexed: 02/07/2023]
Abstract
Circular RNA (circRNA) is a kind of novel endogenous noncoding RNA formed through back-splicing of mRNA precursor. The biogenesis, degradation, nucleus-cytoplasm transport, location, and even translation of circRNA are controlled by RNA-binding proteins (RBPs). Therefore, circRNAs and the chaperoned RBPs play critical roles in biological functions that significantly contribute to normal animal development and disease. In this review, we systematically characterize the possible molecular mechanism of circRNA-protein interactions, summarize the latest research on circRNA-protein interactions in muscle development and myocardial disease, and discuss the future application of circRNA in treating muscle diseases. Finally, we provide several valid prediction methods and experimental verification approaches. Our review reveals the significance of circRNAs and their protein chaperones and provides a reference for further study in this field.
|
45
|
The GAUGAA Motif Is Responsible for the Binding between circSMARCA5 and SRSF1 and Related Downstream Effects on Glioblastoma Multiforme Cell Migration and Angiogenic Potential. Int J Mol Sci 2021; 22:ijms22041678. [PMID: 33562358 PMCID: PMC7915938 DOI: 10.3390/ijms22041678] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 12/10/2020] [Revised: 01/26/2021] [Accepted: 02/04/2021] [Indexed: 12/17/2022]
Abstract
Circular RNAs (circRNAs) are a large class of RNAs with regulatory functions within cells. We recently showed that circSMARCA5 is a tumor suppressor in glioblastoma multiforme (GBM) and acts as a decoy for Serine and Arginine Rich Splicing Factor 1 (SRSF1) through six predicted binding sites (BSs). Here we characterized RNA motifs functionally involved in the interaction between circSMARCA5 and SRSF1. Three different circSMARCA5 molecules (Mut1, Mut2, Mut3), each mutated in two predicted SRSF1 BSs at once, were obtained through PCR-based replacement of wild-type (WT) BS sequences and cloned in three independent pcDNA3 vectors. Mut1 significantly decreased its capability to interact with SRSF1 as compared to WT, based on the RNA immunoprecipitation assay. In silico analysis through the “Find Individual Motif Occurrences” (FIMO) algorithm showed GAUGAA as an experimentally validated SRSF1 binding motif significantly overrepresented within both predicted SRSF1 BSs mutated in Mut1 (q-value = 0.0011). U87MG and CAS-1, transfected with Mut1, significantly increased their migration with respect to controls transfected with WT, as revealed by the cell exclusion zone assay. Immortalized human brain microvascular endothelial cells (IM-HBMEC) exposed to conditioned medium (CM) harvested from U87MG and CAS-1 transfected with Mut1 significantly sprouted more than those treated with CM harvested from U87MG and CAS-1 transfected with WT, as shown by the tube formation assay. qRT-PCR showed that the intracellular pro- to anti-angiogenic Vascular Endothelial Growth Factor A (VEGFA) mRNA isoform ratio and the amount of total VEGFA mRNA secreted in CM significantly increased in Mut1-transfected CAS-1 as compared to controls transfected with WT. Our data suggest that GAUGAA is the RNA motif responsible for the interaction between circSMARCA5 and SRSF1 as well as for the circSMARCA5-mediated control of GBM cell migration and angiogenic potential.
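To make the idea of locating a binding motif on a circular transcript concrete, the sketch below scans a sequence for exact occurrences of GAUGAA, including matches that span the back-splice junction. It is a deliberately simplified illustration: it uses exact string matching rather than the position-weight-matrix scoring that FIMO performs, and the function name `find_motif` and the toy sequence are hypothetical.

```python
def find_motif(sequence, motif="GAUGAA", circular=True):
    """Return 0-based start positions of exact motif matches.

    Assumption: for a circular molecule the first len(motif) - 1 bases are
    appended to the end, so matches crossing the junction are reported at
    their start position in the original sequence.
    """
    seq = sequence.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    n, m = len(seq), len(motif)
    if m == 0 or m > n:
        return []
    scanned = seq + seq[:m - 1] if circular else seq
    return [i for i in range(len(scanned) - m + 1) if scanned[i:i + m] == motif]


toy = "GAACCGAUGAACCGAU"  # made-up sequence, not circSMARCA5
print(find_motif(toy))                  # circular scan
print(find_motif(toy, circular=False))  # linear scan misses the junction match
```

The difference between the two calls shows why circularity matters: a site that straddles the junction exists only in the circular form of the molecule.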
|
46
|
Yuan L, Yang Y. DeCban: Prediction of circRNA-RBP Interaction Sites by Using Double Embeddings and Cross-Branch Attention Networks. Front Genet 2021; 11:632861. [PMID: 33552144 PMCID: PMC7862712 DOI: 10.3389/fgene.2020.632861] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/24/2020] [Accepted: 12/23/2020] [Indexed: 12/17/2022]
Abstract
Circular RNAs (circRNAs), as a rising star in the RNA world, play important roles in various biological processes. Understanding the interactions between circRNAs and RNA binding proteins (RBPs) can help reveal the functions of circRNAs. For the past decade, the emergence of high-throughput experimental data, like CLIP-Seq, has made the computational identification of RNA-protein interactions (RPIs) possible based on machine learning methods. However, as the underlying mechanisms of RPIs have not been fully understood yet and the information sources of circRNAs are limited, the computational tools for predicting circRNA-RBP interactions have been very few. In this study, we propose a deep learning method to identify circRNA-RBP interactions, called DeCban, which is featured by hybrid double embeddings for representing RNA sequences and a cross-branch attention neural network for classification. To capture more information from RNA sequences, the double embeddings include pre-trained embedding vectors for both RNA segments and their converted amino acids. Meanwhile, the cross-branch attention network aims to address the learning of very long sequences by integrating features of different scales and focusing on important information. The experimental results on 37 benchmark datasets show that both double embeddings and the cross-branch attention model contribute to the improvement of performance. DeCban outperforms the mainstream deep learning-based methods on not only prediction accuracy but also computational efficiency. The data sets and source code of this study are freely available at: https://github.com/AaronYll/DECban.
Affiliation(s)
- Liangliang Yuan
  - Department of Computer Science and Engineering, Center for Brain-Like Computing and Machine Intelligence, Shanghai Jiao Tong University, Shanghai, China
- Yang Yang
  - Department of Computer Science and Engineering, Center for Brain-Like Computing and Machine Intelligence, Shanghai Jiao Tong University, Shanghai, China
  - Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
|
47
|
CircNet: an encoder–decoder-based convolution neural network (CNN) for circular RNA identification. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05673-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023]
|
48
|
Wang Z, Lei X. Identifying the sequence specificities of circRNA-binding proteins based on a capsule network architecture. BMC Bioinformatics 2021; 22:19. [PMID: 33413092 PMCID: PMC7792089 DOI: 10.1186/s12859-020-03942-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/02/2020] [Accepted: 12/18/2020] [Indexed: 02/07/2023]
Abstract
Background: Circular RNAs (circRNAs) are widely expressed in cells and tissues and are involved in biological processes and human diseases. Recent studies have demonstrated that circRNAs can interact with RNA-binding proteins (RBPs), which is considered an important aspect for investigating the function of circRNAs. Results: In this study, we design a slight variant of the capsule network, called circRB, to identify the sequence specificities of circRNAs binding to RBPs. In this model, the sequence features of circRNAs are extracted by convolution operations, and then two dynamic routing algorithms in a capsule network are employed to discriminate between different binding sites by analysing the convolution features of binding sites. The experimental results show that the circRB method outperforms the existing computational methods. Afterwards, the trained models are applied to detect the sequence motifs on the seven circRNA-RBP bound sequence datasets, and the motifs are matched to known human RNA motifs. Some motifs on circular RNAs overlap with those on linear RNAs. Finally, we also predict binding sites on the reported full-length sequences of circRNAs interacting with RBPs, attempting to assist current studies. We hope that our model will contribute to better understanding the mechanisms of the interactions between RBPs and circRNAs. Conclusion: In view of the scarcity of studies on the sequence specificities of circRNA-binding proteins, we designed a classification framework called circRB based on the capsule network. The results show that the circRB method is effective and achieves higher prediction accuracy than other methods.
Collapse
Affiliation(s)
- Zhengfeng Wang
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China; College of Information Science and Engineering, Guilin University of Technology, Guilin, 541004, China.
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
|
49
|
Pan X, Fang Y, Li X, Yang Y, Shen HB. RBPsuite: RNA-protein binding sites prediction suite based on deep learning. BMC Genomics 2020; 21:884. [PMID: 33297946 PMCID: PMC7724624 DOI: 10.1186/s12864-020-07291-6] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 07/12/2020] [Accepted: 11/28/2020] [Indexed: 02/06/2023]
Abstract
BACKGROUND: RNA-binding proteins (RBPs) play crucial roles in various biological processes. Deep learning-based methods have proven powerful for predicting RBP binding sites on RNAs. However, training deep learning models is time-consuming and computationally intensive. RESULTS: Here we present RBPsuite, an easy-to-use deep learning-based webserver for predicting RBP binding sites on linear and circular RNAs. For linear RNAs, RBPsuite predicts RBP binding scores using our updated iDeepS. For circular RNAs (circRNAs), RBPsuite predicts RBP binding scores using CRIP, which we developed. RBPsuite first breaks the input RNA sequence into segments of 101 nucleotides and scores the interaction between the segments and the RBPs. RBPsuite then detects the verified motifs on the binding segments and gives the distribution of binding scores along the full-length sequence. CONCLUSIONS: RBPsuite is an easy-to-use online webserver for predicting RBP binding sites and is freely available at http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/.
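The segmentation step described in this abstract can be illustrated with a minimal Python sketch. It is not RBPsuite's code: the function name `split_into_segments`, the non-overlapping default step, and the choice to keep a shorter trailing segment are all assumptions made for illustration, since the abstract states only that the input is broken into segments of 101 nucleotides.

```python
from typing import List, Tuple


def split_into_segments(seq: str, size: int = 101, step: int = 101) -> List[Tuple[int, str]]:
    """Split an RNA sequence into windows of `size` nucleotides.

    Returns (start_offset, segment) pairs. The non-overlapping default
    (step == size) and the unpadded trailing segment are assumptions of
    this sketch, not documented RBPsuite behaviour.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # Slicing past the end of the string is safe in Python, so the last
    # window is simply shorter when len(seq) is not a multiple of `size`.
    return [(start, seq[start:start + size]) for start in range(0, len(seq), step)]


if __name__ == "__main__":
    toy = "ACGU" * 60  # hypothetical 240-nt input
    for start, segment in split_into_segments(toy):
        print(start, len(segment))
```

Each segment would then be passed to a per-RBP scoring model; keeping the start offset alongside the segment lets the scores be laid back out along the full-length sequence.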
Affiliation(s)
- Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
- Yi Fang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
- Xianfeng Li
- Key Laboratory of Carcinogenesis and Translational Research, Peking University Cancer Hospital, Beijing, 100142, China.
- Yang Yang
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Center for Brain-Like Computing and Machine Intelligence, Shanghai, 200240, China.
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
|
50
|
Yang Y, Hou Z, Ma Z, Li X, Wong KC. iCircRBP-DHN: identification of circRNA-RBP interaction sites using deep hierarchical network. Brief Bioinform 2020; 22:5943796. [PMID: 33126261 DOI: 10.1093/bib/bbaa274] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 07/26/2020] [Revised: 09/07/2020] [Accepted: 09/21/2020] [Indexed: 12/19/2022]
Abstract
Circular RNAs (circRNAs) are widely expressed in eukaryotes. The genome-wide interactions between circRNAs and RNA-binding proteins (RBPs) can be probed from cross-linking immunoprecipitation with sequencing data. Therefore, computational methods have been developed for identifying RBP binding sites on circRNAs. Unfortunately, these methods often suffer from the low discriminative power of their feature representations, numerical instability and poor scalability. To address these limitations, we propose a novel computational method called iCircRBP-DHN, which uses a deep hierarchical network to discriminate circRNA-RBP binding sites. The network architecture can be regarded as a deep multi-scale residual network followed by bidirectional gated recurrent units (BiGRUs) with a self-attention mechanism, which can simultaneously extract local and global contextual information. In addition, we propose novel encoding schemes that integrate CircRNA2Vec and the K-tuple nucleotide frequency pattern to represent different degrees of nucleotide dependencies. To validate the effectiveness of iCircRBP-DHN, we compared its performance with other computational methods on 37 circRNA datasets and 31 linear RNA datasets, respectively. The experimental results reveal that iCircRBP-DHN achieves superior performance over these state-of-the-art algorithms. Moreover, we perform motif analysis on circRNAs bound by the different RBPs, demonstrating that the proposed CircRNA2Vec encoding scheme is promising. The iCircRBP-DHN method is made available at https://github.com/houzl3416/iCircRBP-DHN.
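As a rough illustration of a K-tuple nucleotide frequency encoding, the Python sketch below counts overlapping k-mers and normalises them into a fixed-length vector. It is a generic k-mer frequency encoder, not the iCircRBP-DHN implementation: the function name `ktuple_frequencies`, the `ACGU` alphabet ordering, and the rule of skipping windows that contain other symbols are assumptions of this sketch.

```python
from collections import Counter
from itertools import product
from typing import List


def ktuple_frequencies(seq: str, k: int = 3, alphabet: str = "ACGU") -> List[float]:
    """Return normalised frequencies of all len(alphabet)**k k-tuples.

    Entries follow the lexicographic order produced by itertools.product.
    Windows containing symbols outside `alphabet` (e.g. 'N') are skipped,
    which is a choice made for this sketch only.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    allowed = set(alphabet)
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= allowed
    )
    total = sum(counts.values())
    keys = ["".join(p) for p in product(alphabet, repeat=k)]
    if total == 0:
        # Sequence shorter than k, or no valid window: all-zero vector.
        return [0.0] * len(keys)
    return [counts[key] / total for key in keys]


if __name__ == "__main__":
    vec = ktuple_frequencies("AACAGGU", k=2)
    print(len(vec), round(sum(vec), 6))
```

Vectors computed for several values of k can be concatenated to capture nucleotide dependencies at different ranges, which is one common way such frequency patterns are combined with learned embeddings.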
Affiliation(s)
- Yuning Yang
- School of Information Science and Technology, Northeast Normal University.
- Zilong Hou
- School of Artificial Intelligence, Jilin University.
- Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University.
- Xiangtao Li
- School of Artificial Intelligence, Jilin University.
- Ka-Chun Wong
- School of Artificial Intelligence, Jilin University.
|