1
Chen K, Litfin T, Singh J, Zhan J, Zhou Y. MARS and RNAcmap3: The Master Database of All Possible RNA Sequences Integrated with RNAcmap for RNA Homology Search. Genomics, Proteomics & Bioinformatics 2024; 22:qzae018. [PMID: 38872612 DOI: 10.1093/gpbjnl/qzae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 09/24/2023] [Accepted: 10/31/2023] [Indexed: 06/15/2024]
Abstract
The recent success of AlphaFold2 in protein structure prediction relied heavily on co-evolutionary information derived from homologous protein sequences found in the huge, integrated database of protein sequences (Big Fantastic Database). In contrast, the existing nucleotide databases were not consolidated to facilitate wider and deeper homology search. Here, we built a comprehensive database by incorporating the non-coding RNA (ncRNA) sequences from RNAcentral, the transcriptome assembly and metagenome assembly from metagenomics RAST (MG-RAST), the genomic sequences from Genome Warehouse (GWH), and the genomic sequences from MGnify, in addition to the nucleotide (nt) database and its subsets in the National Center for Biotechnology Information (NCBI). The resulting Master database of All possible RNA sequences (MARS) is 20-fold larger than NCBI's nt database or 60-fold larger than RNAcentral. The new dataset along with a new split-search strategy allows a substantial improvement in homology search over existing state-of-the-art techniques. It also yields more accurate and more sensitive multiple sequence alignments (MSAs) than manually curated MSAs from Rfam for the majority of structured RNAs mapped to Rfam. The results indicate that MARS coupled with the fully automatic homology search tool RNAcmap will be useful for improved structural and functional inference of ncRNAs and RNA language models based on MSAs. MARS is accessible at https://ngdc.cncb.ac.cn/omix/release/OMIX003037, and RNAcmap3 is accessible at http://zhouyq-lab.szbl.ac.cn/download/.
Affiliation(s)
- Ke Chen
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
  - Peking University Shenzhen Graduate School, Shenzhen 518055, China
  - University of Science and Technology of China, Hefei 230026, China
  - Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215123, China
- Thomas Litfin
  - Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia
- Jaswinder Singh
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
- Jian Zhan
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
- Yaoqi Zhou
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
  - Peking University Shenzhen Graduate School, Shenzhen 518055, China
  - Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia
2
Zhou B, Ding M, Feng J, Ji B, Huang P, Zhang J, Yu X, Cao Z, Yang Y, Zhou Y, Wang J. EVlncRNA-Dpred: improved prediction of experimentally validated lncRNAs by deep learning. Brief Bioinform 2022; 24:6961472. [PMID: 36573492 PMCID: PMC9851331 DOI: 10.1093/bib/bbac583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 11/02/2022] [Accepted: 11/29/2022] [Indexed: 12/28/2022]
Abstract
Long non-coding RNAs (lncRNAs) play essential roles in nearly every biological process and disease. Many algorithms have been developed to distinguish lncRNAs from mRNAs in transcriptomic data, facilitating the discovery of more than 600 000 lncRNAs. However, only a tiny fraction (<1%) of lncRNA transcripts (~4000) have been further validated by low-throughput experiments (EVlncRNAs). Given the cost and labor-intensive nature of experimental validation, it is necessary to develop computational tools to prioritize potentially functional lncRNAs, because many lncRNAs from high-throughput sequencing (HTlncRNAs) could result from transcriptional noise. Here, we employed deep learning algorithms to separate EVlncRNAs from HTlncRNAs and mRNAs. To overcome the challenge of small datasets, we employed a three-layer deep-learning neural network (DNN) with a K-mer feature as the input and a small convolutional neural network (CNN) with one-hot encoding as the input. Three separate models were trained for human (h), mouse (m) and plant (p), respectively. The final concatenated models (EVlncRNA-Dpred (h), EVlncRNA-Dpred (m) and EVlncRNA-Dpred (p)) provided substantial improvement over a previous model based on support-vector machines (EVlncRNA-pred). For example, EVlncRNA-Dpred (h) achieved 0.896 for the area under the receiver-operating characteristic curve, compared with 0.582 given by the sequence-based EVlncRNA-pred model. The models developed here should be useful for screening lncRNA transcripts for experimental validation. EVlncRNA-Dpred is available as a web server at https://www.sdklab-biophysics-dzu.net/EVlncRNA-Dpred/index.html, and the data and source code are freely available along with the web server.
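The two input representations named in this abstract, a K-mer feature and one-hot encoding, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the value of k, the normalization, or how ambiguous bases are handled, so the function names `kmer_frequencies` and `one_hot`, the default `k=3`, the mapping of U to T, and the all-zero row for unknown bases are assumptions made only for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA-style alphabet; U is mapped to T below


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    # An empty or too-short sequence yields an all-zero vector.
    return [counts[km] / total if total else 0.0 for km in kmers]


def one_hot(seq):
    """Encode each base as a 4-element indicator row; unknown bases become all zeros."""
    seq = seq.upper().replace("U", "T")
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]
```

Under these assumptions, `kmer_frequencies(seq, k)` gives a fixed-length vector of 4^k values suitable for a dense network, while `one_hot(seq)` gives a length-by-4 matrix of the kind a convolutional network usually takes as input.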
Affiliation(s)
- Bailing Zhou
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Maolin Ding
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Jing Feng
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Baohua Ji
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Pingping Huang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Junye Zhang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Xue Yu
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Zanxia Cao
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yaoqi Zhou
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China. Corresponding author. Tel.: +86 (755) 6275 2684
- Jihua Wang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China. Corresponding author. Tel.: +86 (534) 898 5933
3
Jiang S, Zhang Q, Li J, Raziq K, Kang X, Liang S, Sun C, Liang X, Zhao D, Fu S, Cai M. New Sights Into Long Non-Coding RNA LINC01133 in Cancer. Front Oncol 2022; 12:908162. [PMID: 35747817 PMCID: PMC9209730 DOI: 10.3389/fonc.2022.908162] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/30/2022] [Accepted: 05/13/2022] [Indexed: 11/13/2022]
Abstract
LINC01133 is a long intergenic non-coding RNA that regulates malignancy in several cancers, including those of the digestive, female reproductive, respiratory, and urinary system. LINC01133 is an extensively studied lncRNA that is highly conserved, and its relatively stable expression is essential for its robust biological function. Its expression is highly tissue-specific with a distinct subcellular localization. It functions as an oncogene or a tumor suppressor gene in different cancers via multiple mechanisms, such as those that involve competing with endogenous RNA and binding to RNA-binding proteins or DNA. Moreover, the secretion and transportation of LINC01133 by extracellular vesicles in the tumor micro-environment is regulated by other cells in the tumor micro-environment. To date, two mechanisms, an increase in copy number and regulation of transcription elements, have been found to regulate LINC01133 expression. Clinically, LINC01133 is an ideal marker for cancer prognosis and a potential therapeutic target in cancer treatment regimes. In this review, we aimed to summarize the aforementioned information as well as posit future directions for LINC01133 research.
Affiliation(s)
- Shengnan Jiang
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control, Ministry of Education, Harbin Medical University, Harbin, China
  - Laboratory of Medical Genetics, Harbin Medical University, Harbin, China
- Qian Zhang
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control, Ministry of Education, Harbin Medical University, Harbin, China
  - Laboratory of Medical Genetics, Harbin Medical University, Harbin, China
- Jiaqi Li
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control, Ministry of Education, Harbin Medical University, Harbin, China
  - Laboratory of Medical Genetics, Harbin Medical University, Harbin, China
- Khadija Raziq
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control, Ministry of Education, Harbin Medical University, Harbin, China
  - Laboratory of Medical Genetics, Harbin Medical University, Harbin, China
- Xinyu Kang
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control, Ministry of Education, Harbin Medical University, Harbin, China
  - Laboratory of Medical Genetics, Harbin Medical University, Harbin, China
- Shiyin Liang
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control, Ministry of Education, Harbin Medical University, Harbin, China
  - Laboratory of Medical Genetics, Harbin Medical University, Harbin, China
- Chaoyue Sun
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control, Ministry of Education, Harbin Medical University, Harbin, China
  - Laboratory of Medical Genetics, Harbin Medical University, Harbin, China
- Xiao Liang
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control, Ministry of Education, Harbin Medical University, Harbin, China
  - Laboratory of Medical Genetics, Harbin Medical University, Harbin, China
- Di Zhao
  - Department of Gynecology and Obstetrics, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Songbin Fu
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control, Ministry of Education, Harbin Medical University, Harbin, China
  - Laboratory of Medical Genetics, Harbin Medical University, Harbin, China
- Mengdi Cai
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control, Ministry of Education, Harbin Medical University, Harbin, China
  - Laboratory of Medical Genetics, Harbin Medical University, Harbin, China
  - *Correspondence: Mengdi Cai
4
Chen Q, Liu K, Yu R, Zhou B, Huang P, Cao Z, Zhou Y, Wang J. From "Dark Matter" to "Star": Insight Into the Regulation Mechanisms of Plant Functional Long Non-Coding RNAs. Frontiers in Plant Science 2021; 12:650926. [PMID: 34163498 PMCID: PMC8215657 DOI: 10.3389/fpls.2021.650926] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/08/2021] [Accepted: 05/05/2021] [Indexed: 05/27/2023]
Abstract
Long non-coding RNAs (lncRNAs) play a vital role in a variety of biological functions in plant growth and development. In this study, we provided an overview of the molecular mechanisms of lncRNAs in interacting with other biomolecules with an emphasis on those lncRNAs validated only by low-throughput experiments. LncRNAs function through playing multiple roles, including sponger for sequestering RNA or DNA, guider or decoy for recruiting or hijacking transcription factors or peptides, and scaffold for binding with chromatin modification complexes, as well as precursor of microRNAs or small interfering RNAs. These regulatory roles have been validated in several plant species with a comprehensive list of 73 lncRNA-molecule interaction pairs in 16 plant species found so far, suggesting their commonality in the plant kingdom. Such initial findings of a small number of functional plant lncRNAs represent the beginning of what is to come as lncRNAs with unknown functions were found in orders of magnitude more than proteins.
Affiliation(s)
- Qingshuai Chen
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Kui Liu
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Ru Yu
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Bailing Zhou
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Pingping Huang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Zanxia Cao
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Yaoqi Zhou
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, China
  - Peking University Shenzhen Graduate School, Shenzhen, China
- Jihua Wang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
5
Zhou B, Ji B, Liu K, Hu G, Wang F, Chen Q, Yu R, Huang P, Ren J, Guo C, Zhao H, Zhang H, Zhao D, Li Z, Zeng Q, Yu J, Bian Y, Cao Z, Xu S, Yang Y, Zhou Y, Wang J. EVLncRNAs 2.0: an updated database of manually curated functional long non-coding RNAs validated by low-throughput experiments. Nucleic Acids Res 2021; 49:D86-D91. [PMID: 33221906 PMCID: PMC7778902 DOI: 10.1093/nar/gkaa1076] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 09/14/2020] [Revised: 10/21/2020] [Accepted: 10/22/2020] [Indexed: 12/25/2022]
Abstract
Long non-coding RNAs (lncRNAs) play important functional roles in many diverse biological processes. However, not all expressed lncRNAs are functional. Thus, it is necessary to manually collect all experimentally validated functional lncRNAs (EVlncRNAs) with their sequences, structures, and functions annotated in a central database. The first release of such a database (EVLncRNAs) was made using the literature prior to 1 May 2016. Since then (up to 15 May 2020), 19 245 articles related to lncRNAs have been published. In EVLncRNAs 2.0, these articles were manually examined for a major expansion of the data collected. Specifically, the numbers of annotated EVlncRNAs, associated diseases, lncRNA-disease associations, and interaction records were increased by 260%, 320%, 484% and 537%, respectively. Moreover, the database has added several new categories: 8 lncRNA structures, 33 exosomal lncRNAs, 188 circular RNAs, and 1079 drug-resistant, chemoresistant, and stress-resistant lncRNAs. All records have been checked against known retracted and fake articles. This release also comes with a highly interactive visual interaction network that helps users track the underlying relations among lncRNAs, miRNAs, proteins, genes and other functional elements. Furthermore, it provides links to four new bioinformatics tools with improved data browsing and searching functionality. EVLncRNAs 2.0 is freely available at https://www.sdklab-biophysics-dzu.net/EVLncRNAs2/.
Affiliation(s)
- Bailing Zhou
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Baohua Ji
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
  - College of Physics and Electronic Information, Dezhou University, Dezhou 253023, China
- Kui Liu
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Guodong Hu
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Fei Wang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Qingshuai Chen
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Ru Yu
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Pingping Huang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Jing Ren
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Chengang Guo
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Huiying Zhao
  - Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China
- Hongmei Zhang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
  - College of Life Science, Dezhou University, Dezhou 253023, China
- Dongbo Zhao
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Zhiwei Li
  - Department of General Surgery, Dezhou Municipal Hospital, Dezhou 253012, China
- Qiangcheng Zeng
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
  - College of Life Science, Dezhou University, Dezhou 253023, China
- Jiafeng Yu
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Yunqiang Bian
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Zanxia Cao
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Shicai Xu
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Yuedong Yang
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275, China
- Yaoqi Zhou
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD 4222, Australia
- Jihua Wang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
6
Towards a comprehensive pipeline to identify and functionally annotate long noncoding RNA (lncRNA). Comput Biol Med 2020; 127:104028. [PMID: 33126123 DOI: 10.1016/j.compbiomed.2020.104028] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/04/2020] [Revised: 09/28/2020] [Accepted: 09/29/2020] [Indexed: 12/20/2022]
Abstract
Long noncoding RNAs (lncRNAs) are implicated in various genetic diseases and cancer, attributed to their critical role in gene regulation. They are a divergent group of RNAs and are easily differentiated from other types with unique characteristics, functions, and mechanisms of action. In this review, we provide a list of some of the prominent data repositories containing lncRNAs, their interactome, and predicted and validated disease associations. Next, we discuss various wet-lab experiments formulated to obtain the data for these repositories. We also provide a critical review of in silico methods available for the identification purpose and suggest techniques to further improve their performance. The bulk of the methods currently focus on distinguishing lncRNA transcripts from the coding ones. Functional annotation of these transcripts still remains a grey area and more efforts are needed in that space. Finally, we provide details of current progress, discuss impediments, and illustrate a roadmap for developing a generalized computational pipeline for comprehensive annotation of lncRNAs, which is essential to accelerate research in this area.
7
Sun K, Wang H, Sun H. NAMS webserver: coding potential assessment and functional annotation of plant transcripts. Brief Bioinform 2020; 22:5906158. [PMID: 33080021 PMCID: PMC8138890 DOI: 10.1093/bib/bbaa200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2020] [Revised: 07/23/2020] [Accepted: 08/04/2020] [Indexed: 11/16/2022]
Abstract
Recent advances in transcriptomics have uncovered many novel transcripts in plants. To annotate such transcripts, dissecting their coding potential is a critical step. Computational approaches have proven fruitful in this task; however, most current tools are designed or optimized for mammals, and only a few of them have been tested on a limited number of plant species. In this work, we present the NAMS webserver, which contains a novel coding potential classifier, NAMS, specifically optimized for plants. We have evaluated the performance of NAMS using a comprehensive dataset containing more than 3 million transcripts from various plant species, where NAMS demonstrates high accuracy and remarkable performance improvements over state-of-the-art software. Moreover, our webserver also furnishes functional annotations, aiming to provide users with informative clues to the functions of their transcripts. Considering that most plant species are poorly characterized, our NAMS webserver could serve as a valuable resource to facilitate transcriptomic studies. The webserver with the testing dataset is freely available at http://sunlab.cpy.cuhk.edu.hk/NAMS/.
Affiliation(s)
- Kun Sun
  - Corresponding authors: Kun Sun, Shenzhen Bay Laboratory, Shenzhen 518132, China. Tel.: +86-0755-2641-9310; Fax: +86-755-8696-7710. Hao Sun, Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong SAR 999077, China. Tel.: +852-3763-6048; Fax: +852-2636-5090.