1
Kumar N, Srivastava R. Deep learning in structural bioinformatics: current applications and future perspectives. Brief Bioinform 2024; 25:bbae042. [PMID: 38701422 PMCID: PMC11066934 DOI: 10.1093/bib/bbae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 01/05/2024] [Accepted: 01/18/2024] [Indexed: 05/05/2024] Open
Abstract
In this review article, we explore the transformative impact of deep learning (DL) on structural bioinformatics, emphasizing its pivotal role in a scientific revolution driven by extensive data, accessible toolkits and robust computing resources. As big data continue to advance, DL is poised to become an integral component in healthcare and biology, revolutionizing analytical processes. Our comprehensive review provides detailed insights into DL, featuring specific demonstrations of its notable applications in bioinformatics. We address challenges tailored for DL, spotlight recent successes in structural bioinformatics and present a clear exposition of DL, from basic shallow neural networks to advanced models such as convolutional, recurrent, artificial and transformer neural networks. This paper discusses the emerging use of DL for understanding biomolecular structures, anticipating ongoing developments and applications in the realm of structural bioinformatics.
Affiliation(s)
- Niranjan Kumar
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Rakesh Srivastava
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
2
Li Z, Gao E, Zhou J, Han W, Xu X, Gao X. Applications of deep learning in understanding gene regulation. Cell Rep Methods 2023; 3:100384. [PMID: 36814848 PMCID: PMC9939384 DOI: 10.1016/j.crmeth.2022.100384] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 05/22/2023]
Abstract
Gene regulation is a central topic in cell biology. Advances in omics technologies and the accumulation of omics data have provided better opportunities for gene regulation studies than ever before. For this reason deep learning, as a data-driven predictive modeling approach, has been successfully applied to this field during the past decade. In this article, we aim to give a brief yet comprehensive overview of representative deep-learning methods for gene regulation. Specifically, we discuss and compare the design principles and datasets used by each method, creating a reference for researchers who wish to replicate or improve existing methods. We also discuss the common problems of existing approaches and prospectively introduce the emerging deep-learning paradigms that will potentially alleviate them. We hope that this article will provide a rich and up-to-date resource and shed light on future research directions in this area.
Affiliation(s)
- Zhongxiao Li
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Elva Gao
- The KAUST School, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Juexiao Zhou
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Wenkai Han
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Xiaopeng Xu
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
3
Li Z, Li Y, Zhang B, Li Y, Long Y, Zhou J, Zou X, Zhang M, Hu Y, Chen W, Gao X. DeeReCT-APA: Prediction of Alternative Polyadenylation Site Usage Through Deep Learning. Genomics Proteomics Bioinformatics 2022; 20:483-495. [PMID: 33662629 PMCID: PMC9801043 DOI: 10.1016/j.gpb.2020.05.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 01/13/2020] [Revised: 03/28/2020] [Accepted: 06/12/2020] [Indexed: 01/26/2023]
Abstract
Alternative polyadenylation (APA) is a crucial step in post-transcriptional regulation. Previous bioinformatic studies have mainly focused on the recognition of polyadenylation sites (PASs) in a given genomic sequence, which is a binary classification problem. Recently, computational methods for predicting the usage level of alternative PASs in the same gene have been proposed. However, all of them cast the problem as a non-quantitative pairwise comparison task and do not take the competition among multiple PASs into account. To address this, here we propose a deep learning architecture, Deep Regulatory Code and Tools for Alternative Polyadenylation (DeeReCT-APA), to quantitatively predict the usage of all alternative PASs of a given gene. To accommodate different genes with potentially different numbers of PASs, DeeReCT-APA treats the problem as a regression task with a variable-length target. Based on a convolutional neural network-long short-term memory (CNN-LSTM) architecture, DeeReCT-APA extracts sequence features with CNN layers, uses bidirectional LSTM to explicitly model the interactions among competing PASs, and outputs percentage scores representing the usage levels of all PASs of a gene. In addition to the fact that only our method can quantitatively predict the usage of all the PASs within a gene, we show that our method consistently outperforms other existing methods on three different tasks for which they are trained: pairwise comparison task, highest usage prediction task, and ranking task. Finally, we demonstrate that our method can be used to predict the effect of genetic variations on APA patterns and sheds light on future mechanistic understanding in APA regulation. Our code and data are available at https://github.com/lzx325/DeeReCT-APA-repo.
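As an illustration of the "variable-length target" idea in this abstract, the sketch below turns one raw score per PAS into usage percentages that sum to 100 for a gene with any number of sites. The softmax normalization and the function name `usage_percentages` are assumptions made for this example; the abstract does not state how DeeReCT-APA normalizes its outputs, and the scores here are made-up numbers, not model outputs.

```python
import math


def usage_percentages(site_scores):
    """Map raw per-PAS scores of one gene to usage percentages.

    Hypothetical helper: a softmax over the sites of a single gene,
    so the result always sums to 100 regardless of how many PASs
    the gene has.
    """
    if not site_scores:
        raise ValueError("a gene needs at least one PAS score")
    peak = max(site_scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - peak) for score in site_scores]
    total = sum(weights)
    return [100.0 * weight / total for weight in weights]


# Genes with different numbers of PASs go through the same function.
two_sites = usage_percentages([0.0, 0.0])
three_sites = usage_percentages([2.0, 1.0, 0.1])
```

Because the normalization runs over all sites of a gene at once, raising the score of one PAS necessarily lowers the predicted usage of the others, which is one simple way to express competition among sites.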
Affiliation(s)
- Zhongxiao Li
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia
- Yisheng Li
- Department of Biology, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China
- Bin Zhang
- Cancer Science Institute of Singapore, Singapore 117599, Singapore
- Yu Li
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia
- Yongkang Long
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia; Department of Biology, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China
- Juexiao Zhou
- Department of Biology, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China
- Xudong Zou
- Department of Biology, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China
- Min Zhang
- Department of Biology, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China
- Yuhui Hu
- Department of Biology, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China (corresponding author)
- Wei Chen
- Department of Biology, Southern University of Science and Technology (SUSTech), Shenzhen 518055, China (corresponding author)
- Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia (corresponding author)
4
Tian S, Zhang B, He Y, Sun Z, Li J, Li Y, Yi H, Zhao Y, Zou X, Li Y, Cui H, Fang L, Gao X, Hu Y, Chen W. OUP accepted manuscript. Nucleic Acids Res 2022; 50:e26. [PMID: 35191504 PMCID: PMC8934656 DOI: 10.1093/nar/gkac108] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/30/2020] [Revised: 02/01/2022] [Accepted: 02/19/2022] [Indexed: 11/14/2022] Open
Affiliation(s)
- Yuhao He
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Zhiyuan Sun
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Jun Li
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, China
- Yisheng Li
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Hongyang Yi
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Yan Zhao
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Xudong Zou
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Yunfei Li
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Huanhuan Cui
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, China
- Liang Fang
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, China
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Yuhui Hu
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, China
- Wei Chen
- To whom correspondence should be addressed. Tel: +86 755 88018449;
5
Liang W, Zou X, Li G, Zhou S, Tian C, Schaefke B. Systematic Analysis of Monoallelic Gene Expression and Chromatin Accessibility Across Multiple Tissues in Hybrid Mice. Front Cell Dev Biol 2021; 9:717555. [PMID: 34631706 PMCID: PMC8495204 DOI: 10.3389/fcell.2021.717555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2021] [Accepted: 09/01/2021] [Indexed: 11/13/2022] Open
Abstract
In diploid eukaryotic organisms, both alleles of each autosomal gene are usually assumed to be simultaneously expressed at similar levels. However, some genes can be expressed preferentially or strictly from a single allele, a process known as monoallelic expression. Classic monoallelic expression of X-chromosome-linked genes, olfactory receptor genes and developmentally imprinted genes is the result of epigenetic modifications. Genetic-origin-dependent monoallelic expression, however, is caused by cis-regulatory differences between the alleles. There is a paucity of systematic study to investigate these phenomena across multiple tissues, and the mechanisms underlying such monoallelic expression are not yet fully understood. Here we provide a detailed portrait of monoallelic gene expression across multiple tissues/cell lines in a hybrid mouse cross between the Mus musculus strain C57BL/6J and the Mus spretus strain SPRET/EiJ. We observed pervasive tissue-dependent allele-specific gene expression: in total, 1,839 genes exhibited monoallelic expression in at least one tissue, and 410 genes in at least two tissues. Among these 88 are monoallelic genes with different active alleles between tissues, probably representing genetic-origin-dependent monoallelic expression. We also identified six autosomal monoallelic genes with the active allele being identical in all eight tissues, which are likely novel candidates of imprinted genes. To depict the underlying regulatory mechanisms at the chromatin layer, we performed ATAC-seq in two different cell lines derived from the F1 mouse. Consistent with the global expression pattern, cell-type dependent monoallelic peaks were found, and a higher proportion of C57BL/6J-active peaks were observed in both cell types, implying possible species-specific regulation. 
Finally, only a small part of monoallelic gene expression could be explained by allelic differences in chromatin organization in promoter regions, suggesting that other distal elements may play important roles in shaping the patterns of allelic gene expression across tissues.
Affiliation(s)
- Weizheng Liang
- Harbin Institute of Technology, Harbin, China
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Xudong Zou
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Guipeng Li
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
- Shaojie Zhou
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Chi Tian
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Bernhard Schaefke
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
6
Meng X, Kuang K, Zhang Y, Guan K, Liu B, Zhou X. Alternative polyadenylation events differ dramatically between Tongcheng and Large White pigs in response to PRRSV infection. Anim Genet 2021; 52:744-748. [PMID: 34309053 DOI: 10.1111/age.13125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/05/2021] [Indexed: 12/01/2022]
Abstract
Alternative polyadenylation (APA) is a widespread post-transcriptional regulation mechanism that increases the biological complexity of transcriptome and proteome. However, it is unclear whether APA regulation plays a role in genetic resistance to porcine reproductive and respiratory syndrome virus (PRRSV). Here, we reported genome-wide APA regulation of porcine alveolar macrophages in PRRSV-resistant Tongcheng (TC) pigs and PRRSV-susceptible Large White (LW) pigs upon PRRSV infection. Using 3' mRNA sequencing strategy, we detected 75 981 high-quality APA sites in porcine alveolar macrophages of TC and LW pigs. Furthermore, 1202 and 1089 differentially expressed APA sites, as well as 79 and 117 untranslated region-APA switching genes were identified in TC pigs and LW pigs upon PRRSV infection respectively. The APA events in TC pigs and LW pigs were involved in different biological pathways, while APA events in TC pigs are directly associated with the immune response to PRRSV infection. In addition, we identified genetic variations affecting polyadenylation signal between TC pigs and LW pigs. These findings would provide helpful information on APA regulation for further understanding of genetic resistance to PRRSV.
Affiliation(s)
- X Meng
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- K Kuang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- Y Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- K Guan
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- B Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China; The Engineering Technology Research Center of Hubei Province Local Pig Breed Improvement, Wuhan, 430070, China
- X Zhou
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China; The Engineering Technology Research Center of Hubei Province Local Pig Breed Improvement, Wuhan, 430070, China
7
|
Bhat P, Burkard TR, Herzog VA, Pauli A, Ameres SL. Systematic refinement of gene annotations by parsing mRNA 3' end sequencing datasets. Methods Enzymol 2021; 655:205-223. [PMID: 34183122 DOI: 10.1016/bs.mie.2021.03.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
Alternative cleavage and polyadenylation generates mRNA 3' isoforms in a cell type-specific manner. Due to finite available RNA sequencing data of organisms with vast cell type complexity, currently available gene annotation resources are incomplete, which poses significant challenges to the comprehensive interpretation and quantification of transcriptomes. In this chapter, we introduce 3'GAmES, a stand-alone computational pipeline for the identification and quantification of novel mRNA 3'end isoforms from 3'mRNA sequencing data. 3'GAmES expands available repositories and improves comprehensive gene-tag counting by cost-effective 3' mRNA sequencing, faithfully mirroring whole-transcriptome RNAseq measurements. By employing R and bash shell scripts (assembled in a Singularity container) 3'GAmES systematically augments cell type-specific 3' ends of RNA polymerase II transcripts and increases the sensitivity of quantitative gene expression profiling by 3' mRNA sequencing. Public access: https://github.com/AmeresLab/3-GAmES.git.
Affiliation(s)
- Pooja Bhat
- Institute of Molecular Biotechnology (IMBA), Vienna BioCenter (VBC), Vienna, Austria; Vienna BioCenter PhD Program, Doctoral School of the University at Vienna and Medical University of Vienna, Vienna, Austria
- Thomas R Burkard
- Institute of Molecular Biotechnology (IMBA), Vienna BioCenter (VBC), Vienna, Austria
- Veronika A Herzog
- Institute of Molecular Biotechnology (IMBA), Vienna BioCenter (VBC), Vienna, Austria
- Andrea Pauli
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Stefan L Ameres
- Institute of Molecular Biotechnology (IMBA), Vienna BioCenter (VBC), Vienna, Austria; Max Perutz Labs, University of Vienna, Vienna BioCenter (VBC), Vienna, Austria
8
|
Molecular and evolutionary processes generating variation in gene expression. Nat Rev Genet 2020; 22:203-215. [PMID: 33268840 DOI: 10.1038/s41576-020-00304-w] [Citation(s) in RCA: 111] [Impact Index Per Article: 27.8] [Accepted: 10/21/2020] [Indexed: 12/18/2022]
Abstract
Heritable variation in gene expression is common within and between species. This variation arises from mutations that alter the form or function of molecular gene regulatory networks that are then filtered by natural selection. High-throughput methods for introducing mutations and characterizing their cis- and trans-regulatory effects on gene expression (particularly, transcription) are revealing how different molecular mechanisms generate regulatory variation, and studies comparing these mutational effects with variation seen in the wild are teasing apart the role of neutral and non-neutral evolutionary processes. This integration of molecular and evolutionary biology allows us to understand how the variation in gene expression we see today came to be and to predict how it is most likely to evolve in the future.
9
Xia Z, Li Y, Zhang B, Li Z, Hu Y, Chen W, Gao X. DeeReCT-PolyA: a robust and generic deep learning method for PAS identification. Bioinformatics 2020; 35:2371-2379. [PMID: 30500881 PMCID: PMC6612895 DOI: 10.1093/bioinformatics/bty991] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/10/2018] [Revised: 11/06/2018] [Accepted: 11/29/2018] [Indexed: 02/06/2023] Open
Abstract
Motivation: Polyadenylation is a critical step for gene expression regulation during the maturation of mRNA. An accurate and robust method for poly(A) signals (PASs) identification is not only desired for the purpose of better transcripts’ end annotation, but can also help us gain a deeper insight of the underlying regulatory mechanism. Although many methods have been proposed for PAS recognition, most of them are PAS motif- and human-specific, which leads to high risks of overfitting, low generalization power, and inability to reveal the connections between the underlying mechanisms of different mammals. Results: In this work, we propose a robust, PAS motif agnostic, and highly interpretable and transferrable deep learning model for accurate PAS recognition, which requires no prior knowledge or human-designed features. We show that our single model trained over all human PAS motifs not only outperforms the state-of-the-art methods trained on specific motifs, but can also be generalized well to two mouse datasets. Moreover, we further increase the prediction accuracy by transferring the deep learning model trained on the data of one species to the data of a different species. Several novel underlying poly(A) patterns are revealed through the visualization of important oligomers and positions in our trained models. Finally, we interpret the deep learning models by converting the convolutional filters into sequence logos and quantitatively compare the sequence logos between human and mouse datasets. Availability and implementation: https://github.com/likesum/DeeReCT-PolyA Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhihao Xia
- Department of Computer Science and Engineering (CSE), Washington University in St Louis, St Louis, MO, USA
- Yu Li
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
- Bin Zhang
- Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China
- Zhongxiao Li
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
- Yuhui Hu
- Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China
- Wei Chen
- Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
10
Characterization and Functional Analysis of Polyadenylation Sites in Fast and Slow Muscles. Biomed Res Int 2020; 2020:2626584. [PMID: 32258109 PMCID: PMC7102456 DOI: 10.1155/2020/2626584] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/30/2019] [Revised: 12/01/2019] [Accepted: 01/16/2020] [Indexed: 12/05/2022]
Abstract
A growing body of literature shows that alternative polyadenylation (APA) events with different polyadenylation sites (PASs) contribute to posttranscriptional regulation. However, little is known about the detailed molecular features of PASs and their role in porcine fast and slow skeletal muscles through microRNAs (miRNAs) and RNA binding proteins (RBPs). In this study, we combined single-molecule real-time sequencing and Illumina RNA-seq datasets to comprehensively analyze polyadenylation in pigs. We identified a total of 10,334 PASs, of which 8,734 were characterized by reference genome annotation. Of the PAS-associated genes, 32.86% had more than one PAS. Further analysis demonstrated that tissue-specific PASs between fast and slow muscles were enriched in skeletal muscle development pathways. In addition, we obtained 1,407 target genes regulated by APA events through potential binding of 69 miRNAs and 28 RBPs in variable 3′ UTR regions, some of which are involved in myofiber transformation. Furthermore, the de novo motif search confirmed that the canonical motif AAUAAA is the most commonly used, and suggested that the three types of PASs may be related to motif strength. In summary, our results provide a useful annotation of PASs for the pig transcriptome and suggest that APA may play a role in fast and slow muscle development under the regulation of miRNAs and RBPs.
11
Li Y, Schaefke B, Zou X, Zhang M, Heyd F, Sun W, Zhang B, Li G, Liang W, He Y, Zhou J, Li Y, Fang L, Hu Y, Chen W. Pan-tissue analysis of allelic alternative polyadenylation suggests widespread functional regulation. Mol Syst Biol 2020; 16:e9367. [PMID: 32311237 PMCID: PMC7170663 DOI: 10.15252/msb.20199367] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/18/2019] [Revised: 02/29/2020] [Accepted: 03/11/2020] [Indexed: 12/14/2022] Open
Abstract
Alternative polyadenylation (APA) is a major layer of gene regulation. However, it has recently been argued that most APA represents molecular noise. To clarify their functional relevance and evolution, we quantified allele-specific APA patterns in multiple tissues from an F1 hybrid mouse. We found a clearly negative correlation between gene expression and APA diversity for the 2,866 genes (24.9%) with a dominant polyadenylation site (PAS) usage above or equal to 90%, suggesting that their other PASs represent molecular errors. Among the remaining genes with multiple PASs, 3,971 genes (34.5%) express two or more isoforms with potentially functional importance. Interestingly, the genes with potentially functional minor PASs specific to neuronal tissues often express two APA isoforms with distinct subcellular localizations. Furthermore, our analysis of cis-APA divergence shows its pattern across tissues is distinct from that of gene expression. Finally, we demonstrate that the relative usage of alternative PASs is not only affected by their cis-regulatory elements, but also by potential coupling between transcriptional and APA regulation as well as competition kinetics between alternative sites.
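The 90% dominance criterion mentioned in this abstract can be made concrete with a small sketch. It computes the relative usage of each PAS from per-site read counts and checks whether the most used site reaches the threshold. The function names, the read-count input format and the example numbers are hypothetical; the abstract does not describe how usage was actually quantified.

```python
def dominant_pas_usage(read_counts):
    """Return (site, fraction of reads) for the most used PAS of one gene.

    `read_counts` maps a PAS label to its read count; this input format
    is an assumption made for the example.
    """
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("gene has no reads assigned to any PAS")
    site = max(read_counts, key=read_counts.get)
    return site, read_counts[site] / total


def has_dominant_pas(read_counts, threshold=0.9):
    """True when the most used PAS accounts for at least `threshold` of reads."""
    _, usage = dominant_pas_usage(read_counts)
    return usage >= threshold


# Made-up counts for two illustrative genes.
gene_a = {"proximal": 95, "distal": 5}
gene_b = {"proximal": 60, "distal": 40}
gene_a_dominant = has_dominant_pas(gene_a)
gene_b_dominant = has_dominant_pas(gene_b)
```

With these counts, `gene_a` falls in the dominant-PAS group (95% of reads on one site) while `gene_b` would be counted among genes expressing two substantial isoforms.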
Collapse
Affiliation(s)
- Yisheng Li
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Laboratory of RNA Biochemistry, Institute of Chemistry and Biochemistry, Freie Universität Berlin, Berlin, Germany
| | - Bernhard Schaefke
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
| | - Xudong Zou
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
| | - Min Zhang
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
| | - Florian Heyd
- Laboratory of RNA Biochemistry, Institute of Chemistry and Biochemistry, Freie Universität Berlin, Berlin, Germany
| | - Wei Sun
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Present address: Department of Pharmaceutical Chemistry and the Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA
| | - Bin Zhang
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Present address: Cancer Science Institute of Singapore, National University of Singapore, Singapore City, Singapore
| | - Guipeng Li
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
| | - Weizheng Liang
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
| | - Yuhao He
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
| | - Juexiao Zhou
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
| | - Yunfei Li
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
| | - Liang Fang
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
| | - Yuhui Hu
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
| | - Wei Chen
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
| |
|
12
|
Deep learning in bioinformatics: Introduction, application, and perspective in the big data era. Methods 2019; 166:4-21. [PMID: 31022451 DOI: 10.1016/j.ymeth.2019.04.008] [Citation(s) in RCA: 125] [Impact Index Per Article: 25.0] [Received: 11/14/2018] [Revised: 03/23/2019] [Accepted: 04/15/2019] [Indexed: 12/13/2022]
Abstract
Deep learning, which is especially formidable in handling big data, has achieved great success in various fields, including bioinformatics. With the advances of the big data era in biology, it is foreseeable that deep learning will become increasingly important in the field and will be incorporated in vast majorities of analysis pipelines. In this review, we provide both the exoteric introduction of deep learning, and concrete examples and implementations of its representative applications in bioinformatics. We start from the recent achievements of deep learning in the bioinformatics field, pointing out the problems which are suitable to use deep learning. After that, we introduce deep learning in an easy-to-understand fashion, from shallow neural networks to legendary convolutional neural networks, legendary recurrent neural networks, graph neural networks, generative adversarial networks, variational autoencoder, and the most recent state-of-the-art architectures. After that, we provide eight examples, covering five bioinformatics research directions and all the four kinds of data type, with the implementation written in Tensorflow and Keras. Finally, we discuss the common issues, such as overfitting and interpretability, that users will encounter when adopting deep learning methods and provide corresponding suggestions. The implementations are freely available at https://github.com/lykaust15/Deep_learning_examples.
|
13
|
Identification and Characterization of Transcripts Regulated by Circadian Alternative Polyadenylation in Mouse Liver. G3-GENES GENOMES GENETICS 2018; 8:3539-3548. [PMID: 30181259 PMCID: PMC6222568 DOI: 10.1534/g3.118.200559] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 01/01/2023]
Abstract
Dynamic control of gene expression is a hallmark of the circadian system. In mouse liver, approximately 5–20% of RNAs are expressed rhythmically, and over 50% of mouse genes are rhythmically expressed in at least one tissue. Recent genome-wide analyses unveiled that, in addition to rhythmic transcription, various post-transcriptional mechanisms play crucial roles in driving rhythmic gene expression. Alternative polyadenylation (APA) is an emerging post-transcriptional mechanism that changes the 3′-ends of transcripts by alternating poly(A) site usage. APA can thus result in changes in RNA processing, such as mRNA localization, stability, translation efficiency, and sometimes even in the localization of the encoded protein. It remains unclear, however, if and how APA is regulated by the circadian clock. To address this, we used an in silico approach and demonstrated in mouse liver that 57.4% of expressed genes undergo APA and each gene has 2.53 poly(A) sites on average. Among all expressed genes, 2.9% of genes alternate their poly(A) site usage with a circadian (i.e., approximately 24 hr) period. APA transcripts use distal sites with canonical poly(A) signals (PASs) more frequently; however, circadian APA transcripts exhibit less distinct usage preference between proximal and distal sites and use proximal sites more frequently. Circadian APA transcripts also harbor longer 3′UTRs, making them more susceptible to post-transcriptional regulation. Overall, our study serves as a platform to ultimately understand the mechanisms of circadian APA regulation.
|
14
|
Wang R, Zheng D, Yehia G, Tian B. A compendium of conserved cleavage and polyadenylation events in mammalian genes. Genome Res 2018; 28:1427-1441. [PMID: 30143597 PMCID: PMC6169888 DOI: 10.1101/gr.237826.118] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 04/22/2018] [Accepted: 08/08/2018] [Indexed: 12/22/2022]
Abstract
Cleavage and polyadenylation is essential for 3' end processing of almost all eukaryotic mRNAs. Recent studies have shown widespread alternative cleavage and polyadenylation (APA) events leading to mRNA isoforms with different 3' UTRs and/or coding sequences. Here, we present a compendium of conserved cleavage and polyadenylation sites (PASs) in mammalian genes, based on approximately 1.2 billion 3' end sequencing reads from more than 360 human, mouse, and rat samples. We show that ∼80% of mammalian mRNA genes contain at least one conserved PAS, and ∼50% have conserved APA events. PAS conservation generally reduces promiscuous 3' end processing, stabilizing gene expression levels across species. Conservation of APA correlates with gene age, gene expression features, and gene functions. Genes with certain functions, such as cell morphology, cell proliferation, and mRNA metabolism, are particularly enriched with conserved APA events. Whereas tissue-specific genes typically have a low APA rate, brain-specific genes tend to evolve APA. In addition, we show enrichment of mRNA destabilizing motifs in alternative 3' UTR sequences, leading to substantial differences in mRNA stability between 3' UTR isoforms. Using conserved PASs, we reveal sequence motifs surrounding APA sites and a preference of adenosine at the cleavage site. Furthermore, we show that mutations of U-rich motifs around the PAS often accompany APA profile differences between species. Analysis of lncRNA PASs indicates a mechanism of PAS fixation through evolution of A-rich motifs. Taken together, our results present a comprehensive view of PAS evolution in mammals, and a phylogenic perspective on APA functions.
Affiliation(s)
- Ruijia Wang
- Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, New Jersey 07103, USA
- Rutgers Cancer Institute of New Jersey, Newark, New Jersey 07103, USA
| | - Dinghai Zheng
- Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, New Jersey 07103, USA
- Rutgers Cancer Institute of New Jersey, Newark, New Jersey 07103, USA
| | - Ghassan Yehia
- Genome Editing Core Facility, Rutgers University, New Brunswick, New Jersey 08901, USA
| | - Bin Tian
- Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, New Jersey 07103, USA
- Rutgers Cancer Institute of New Jersey, Newark, New Jersey 07103, USA
| |
|
15
|
Wang R, Zheng D, Yehia G, Tian B. A compendium of conserved cleavage and polyadenylation events in mammalian genes. Genome Res 2018. [PMID: 30143597 DOI: 10.1101/gr.237826.118.28] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2023]
|
16
|
Schaefke B, Sun W, Li YS, Fang L, Chen W. The evolution of posttranscriptional regulation. WILEY INTERDISCIPLINARY REVIEWS-RNA 2018; 9:e1485. [PMID: 29851258 DOI: 10.1002/wrna.1485] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 02/08/2018] [Revised: 04/23/2018] [Accepted: 04/26/2018] [Indexed: 12/13/2022]
Abstract
"DNA makes RNA makes protein." After transcription, mRNAs undergo a series of intertwining processes to be finally translated into functional proteins. The "posttranscriptional" regulation (PTR) provides cells an extended option to fine-tune their proteomes. To meet the demands of complex organism development and the appropriate response to environmental stimuli, every step in these processes needs to be finely regulated. Moreover, changes in these regulatory processes are important driving forces underlying the evolution of phenotypic differences across different species. The major PTR mechanisms discussed in this review include the regulation of splicing, polyadenylation, decay, and translation. For alternative splicing and polyadenylation, we mainly discuss their evolutionary dynamics and the genetic changes underlying the regulatory differences in cis-elements versus trans-factors. For mRNA decay and translation, which, together with transcription, determine the cellular RNA or protein abundance, we focus our discussion on how their divergence coordinates with transcriptional changes to shape the evolution of gene expression. Then to highlight the importance of PTR in the evolution of higher complexity, we focus on their roles in two major phenomena during eukaryotic evolution: the evolution of multicellularity and the division of labor between different cell types and tissues; and the emergence of diverse, often highly specialized individual phenotypes, especially those concerning behavior in eusocial insects. This article is categorized under: RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution; Translation > Translation Regulation; RNA Processing > Splicing Regulation/Alternative Splicing.
Affiliation(s)
- Bernhard Schaefke
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
| | - Wei Sun
- Department of Biology, Southern University of Science and Technology, Shenzhen, China; Department of Pharmaceutical Chemistry and Cardiovascular Research Institute, University of California San Francisco, San Francisco
| | - Yi-Sheng Li
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
| | - Liang Fang
- Department of Biology, Southern University of Science and Technology, Shenzhen, China; Medi-X Institute, SUSTech Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
| | - Wei Chen
- Department of Biology, Southern University of Science and Technology, Shenzhen, China; Medi-X Institute, SUSTech Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
| |
|
17
|
Xiao MS, Zhang B, Li YS, Gao Q, Sun W, Chen W. Global analysis of regulatory divergence in the evolution of mouse alternative polyadenylation. Mol Syst Biol 2016; 12:890. [PMID: 27932516 PMCID: PMC5199128 DOI: 10.15252/msb.20167375] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/25/2022]
Abstract
Alternative polyadenylation (APA), which is regulated by both cis‐elements and trans‐factors, plays an important role in post‐transcriptional regulation of eukaryotic gene expression. However, comparing to the extensively studied transcription and alternative splicing, the extent of APA divergence during evolution and the relative cis‐ and trans‐contribution remain largely unexplored. To directly address these questions for the first time in mammals, by using deep sequencing‐based methods, we measured APA divergence between C57BL/6J and SPRET/EiJ mouse strains as well as allele‐specific APA pattern in their F1 hybrids. Among the 24,721 polyadenylation sites (pAs) from 7,271 genes expressing multiple pAs, we identified 3,747 pAs showing significant divergence between the two strains. After integrating the allele‐specific data from F1 hybrids, we demonstrated that these events could be predominately attributed to cis‐regulatory effects. Further systematic sequence analysis of the regions in proximity to cis‐divergent pAs revealed that the local RNA secondary structure and a poly(U) tract in the upstream region could negatively modulate the pAs usage.
Affiliation(s)
- Mei-Sheng Xiao
- Laboratory for Functional Genomics and Systems Biology, Berlin Institute for Medical Systems Biology, Berlin, Germany
| | - Bin Zhang
- Laboratory for Functional Genomics and Systems Biology, Berlin Institute for Medical Systems Biology, Berlin, Germany; Department of Biology, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - Yi-Sheng Li
- Laboratory for Functional Genomics and Systems Biology, Berlin Institute for Medical Systems Biology, Berlin, Germany
| | - Qingsong Gao
- Laboratory for Functional Genomics and Systems Biology, Berlin Institute for Medical Systems Biology, Berlin, Germany
| | - Wei Sun
- Laboratory for Functional Genomics and Systems Biology, Berlin Institute for Medical Systems Biology, Berlin, Germany; Department of Biology, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - Wei Chen
- Department of Biology, Southern University of Science and Technology, Shenzhen, Guangdong, China; Medi-X Institute, SUSTech Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, Guangdong, China
| |
|