1
|
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024:10.1038/s12276-024-01243-w. [PMID: 38871816 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] [Open Access]
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
| | - Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
- Genome4me Inc., Seoul, Republic of Korea.
| |
|
2
|
Guo Y, Zhou D, Li P, Li C, Cao J. Context-Aware Poly(A) Signal Prediction Model via Deep Spatial-Temporal Neural Networks. IEEE Trans Neural Netw Learn Syst 2024; 35:8241-8253. [PMID: 37015693 DOI: 10.1109/tnnls.2022.3226301] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/19/2023]
Abstract
Polyadenylation [Poly(A)] is an essential process during messenger RNA (mRNA) maturation in eukaryotic systems. Identifying Poly(A) signals (PASs) at the genome level is key to understanding the mechanisms of translation regulation and mRNA metabolism. In this work, we propose a deep dual-dynamic context-aware Poly(A) signal prediction model, called multiscale convolution with self-attention networks (MCANet), to adaptively uncover spatial-temporal contextual dependence information. Specifically, the model automatically learns and strengthens informative features along the temporal and spatial dimensions. Identity connections carry contextual feature maps of Poly(A) data directly from previous layers to subsequent layers. Then, a fully parametric rectified linear unit (FP-RELU) with dual-dynamic coefficients is devised to make the model easier to train and to enhance its generalization ability. A cross-entropy loss (CL) function is designed to make the model focus on samples that are easy to misclassify. Experiments on different Poly(A) signals demonstrate the superior performance of the proposed MCANet, and an ablation study shows the effectiveness of the network design for the feature learning and prediction of Poly(A) signals.
|
3
|
Liu X, Chen H, Li Z, Yang X, Jin W, Wang Y, Zheng J, Li L, Xuan C, Yuan J, Yang Y. InPACT: a computational method for accurate characterization of intronic polyadenylation from RNA sequencing data. Nat Commun 2024; 15:2583. [PMID: 38519498 PMCID: PMC10960005 DOI: 10.1038/s41467-024-46875-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 03/12/2024] [Indexed: 03/25/2024] [Open Access]
Abstract
Alternative polyadenylation that occurs in introns, termed intronic polyadenylation (IPA), has been implicated in diverse biological processes and diseases, as it can produce noncoding transcripts or transcripts with truncated coding regions. However, a reliable method is required to accurately characterize IPA. Here, we propose a computational method called InPACT, which allows for the precise characterization of IPA from conventional RNA-seq data. InPACT successfully identifies numerous previously unannotated IPA transcripts in human cells, many of which are translated, as evidenced by ribosome profiling data. We have demonstrated that InPACT outperforms other methods in terms of IPA identification and quantification. Moreover, InPACT applied to monocyte activation reveals temporally coordinated IPA events. Further application to single-cell RNA-seq data of human fetal bone marrow reveals the expression of several IPA isoforms in a context-specific manner. Therefore, InPACT represents a powerful tool for the accurate characterization of IPA from RNA-seq data.
Affiliation(s)
- Xiaochuan Liu
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
| | - Hao Chen
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
| | - Zekun Li
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
| | - Xiaoxiao Yang
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
| | - Wen Jin
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
| | - Yuting Wang
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
| | - Jian Zheng
- Department of Immunology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
| | - Long Li
- Department of Immunology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
| | - Chenghao Xuan
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China.
| | - Jiapei Yuan
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, 300020, China.
- Tianjin Institutes of Health Science, Tianjin, 301600, China.
| | - Yang Yang
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China.
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China.
| |
|
4
|
Bryce-Smith S, Brown AL, Mehta PR, Mattedi F, Mikheenko A, Barattucci S, Zanovello M, Dattilo D, Yome M, Hill SE, Qi YA, Wilkins OG, Sun K, Ryadnov E, Wan Y, Vargas JNS, Birsa N, Raj T, Humphrey J, Keuss M, Ward M, Secrier M, Fratta P. TDP-43 loss induces extensive cryptic polyadenylation in ALS/FTD. bioRxiv 2024:2024.01.22.576625. [PMID: 38313254 PMCID: PMC10836071 DOI: 10.1101/2024.01.22.576625] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/06/2024]
Abstract
Nuclear depletion and cytoplasmic aggregation of the RNA-binding protein TDP-43 is the hallmark of ALS, occurring in over 97% of cases. A key consequence of TDP-43 nuclear loss is the de-repression of cryptic exons. Whilst TDP-43 regulated cryptic splicing is increasingly well catalogued, cryptic alternative polyadenylation (APA) events, which define the 3' end of last exons, have been largely overlooked, especially when not associated with novel upstream splice junctions. We developed a novel bioinformatic approach to reliably identify distinct APA event types: alternative last exons (ALE), 3'UTR extensions (3'Ext) and intronic polyadenylation (IPA) events. We identified novel neuronal cryptic APA sites induced by TDP-43 loss of function by systematically applying our pipeline to a compendium of publicly available and in house datasets. We find that TDP-43 binding sites and target motifs are enriched at these cryptic events and that TDP-43 can have both repressive and enhancing action on APA. Importantly, all categories of cryptic APA can also be identified in ALS and FTD post mortem brain regions with TDP-43 proteinopathy underlining their potential disease relevance. RNA-seq and Ribo-seq analyses indicate that distinct cryptic APA categories have different downstream effects on transcript and translation. Intriguingly, cryptic 3'Exts occur in multiple transcription factors, such as ELK1, SIX3, and TLX1, and lead to an increase in wild-type protein levels and function. Finally, we show that an increase in RNA stability leading to a higher cytoplasmic localisation underlies these observations. In summary, we demonstrate that TDP-43 nuclear depletion induces a novel category of cryptic RNA processing events and we expand the palette of TDP-43 loss consequences by showing this can also lead to an increase in normal protein translation.
Affiliation(s)
- Sam Bryce-Smith
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Anna-Leigh Brown
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Puja R. Mehta
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Francesca Mattedi
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Alla Mikheenko
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Simone Barattucci
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Matteo Zanovello
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Dario Dattilo
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Matthew Yome
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Sarah E. Hill
- National Institute of Neurological Disorders and Stroke, NIH, Bethesda, MD, USA
| | - Yue A. Qi
- National Institute of Neurological Disorders and Stroke, NIH, Bethesda, MD, USA
| | - Oscar G. Wilkins
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- The Francis Crick Institute, London, UK
| | - Kai Sun
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Eugeni Ryadnov
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Yixuan Wan
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | | | - Jose Norberto S. Vargas
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Nicol Birsa
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Towfique Raj
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Jack Humphrey
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Matthew Keuss
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Michael Ward
- National Institute of Neurological Disorders and Stroke, NIH, Bethesda, MD, USA
| | - Maria Secrier
- UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, London, UK
| | - Pietro Fratta
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- The Francis Crick Institute, London, UK
| |
|
5
|
Bryce-Smith S, Burri D, Gazzara MR, Herrmann CJ, Danecka W, Fitzsimmons CM, Wan YK, Zhuang F, Fansler MM, Fernández JM, Ferret M, Gonzalez-Uriarte A, Haynes S, Herdman C, Kanitz A, Katsantoni M, Marini F, McDonnel E, Nicolet B, Poon CL, Rot G, Schärfen L, Wu PJ, Yoon Y, Barash Y, Zavolan M. Extensible benchmarking of methods that identify and quantify polyadenylation sites from RNA-seq data. RNA 2023; 29:1839-1855. [PMID: 37816550 PMCID: PMC10653393 DOI: 10.1261/rna.079849.123] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/21/2023] [Accepted: 09/21/2023] [Indexed: 10/12/2023]
Abstract
The tremendous rate with which data is generated and analysis methods emerge makes it increasingly difficult to keep track of their domain of applicability, assumptions, limitations, and consequently, of the efficacy and precision with which they solve specific tasks. Therefore, there is an increasing need for benchmarks, and for the provision of infrastructure for continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools for the identification and quantification of the usage of alternative polyadenylation (APA) sites from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3'-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows for continuous extension of the set of methods, metrics, and challenges. We envisage that our analyses will assist researchers in selecting the appropriate tools for their studies, while the containers and reproducible workflows could easily be deployed and extended to evaluate new methods or data sets.
Affiliation(s)
- Sam Bryce-Smith
- Department of Neuromuscular Diseases, UCL Queen Square Motor Neuron Disease Centre, UCL Queen Square Institute of Neurology, UCL, London WC1N 3BG, United Kingdom
| | - Dominik Burri
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Matthew R Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Christina J Herrmann
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Weronika Danecka
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3FF, United Kingdom
| | - Christina M Fitzsimmons
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore, Buona Vista, Singapore 138672
- Yong Loo Lin School of Medicine, National University of Singapore, Kent Ridge, Singapore 119228
| | - Farica Zhuang
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Mervin M Fansler
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell Graduate Studies, New York, New York 10065, USA
- Cancer Biology and Genetics, Sloan-Kettering Institute, MSKCC, New York, New York 10065, USA
| | - José M Fernández
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
| | - Meritxell Ferret
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
| | - Asier Gonzalez-Uriarte
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
| | - Samuel Haynes
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3FF, United Kingdom
| | - Chelsea Herdman
- Department of Neurobiology, University of Utah, Salt Lake City, Utah 84132, USA
| | - Alexander Kanitz
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Maria Katsantoni
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Federico Marini
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg-University Mainz, 55118 Mainz, Germany
| | - Euan McDonnel
- Leeds Institute for Data Analytics, School of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9NL, United Kingdom
| | - Ben Nicolet
- Department of Hematopoiesis, Sanquin Research, Landsteiner Laboratory, Amsterdam UMC, University of Amsterdam, 1066 CX Amsterdam, The Netherlands
- Oncode Institute, 3521 AL Utrecht, The Netherlands
| | - Chi-Lam Poon
- Graduate School of Medical Sciences, Weill Cornell Medicine, New York, New York 10065, USA
| | - Gregor Rot
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Institute of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
| | - Leonard Schärfen
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, Connecticut 06520, USA
| | - Pin-Jou Wu
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, 72076 Tübingen, Germany
| | - Yoseop Yoon
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California Irvine, Irvine, California 92617, USA
| | - Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Mihaela Zavolan
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| |
|
6
|
Bryce-Smith S, Burri D, Gazzara MR, Herrmann CJ, Danecka W, Fitzsimmons CM, Wan YK, Zhuang F, Fansler MM, Fernández JM, Ferret M, Gonzalez-Uriarte A, Haynes S, Herdman C, Kanitz A, Katsantoni M, Marini F, McDonnel E, Nicolet B, Poon CL, Rot G, Schärfen L, Wu PJ, Yoon Y, Barash Y, Zavolan M. Extensible benchmarking of methods that identify and quantify polyadenylation sites from RNA-seq data. bioRxiv 2023:2023.06.23.546284. [PMID: 37425672 PMCID: PMC10327023 DOI: 10.1101/2023.06.23.546284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
The tremendous rate with which data is generated and analysis methods emerge makes it increasingly difficult to keep track of their domain of applicability, assumptions, and limitations and consequently, of the efficacy and precision with which they solve specific tasks. Therefore, there is an increasing need for benchmarks, and for the provision of infrastructure for continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools for the identification and quantification of the usage of alternative polyadenylation (APA) sites from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3'-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows for seamless extension of the set of methods, metrics, and challenges. We envisage that our analyses will assist researchers in selecting the appropriate tools for their studies. Furthermore, the containers and reproducible workflows generated in the course of this project can be seamlessly deployed and extended in the future to evaluate new methods or datasets.
Affiliation(s)
- Sam Bryce-Smith
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Dominik Burri
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Matthew R. Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
| | - Christina J. Herrmann
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Weronika Danecka
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
| | - Christina M. Fitzsimmons
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore, Buona Vista, Singapore
- National University of Singapore, Kent Ridge, Singapore
| | - Farica Zhuang
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, USA
| | - Mervin M. Fansler
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell Graduate Studies, New York, NY, USA
- Cancer Biology and Genetics, Sloan-Kettering Institute, MSKCC, New York, NY, USA
| | - José M. Fernández
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Meritxell Ferret
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Asier Gonzalez-Uriarte
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Samuel Haynes
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
| | | | - Alexander Kanitz
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Maria Katsantoni
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Federico Marini
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg-University Mainz, Germany
| | - Euan McDonnel
- Leeds Institute for Data Analytics, School of Molecular and Cellular Biology, University of Leeds, United Kingdom
| | - Ben Nicolet
- Department of Hematopoiesis, Sanquin Research, Landsteiner Laboratory, Amsterdam UMC, University of Amsterdam, and Oncode Institute, Amsterdam, The Netherlands
| | | | - Gregor Rot
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Life Sciences, Zurich, Switzerland
| | - Leonard Schärfen
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven CT, USA
| | - Pin-Jou Wu
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, Germany
| | - Yoseop Yoon
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California Irvine, Irvine, California, USA
| | - Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, USA
| | - Mihaela Zavolan
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
7
|
Ko Y, Chun J, Yang H, Kim D. Hypoviral-regulated HSP90 co-chaperone p23 (CpCop23) determines the colony morphology, virulence, and viral response of chestnut blight fungus Cryphonectria parasitica. Mol Plant Pathol 2023; 24:413-424. [PMID: 36762926 PMCID: PMC10098053 DOI: 10.1111/mpp.13308] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/24/2022] [Revised: 01/20/2023] [Accepted: 01/22/2023] [Indexed: 05/03/2023]
Abstract
We previously identified a protein spot that showed down-regulation in the presence of Cryphonectria hypovirus 1 (CHV1) and tannic acid supplementation as a Hsp90 co-chaperone p23 gene (CpCop23). The CpCop23-null mutant strain showed retarded growth with less aerial mycelia and intense pigmentation. Conidia of the CpCop23-null mutant were significantly decreased and their viability was dramatically diminished. The CpCop23-null mutant showed hypersensitivity to Hsp90 inhibitors. However, no differences in responsiveness were observed after exposure to other stressors such as temperature, reactive oxygen species, and high osmosis, the exception being cell wall-disturbing agents. A severe reduction in virulence was observed in the CpCop23-null mutant. Interestingly, viral transfer to the CpCop23-null mutant from CHV1-infected strain via anastomosis was more inefficient than a comparable transfer with the wild type as a result of decreased hyphal branching of the CpCop23-null mutant around the peripheral region, which resulted in less fusion of the hyphae. The CHV1-infected CpCop23-null mutant exhibited recovered mycelial growth with less pigmentation and sporulation. The CHV1-transfected CpCop23-null mutant demonstrated almost no virulence, that is, even less than that of the CHV1-infected wild type (UEP1), a further indication that reduced virulence of the mutant is not attributable exclusively to the retarded growth but rather is a function of the CpCop23 gene. Thus, this study indicates that CpCop23 plays a role in ensuring appropriate mycelial growth and development, spore viability, responses to antifungal drugs, and fungal virulence. Moreover, the CpCop23 gene acts as a host factor that affects CHV1-infected fungal growth and maintains viral symptom development.
Affiliation(s)
- Yo‐Han Ko
- Department of Molecular Biology, Department of Bioactive Material Sciences, Institute for Molecular Biology and Genetics, Jeonbuk National University, Jeonju, South Korea
| | - Jeesun Chun
- Department of Molecular Biology, Department of Bioactive Material Sciences, Institute for Molecular Biology and Genetics, Jeonbuk National University, Jeonju, South Korea
| | - Han‐Eul Yang
- Department of Molecular Biology, Department of Bioactive Material Sciences, Institute for Molecular Biology and Genetics, Jeonbuk National University, Jeonju, South Korea
| | - Dae‐Hyuk Kim
- Department of Molecular Biology, Department of Bioactive Material Sciences, Institute for Molecular Biology and Genetics, Jeonbuk National University, Jeonju, South Korea
| |
|
8
|
Shi L, Liu Z, Yang L, Fan W. Effects of oil pollution on soil microbial diversity in the Loess hilly areas, China. Ann Microbiol 2022. [DOI: 10.1186/s13213-022-01683-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] [Open Access]
Abstract
Purpose
To provide data support and a theoretical basis for the bioremediation and treatment of petroleum-contaminated soils in the Loess hills of Yan’an, northern Shaanxi.
Methods
The evolutionary characteristics of soil microbial diversity and community structure under different levels of oil pollution were studied through field sampling, indoor simulation experiments, and laboratory assays, using mine soils from Yan’an, Shaanxi Province, as the research object.
Results
Compared with clean soil, the microbial species in contaminated soil were significantly reduced, the dominant flora changed, and the flora capable of degrading petroleum pollutants increased significantly. The soil microbial diversity and community structure differed, although not significantly, between different pollution levels, but significantly from clean soil. In the uncontaminated soil (CK), the dominant soil microbial genera were mainly Pantoea, Sphingomonas, Thiothrix, and Nocardioides. The abundance of Pseudomonas, Pedobacter, Massilia, Nocardioides, and Acinetobacter in the soil increased after oil contamination, while Thiothrix, Sphingomonas, and Gemmatimonas decreased significantly.
Conclusions
After the soil was contaminated with petroleum, the microbial species in the soil decreased significantly, the dominant genera in the soil changed, and the relative abundance of bacterial groups capable of degrading petroleum pollutants increased. The genera that can degrade petroleum pollutants in the petroleum-contaminated soil in the study area mainly include Pseudomonas, Acinetobacter, Pedobacter, and Nocardioides. These findings provide a scientific basis for exploring remediation methods suitable for petroleum-contaminated soil in this region.
|
9
|
Guo Y, Shen H, Li W, Li C, Jin C. Deep Effective k-mer representation learning for polyadenylation signal prediction via co-occurrence embedding. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
10
|
Ye W, Lian Q, Ye C, Wu X. A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00121-8. [PMID: 36167284 PMCID: PMC10372920 DOI: 10.1016/j.gpb.2022.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Revised: 08/17/2022] [Accepted: 09/19/2022] [Indexed: 05/08/2023]
Abstract
Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to shaping eukaryotic transcriptome complexity and proteome diversity. Identification of poly(A) sites (pAs) on a genome-wide scale is a critical step toward understanding the underlying mechanism of APA-mediated gene regulation. A number of established computational tools have been proposed to predict pAs from diverse genomic data. Here we provided an exhaustive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. Particularly, we examined several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and put forward operable suggestions on how to assess the reliability of pAs predicted by different tools. We also proposed practical guidelines on choosing appropriate methods applicable to diverse scenarios. Moreover, we discussed in depth the challenges in improving the performance of pA prediction and benchmarking different methods. Additionally, we highlighted outstanding challenges and opportunities using new machine learning and integrative multi-omics techniques, and provided our perspective on how computational methodologies might evolve in the future for non-3' untranslated region, tissue-specific, cross-species, and single-cell pA prediction.
Affiliation(s)
- Wenbin Ye
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
| | - Qiwei Lian
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Department of Automation, Xiamen University, Xiamen 361005, China
| | - Congting Ye
- Key Laboratory of the Coastal and Wetland Ecosystems, Ministry of Education, College of the Environment and Ecology, Xiamen University, Xiamen 361005, China
| | - Xiaohui Wu
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China.
| |
|
11
|
Shi L, Liu Z, Yang L, Fan W. Effect of plant waste addition as exogenous nutrients on microbial remediation of petroleum-contaminated soil. ANN MICROBIOL 2022. [DOI: 10.1186/s13213-022-01679-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
Purpose
This study investigates the feasibility of bio-enhanced microbial remediation of petroleum-contaminated soil, and analyzes the effect of different plant wastes as exogenous stimulants on microbial remediation of petroleum-contaminated soil and the effect on soil microbial community structure, in order to guide the remediation of soil in long-term petroleum-contaminated areas with nutrient-poor soils.
Methods
The study was conducted in a representative oil extraction area in the Loess Hills, a typical ecologically fragile area in China. Through indoor simulated addition tests, combined with the determination of soil chemical and microbiological properties, the degradation efficiency of petroleum pollutants and the response characteristics of soil microbial community structure to the addition of different plant wastes in the area were comprehensively analyzed to obtain the optimal exogenous additive and explore the strengthening mechanism of plant wastes on microbial remediation of petroleum-contaminated soil.
Results
Compared with the naturally decaying petroleum-contaminated soil, the addition of plant waste increased the degradation rate of petroleum pollutants; that is, it strengthened the capacity of indigenous degrading bacteria to degrade petroleum pollutants. The highest degradation rate was achieved when the exogenous additive was soybean straw. Compared with the naturally decaying petroleum-contaminated soil, the addition of soybean straw and of dead and fallen leaves of lemon mallow significantly reduced the number of microbial species in the contaminated soil and changed the main dominant flora, while the flora capable of degrading petroleum pollutants increased significantly. The addition of exogenous nutrients had significant effects on soil microbial diversity and community structure.
Conclusions
Soybean straw can be added to the contaminated soil as the optimal exogenous organic nutrient. It improves the physicochemical properties of the soil and provides a good living environment for indigenous microorganisms that degrade petroleum pollutants, thus activating the indigenous degrading bacteria in the petroleum-contaminated soil and accelerating their growth, proliferation, and metabolic activities. This lays a foundation for developing efficient, environmentally friendly, and low-cost microbial-enhanced remediation technologies, and it is important for improving soil remediation in areas with long-term oil contamination and nutrient-poor soils.
|
12
|
Leveraging omic features with F3UTER enables identification of unannotated 3'UTRs for synaptic genes. Nat Commun 2022; 13:2270. [PMID: 35477703 PMCID: PMC9046390 DOI: 10.1038/s41467-022-30017-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2021] [Accepted: 03/18/2022] [Indexed: 11/08/2022] Open
Abstract
There is growing evidence for the importance of 3' untranslated region (3'UTR) dependent regulatory processes. However, our current human 3'UTR catalogue is incomplete. Here, we develop a machine learning-based framework, leveraging both genomic and tissue-specific transcriptomic features to predict previously unannotated 3'UTRs. We identify unannotated 3'UTRs associated with 1,563 genes across 39 human tissues, with the greatest abundance found in the brain. These unannotated 3'UTRs are significantly enriched for RNA binding protein (RBP) motifs and exhibit high human lineage-specificity. We find that brain-specific unannotated 3'UTRs are enriched for the binding motifs of important neuronal RBPs such as TARDBP and RBFOX1, and their associated genes are involved in synaptic function. Our data is shared through an online resource F3UTER ( https://astx.shinyapps.io/F3UTER/ ). Overall, our data improves 3'UTR annotation and provides additional insights into the mRNA-RBP interactome in the human brain, with implications for our understanding of neurological and neurodevelopmental diseases.
|
13
|
Context-aware dynamic neural computational models for accurate Poly(A) signal prediction. Neural Netw 2022; 152:287-299. [DOI: 10.1016/j.neunet.2022.04.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2021] [Revised: 03/03/2022] [Accepted: 04/22/2022] [Indexed: 11/21/2022]
|
14
|
Xu SM, Curry-Hyde A, Sytnyk V, Janitz M. RNA polyadenylation patterns in the human transcriptome. Gene 2022; 816:146133. [PMID: 34998928 DOI: 10.1016/j.gene.2021.146133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2021] [Revised: 12/03/2021] [Accepted: 12/20/2021] [Indexed: 11/30/2022]
Abstract
The eukaryotic transcriptome undergoes various post-transcriptional modifications that assist gene expression. Polyadenylation is a molecular process occurring at the 3'-end of the RNA molecule in which poly(A) polymerase attaches adenosine monophosphate molecules in a chain-like fashion to assemble a poly(A) tail. Multiple RNA isoforms with differing 3'-UTR and exonic compositions are produced through alternative polyadenylation (APA), which enhances the diversification of alternatively spliced mRNA transcripts. To study polyadenylation patterns, novel methods have been developed using short-read and long-read sequencing technologies to analyse the 3'-ends of the transcript. Recent studies have identified unique polyadenylation patterns in different cellular functions, including oncogenic activity, which could prove valuable in the understanding of medical genetics, particularly in the discovery of biomarkers in diseased states. We present a review of current literature reporting on polyadenylation and its biological relevance in the mammalian transcriptome, with a focus on the human transcriptome. Additionally, we have explored the various methods available to detect polyadenylation patterns using second and third generation sequencing technologies.
Affiliation(s)
- Si-Mei Xu
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Australia
| | - Ashton Curry-Hyde
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Australia
| | - Vladimir Sytnyk
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Australia
| | - Michael Janitz
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Australia; Paul-Flechsig-Institute for Brain Research, University of Leipzig, Leipzig, Germany.
| |
|
15
|
Lusk R, Hoffman PL, Mahaffey S, Rosean S, Smith H, Silhavy J, Pravenec M, Tabakoff B, Saba LM. Beyond Genes: Inclusion of Alternative Splicing and Alternative Polyadenylation to Assess the Genetic Architecture of Predisposition to Voluntary Alcohol Consumption in Brain of the HXB/BXH Recombinant Inbred Rat Panel. Front Genet 2022; 13:821026. [PMID: 35368676 PMCID: PMC8965255 DOI: 10.3389/fgene.2022.821026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2021] [Accepted: 02/10/2022] [Indexed: 12/02/2022] Open
Abstract
Post transcriptional modifications of RNA are powerful mechanisms by which eukaryotes expand their genetic diversity. For instance, researchers estimate that most transcripts in humans undergo alternative splicing and alternative polyadenylation. These splicing events produce distinct RNA molecules, which in turn yield distinct protein isoforms and/or influence RNA stability, translation, nuclear export, and RNA/protein cellular localization. Due to their pervasiveness and impact, we hypothesized that alternative splicing and alternative polyadenylation in brain can contribute to a predisposition for voluntary alcohol consumption. Using the HXB/BXH recombinant inbred rat panel (a subset of the Hybrid Rat Diversity Panel), we generated over one terabyte of brain RNA sequencing data (total RNA) and identified novel splice variants (via StringTie) and alternative polyadenylation sites (via aptardi) to determine the transcriptional landscape in the brains of these animals. After establishing an analysis pipeline to ascertain high quality transcripts, we quantitated transcripts and integrated genotype data to identify candidate transcript coexpression networks and individual candidate transcripts associated with predisposition to voluntary alcohol consumption in the two-bottle choice paradigm. For genes that were previously associated with this trait (e.g., Lrap, Ift81, and P2rx4) (Saba et al., Febs. J., 282, 3556–3578, Saba et al., Genes. Brain. Behav., 20, e12698), we were able to distinguish between transcript variants to provide further information about the specific isoforms related to the trait. We also identified additional candidate transcripts associated with the trait of voluntary alcohol consumption (i.e., isoforms of Mapkapk5, Aldh1a7, and Map3k7). Consistent with our previous work, our results indicate that transcripts and networks related to inflammation and the immune system in brain can be linked to voluntary alcohol consumption. 
Overall, we have established a pipeline for including the quantitation of alternative splicing and alternative polyadenylation variants in the transcriptome in the analysis of the relationship between the transcriptome and complex traits.
Affiliation(s)
- Ryan Lusk
- Department of Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Paula L. Hoffman
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Spencer Mahaffey
- Department of Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Samuel Rosean
- Department of Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Harry Smith
- Department of Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Jan Silhavy
- Institute of Physiology of the Czech Academy of Sciences, Prague, Czechia
| | - Michal Pravenec
- Institute of Physiology of the Czech Academy of Sciences, Prague, Czechia
| | - Boris Tabakoff
- Department of Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Laura M. Saba
- Department of Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- *Correspondence: Laura M. Saba,
| |
|
16
|
Arora A, Goering R, Lo HYG, Lo J, Moffatt C, Taliaferro JM. The Role of Alternative Polyadenylation in the Regulation of Subcellular RNA Localization. Front Genet 2022; 12:818668. [PMID: 35096024 PMCID: PMC8795681 DOI: 10.3389/fgene.2021.818668] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2021] [Accepted: 12/21/2021] [Indexed: 11/13/2022] Open
Abstract
Alternative polyadenylation (APA) is a widespread and conserved regulatory mechanism that generates diverse 3' ends on mRNA. APA patterns are often tissue specific and play an important role in cellular processes such as cell proliferation, differentiation, and response to stress. Many APA sites are found in 3' UTRs, generating mRNA isoforms with different 3' UTR contents. These alternate 3' UTR isoforms can change how the transcript is regulated, affecting its stability and translation. Since the subcellular localization of a transcript is often regulated by 3' UTR sequences, this implies that APA can also change transcript location. However, this connection between APA and RNA localization has only recently been explored. In this review, we discuss the role of APA in mRNA localization across distinct subcellular compartments. We also discuss current challenges and future advancements that will aid our understanding of how APA affects RNA localization and molecular mechanisms that drive these processes.
Affiliation(s)
- Ankita Arora
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Raeann Goering
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- RNA Bioscience Initiative, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Hei Yong G. Lo
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- RNA Bioscience Initiative, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Joelle Lo
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Charlie Moffatt
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - J. Matthew Taliaferro
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- RNA Bioscience Initiative, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| |
|
17
|
Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence. Nat Commun 2021; 12:1652. [PMID: 33712618 PMCID: PMC7955126 DOI: 10.1038/s41467-021-21894-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2020] [Accepted: 02/18/2021] [Indexed: 02/01/2023] Open
Abstract
Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3'-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model, trained using the Human Brain Reference RNA commercial standard, performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi's input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression.
|