1
Qiao Y, Yang R, Liu Y, Chen J, Zhao L, Huo P, Wang Z, Bu D, Wu Y, Zhao Y. DeepFusion: A deep bimodal information fusion network for unraveling protein-RNA interactions using in vivo RNA structures. Comput Struct Biotechnol J 2024; 23:617-625. [PMID: 38274994 PMCID: PMC10808905 DOI: 10.1016/j.csbj.2023.12.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 12/04/2023] [Accepted: 12/26/2023] [Indexed: 01/27/2024] Open
Abstract
RNA-binding proteins (RBPs) are key post-transcriptional regulators, and the malfunctions of RBP-RNA binding lead to diverse human diseases. However, prediction of RBP binding sites is largely based on RNA sequence features, whereas in vivo RNA structural features based on high-throughput sequencing are rarely incorporated. Here, we designed a deep bimodal information fusion network called DeepFusion for unraveling protein-RNA interactions by incorporating structural features derived from DMS-seq data. DeepFusion integrates two sub-models to extract local motif-like information and long-term context information. We show that DeepFusion performs best compared with other cutting-edge methods with only sequence inputs on two datasets. DeepFusion's performance is further improved with bimodal input after adding in vivo DMS-seq structural features. Furthermore, DeepFusion can be used for analyzing RNA degradation, demonstrating significantly different RBP-binding scores in genes with slow degradation rates versus those with rapid degradation rates. DeepFusion thus provides enhanced abilities for further analysis of functional RNAs. DeepFusion's code and data are available at http://bioinfo.org/deepfusion/.
Affiliation(s)
- Yixuan Qiao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Rui Yang
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yang Liu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jiaxin Chen
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Lianhe Zhao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Peipei Huo
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Zhihao Wang
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Dechao Bu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Yang Wu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Yi Zhao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
2
Lécuyer E, Sauvageau M, Kothe U, Unrau PJ, Damha MJ, Perreault J, Abou Elela S, Bayfield MA, Claycomb JM, Scott MS. Canada's contributions to RNA research: past, present, and future perspectives. Biochem Cell Biol 2024; 102:472-491. [PMID: 39320985 DOI: 10.1139/bcb-2024-0176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2024] Open
Abstract
The field of RNA research has provided profound insights into the basic mechanisms modulating the function and adaptation of biological systems. RNA has also been at center stage in the development of transformative biotechnological and medical applications, perhaps most notably the advent of mRNA vaccines, which were critical in helping humanity through the COVID-19 pandemic. Unbeknownst to many, Canada boasts a diverse community of RNA scientists, spanning multiple disciplines and locations, whose cutting-edge research has established a rich track record of contributions across various aspects of RNA science over many decades. Through this position paper, we seek to highlight key contributions made by Canadian investigators to the RNA field, via both thematic and historical viewpoints. We also discuss initiatives underway to organize and enhance the impact of the Canadian RNA research community, particularly focusing on the creation of the not-for-profit organization RNA Canada ARN. Considering the strategic importance of RNA research in biology and medicine, and its considerable potential to help address major challenges facing humanity, sustained support of this sector will be critical to help Canadian scientists play key roles in the ongoing RNA revolution and the many benefits this could bring to Canada.
Affiliation(s)
- Eric Lécuyer
- Institut de Recherches Cliniques de Montréal (IRCM), Montréal, QC, Canada
- Département de Biochimie et de Médecine Moléculaire, Université de Montréal, Montréal, QC, Canada
- Division of Experimental Medicine, McGill University, Montréal, QC, Canada
| | - Martin Sauvageau
- Institut de Recherches Cliniques de Montréal (IRCM), Montréal, QC, Canada
- Département de Biochimie et de Médecine Moléculaire, Université de Montréal, Montréal, QC, Canada
- Department of Biochemistry, McGill University, Montréal, QC, Canada
| | - Ute Kothe
- Department of Chemistry, University of Manitoba, Winnipeg, MB, Canada
| | - Peter J Unrau
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
| | - Masad J Damha
- Department of Chemistry, McGill University, Montréal, QC, Canada
| | - Jonathan Perreault
- Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Laval, QC, Canada
| | - Sherif Abou Elela
- Département de Microbiologie et Infectiologie, Université de Sherbrooke, Sherbrooke, QC, Canada
| | | | - Julie M Claycomb
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Michelle S Scott
- Département de Biochimie et de Génomique Fonctionnelle, Université de Sherbrooke, Sherbrooke, QC, Canada
3
Vorontsov IE, Kozin I, Abramov S, Boytsov A, Jolma A, Albu M, Ambrosini G, Faltejskova K, Gralak AJ, Gryzunov N, Inukai S, Kolmykov S, Kravchenko P, Kribelbauer-Swietek JF, Laverty KU, Nozdrin V, Patel ZM, Penzar D, Plescher ML, Pour SE, Razavi R, Yang AWH, Yevshin I, Zinkevich A, Weirauch MT, Bucher P, Deplancke B, Fornes O, Grau J, Grosse I, Kolpakov FA, Makeev VJ, Hughes TR, Kulakovskiy IV. Cross-platform DNA motif discovery and benchmarking to explore binding specificities of poorly studied human transcription factors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.11.619379. [PMID: 39605530 PMCID: PMC11601219 DOI: 10.1101/2024.11.11.619379] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/29/2024]
Abstract
A DNA sequence pattern, or "motif", is an essential representation of DNA-binding specificity of a transcription factor (TF). Any particular motif model has potential flaws due to shortcomings of the underlying experimental data and computational motif discovery algorithm. As a part of the Codebook/GRECO-BIT initiative, here we evaluated at large scale the cross-platform recognition performance of positional weight matrices (PWMs), which remain popular motif models in many practical applications. We applied ten different DNA motif discovery tools to generate PWMs from the "Codebook" data comprising 4,237 experiments from five different platforms profiling the DNA-binding specificity of 394 human proteins, focusing on understudied transcription factors of different structural families. For many of the proteins, there was no prior knowledge of a genuine motif. By benchmarking-supported human curation, we constructed an approved subset of experiments comprising about 30% of all experiments and 50% of tested TFs which displayed consistent motifs across platforms and replicates. We present the Codebook Motif Explorer (https://mex.autosome.org), a detailed online catalog of DNA motifs, including the top-ranked PWMs, and the underlying source and benchmarking data. We demonstrate that in the case of high-quality experimental data, most of the popular motif discovery tools detect valid motifs and generate PWMs, which perform well both on genomic and synthetic data. Yet, for each of the algorithms, there were problematic combinations of proteins and platforms, and the basic motif properties such as nucleotide composition and information content offered little help in detecting such pitfalls. By combining multiple PWMs in decision trees, we demonstrate how our setup can be readily adapted to train and test binding specificity models more complex than PWMs. Overall, our study provides a rich motif catalog as a solid baseline for advanced models and highlights the power of the multi-platform multi-tool approach for reliable mapping of DNA binding specificities.
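For readers unfamiliar with the PWM motif model evaluated in the abstract above, the following is a minimal illustrative sketch, not the Codebook pipeline. It assumes a uniform background, a pseudocount of 1, and toy count data; the helper names `counts_to_pwm`, `score_window`, and `best_hit` are hypothetical.

```python
import math

BASES = "ACGT"

def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    Assumes a uniform background frequency (an illustrative choice).
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_window(pwm, window):
    """Sum the position-specific weights of the bases in one window."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)

# Toy counts for a made-up motif resembling "TGAC" (not real data).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
pwm = counts_to_pwm(toy_counts)
score, position = best_hit(pwm, "AATGACGG")
```

Here `position` is the start of the best-matching window. A real scan would also need to handle both strands, non-ACGT characters, and score thresholds, all of which are omitted in this sketch.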
Affiliation(s)
- Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
| | - Ivan Kozin
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
| | - Sergey Abramov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Altius Institute for Biomedical Sciences, 98121, Seattle, WA, USA
| | - Alexandr Boytsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Altius Institute for Biomedical Sciences, 98121, Seattle, WA, USA
| | - Arttu Jolma
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
| | - Mihai Albu
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
| | | | - Katerina Faltejskova
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 160 00 Praha 6, Czech Republic
- Computer Science Institute, Faculty of Mathematics and Physics, Charles University, 118 00 Praha 1, Czech Republic
| | - Antoni J Gralak
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Nikita Gryzunov
- Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
| | - Sachi Inukai
- Chugai Pharmaceutical Co., Ltd, Tokyo, 103-8324, Japan
| | - Semyon Kolmykov
- Department of Computational Biology, Sirius University of Science and Technology, 354340, Sirius, Krasnodar region, Russia
| | | | - Judith F Kribelbauer-Swietek
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Kaitlin U Laverty
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
| | - Vladimir Nozdrin
- Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
| | - Zain M Patel
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
| | - Dmitry Penzar
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
| | - Marie-Luise Plescher
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
| | - Sara E Pour
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
| | - Rozita Razavi
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
| | - Ally W H Yang
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
| | | | - Arsenii Zinkevich
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
| | | | - Philipp Bucher
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Oriol Fornes
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
| | - Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
| | - Ivo Grosse
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
| | - Fedor A Kolpakov
- Department of Computational Biology, Sirius University of Science and Technology, 354340, Sirius, Krasnodar region, Russia
- Bioinformatics Laboratory, Federal Research Center for Information and Computational Technologies, 630090, Novosibirsk, Russia
| | - Vsevolod J Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Moscow Center for Advanced Studies, 123592, Moscow, Russia
| | - Timothy R Hughes
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
| | - Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
4
Jolma A, Hernandez-Corchado A, Yang AW, Fathi A, Laverty KU, Brechalov A, Razavi R, Albu M, Zheng H, Kulakovskiy IV, Najafabadi HS, Hughes TR. GHT-SELEX demonstrates unexpectedly high intrinsic sequence specificity and complex DNA binding of many human transcription factors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.11.618478. [PMID: 39605368 PMCID: PMC11601218 DOI: 10.1101/2024.11.11.618478] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/29/2024]
Abstract
A long-standing challenge in human regulatory genomics is that transcription factor (TF) DNA-binding motifs are short and degenerate, while the genome is large. Motif scans therefore produce many false-positive binding site predictions. By surveying 179 TFs across 25 families using >1,500 cyclic in vitro selection experiments with fragmented, naked, and unmodified genomic DNA - a method we term GHT-SELEX (Genomic HT-SELEX) - we find that many human TFs possess much higher sequence specificity than anticipated. Moreover, genomic binding regions from GHT-SELEX are often surprisingly similar to those obtained in vivo (i.e. ChIP-seq peaks). We find that comparable specificity can also be obtained from motif scans, but performance is highly dependent on derivation and use of the motifs, including accounting for multiple local matches in the scans. We also observe alternative engagement of multiple DNA-binding domains within the same protein: long C2H2 zinc finger proteins often utilize modular DNA recognition, engaging different subsets of their DNA binding domain (DBD) arrays to recognize multiple types of distinct target sites, frequently evolving via internal duplication and divergence of one or more DBDs. Thus, contrary to conventional wisdom, it is common for TFs to possess sufficient intrinsic specificity to independently delineate cellular targets.
Affiliation(s)
- Arttu Jolma
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Aldo Hernandez-Corchado
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Victor P. Dahdaleh Institute of Genomic Medicine, Montréal, QC H3A 0G1, Canada
| | - Ally W.H. Yang
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Ali Fathi
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
| | - Kaitlin U. Laverty
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | | | - Rozita Razavi
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Mihai Albu
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Hong Zheng
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | | | - Ivan V. Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
| | - Hamed S. Najafabadi
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Victor P. Dahdaleh Institute of Genomic Medicine, Montréal, QC H3A 0G1, Canada
| | - Timothy R. Hughes
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
5
Sasse A, Ray D, Laverty KU, Tam CL, Albu M, Zheng H, Lyudovyk O, Dalal T, Nie K, Magis C, Notredame C, Weirauch MT, Hughes TR, Morris Q. Reconstructing the sequence specificities of RNA-binding proteins across eukaryotes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.15.618476. [PMID: 39464061 PMCID: PMC11507768 DOI: 10.1101/2024.10.15.618476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
RNA-binding proteins (RBPs) are key regulators of gene expression. Here, we introduce EuPRI (Eukaryotic Protein-RNA Interactions) - a freely available resource of RNA motifs for 34,736 RBPs from 690 eukaryotes. EuPRI includes in vitro binding data for 504 RBPs, including newly collected RNAcompete data for 174 RBPs, along with thousands of reconstructed motifs. We reconstruct these motifs with a new computational platform - Joint Protein-Ligand Embedding (JPLE) - which can detect distant homology relationships and map specificity-determining peptides. EuPRI quadruples the number of known RBP motifs, expanding the motif repertoire across all major eukaryotic clades, and assigning motifs to the majority of human RBPs. EuPRI drastically improves knowledge of RBP motifs in flowering plants. For example, it increases the number of Arabidopsis thaliana RBP motifs 7-fold, from 14 to 105. EuPRI also has broad utility for inferring post-transcriptional function and evolutionary relationships. We demonstrate this by predicting a role for 12 Arabidopsis thaliana RBPs in RNA stability and identifying rapid and recent evolution of post-transcriptional regulatory networks in worms and plants. In contrast, the vertebrate RNA motif set has remained relatively stable after its drastic expansion between the metazoan and vertebrate ancestors. EuPRI represents a powerful resource for the study of gene regulation across eukaryotes.
Affiliation(s)
- Alexander Sasse
- Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Department of Computer Science, University of Washington, Seattle, WA, USA
- Vector Institute, Toronto, ON Canada
| | - Debashish Ray
- Donnelly Centre, University of Toronto, Toronto, ON Canada
| | - Kaitlin U Laverty
- Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Cyrus L Tam
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
| | - Mihai Albu
- Donnelly Centre, University of Toronto, Toronto, ON Canada
| | - Hong Zheng
- Donnelly Centre, University of Toronto, Toronto, ON Canada
| | - Olga Lyudovyk
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
| | - Taykhoom Dalal
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
| | - Kate Nie
- Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
| | - Cedrik Magis
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Cedric Notredame
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Matthew T Weirauch
- Center for Autoimmune Genomics and Etiology, Divisions of Allergy & Immunology, Human Genetics, Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
| | - Timothy R Hughes
- Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
- Donnelly Centre, University of Toronto, Toronto, ON Canada
| | - Quaid Morris
- Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
- Ontario Institute for Cancer Research, Toronto, ON, Canada
6
Hawkins S, Mondaini A, Namboori SC, Nguyen GG, Yeo GW, Javed A, Bhinge A. ePRINT: exonuclease assisted mapping of protein-RNA interactions. Genome Biol 2024; 25:140. [PMID: 38807229 PMCID: PMC11134894 DOI: 10.1186/s13059-024-03271-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 05/09/2024] [Indexed: 05/30/2024] Open
Abstract
RNA-binding proteins (RBPs) regulate key aspects of RNA processing including alternative splicing, mRNA degradation and localization by physically binding RNA molecules. Current methods to map these interactions, such as CLIP, rely on purifying single proteins at a time. Our new method, ePRINT, maps RBP-RNA interaction networks on a global scale without purifying individual RBPs. ePRINT uses exoribonuclease XRN1 to precisely map the 5' end of the RBP binding site and uncovers direct and indirect targets of an RBP of interest. Importantly, ePRINT can also uncover RBPs that are differentially activated between cell fate transitions, including neural progenitor differentiation into neurons.
Affiliation(s)
- Sophie Hawkins
- College of Medicine and Health, University of Exeter, Exeter, EX1 2LU, UK
- Living Systems Institute, University of Exeter, Exeter, EX4 4QD, UK
| | - Alexandre Mondaini
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Seema C Namboori
- College of Medicine and Health, University of Exeter, Exeter, EX1 2LU, UK
- Living Systems Institute, University of Exeter, Exeter, EX4 4QD, UK
| | - Grady G Nguyen
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Center for RNA Technologies and Therapeutics, UC San Diego, La Jolla, CA, USA
| | - Gene W Yeo
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Center for RNA Technologies and Therapeutics, UC San Diego, La Jolla, CA, USA
| | - Asif Javed
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Akshay Bhinge
- College of Medicine and Health, University of Exeter, Exeter, EX1 2LU, UK
- Living Systems Institute, University of Exeter, Exeter, EX4 4QD, UK
7
Rennie S. Deep Learning for Elucidating Modifications to RNA-Status and Challenges Ahead. Genes (Basel) 2024; 15:629. [PMID: 38790258 PMCID: PMC11121098 DOI: 10.3390/genes15050629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/11/2024] [Accepted: 05/11/2024] [Indexed: 05/26/2024] Open
Abstract
RNA-binding proteins and chemical modifications to RNA play vital roles in the co- and post-transcriptional regulation of genes. In order to fully decipher their biological roles, it is an essential task to catalogue their precise target locations along with their preferred contexts and sequence-based determinants. Recently, deep learning approaches have significantly advanced in this field. These methods can predict the presence or absence of modification at specific genomic regions based on diverse features, particularly sequence and secondary structure, allowing us to decipher the highly non-linear sequence patterns and structures that underlie site preferences. This article provides an overview of how deep learning is being applied to this area, with a particular focus on the problem of mRNA-RBP binding, while also considering other types of chemical modification to RNA. It discusses how different types of model can handle sequence-based and/or secondary-structure-based inputs, the process of model training, including choice of negative regions and separating sets for testing and training, and offers recommendations for developing biologically relevant models. Finally, it highlights four key areas that are crucial for advancing the field.
Affiliation(s)
- Sarah Rennie
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
8
Lou LL, Qiu WR, Liu Z, Xu ZC, Xiao X, Huang SF. Stacking-ac4C: an ensemble model using mixed features for identifying n4-acetylcytidine in mRNA. Front Immunol 2023; 14:1267755. [PMID: 38094296 PMCID: PMC10716444 DOI: 10.3389/fimmu.2023.1267755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 11/14/2023] [Indexed: 12/18/2023] Open
Abstract
N4-acetylcytidine (ac4C) is a modification of cytidine at the nitrogen-4 position, playing a significant role in the translation process of mRNA. However, the precise mechanism and details of how ac4C modifies translated mRNA remain unclear. Since identifying ac4C sites using conventional experimental methods is both labor-intensive and time-consuming, there is an urgent need for a method that can promptly recognize ac4C sites. In this paper, we propose a comprehensive ensemble learning model, the Stacking-based heterogeneous integrated ac4C model, engineered explicitly to identify ac4C sites. This innovative model integrates three distinct feature extraction methodologies: Kmer, electron-ion interaction pseudo-potential values (PseEIIP), and pseudo-K-tuple nucleotide composition (PseKNC). The model also incorporates the robust Cluster Centroids algorithm to enhance its performance in dealing with imbalanced data and alleviate underfitting issues. Our independent testing experiments indicate that our proposed model improves the Mcc by 15.61% and the ROC by 5.97% compared to existing models. To test our model's adaptability, we also utilized a balanced dataset assembled by the authors of iRNA-ac4C. Our model showed an increase in Sn of 4.1%, an increase in Acc of nearly 1%, and ROC improvement of 0.35% on this balanced dataset. The code for our model is freely accessible at https://github.com/louliliang/ST-ac4C.git, allowing users to quickly build their model without dealing with complicated mathematical equations.
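Of the three feature types named in the abstract above, Kmer is the simplest to illustrate. The sketch below computes normalized k-mer frequencies for an RNA sequence. It is an assumption-laden illustration and not the Stacking-ac4C implementation: the function name `kmer_frequencies`, the choice of k = 2, and the toy input are all hypothetical, and the abstract does not state the exact definition the authors use.

```python
from itertools import product

def kmer_frequencies(sequence, k=2, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Windows containing characters outside the alphabet are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    # Guard against empty or too-short sequences.
    return [counts[km] / total if total else 0.0 for km in kmers]

# Toy RNA sequence (not real data): yields a 16-dimensional feature vector.
features = kmer_frequencies("ACGUACGU", k=2)
```

A vector like `features` could then be concatenated with other encodings before being passed to a classifier; how the published model combines its features is not specified in this listing.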
Affiliation(s)
- Li-Liang Lou
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
| | - Wang-Ren Qiu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
| | - Zi Liu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
| | - Zhao-Chun Xu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
| | - Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
| | - Shun-Fa Huang
- School of Information Engineering, Jingdezhen University, Jingdezhen, China
9
Zhu H, Yang Y, Wang Y, Wang F, Huang Y, Chang Y, Wong KC, Li X. Dynamic characterization and interpretation for protein-RNA interactions across diverse cellular conditions using HDRNet. Nat Commun 2023; 14:6824. [PMID: 37884495 PMCID: PMC10603054 DOI: 10.1038/s41467-023-42547-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/23/2023] [Accepted: 10/13/2023] [Indexed: 10/28/2023] Open
Abstract
RNA-binding proteins play crucial roles in the regulation of gene expression, and understanding the interactions between RNAs and RBPs in distinct cellular conditions forms the basis for comprehending the underlying RNA function. However, current computational methods pose challenges to the cross-prediction of RNA-protein binding events across diverse cell lines and tissue contexts. Here, we develop HDRNet, an end-to-end deep learning-based framework to precisely predict dynamic RBP binding events under diverse cellular conditions. Our results demonstrate that HDRNet can accurately and efficiently identify binding sites, particularly for dynamic prediction, outperforming other state-of-the-art models on 261 linear RNA datasets from both eCLIP and CLIP-seq, supplemented with additional tissue data. Moreover, we conduct motif and interpretation analyses to provide fresh insights into the pathological mechanisms underlying RNA-RBP interactions from various perspectives. Our functional genomic analysis further explores the gene-human disease associations, uncovering previously uncharacterized observations for a broad range of genetic disorders.
Affiliation(s)
- Haoran Zhu
- School of Artificial Intelligence, Jilin University, 130012, Changchun, China
| | - Yuning Yang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Yunhe Wang
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
| | - Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR
| | - Yujian Huang
- College of Computer Science and Cyber Security, Chengdu University of Technology, 610059, Chengdu, China
| | - Yi Chang
- School of Artificial Intelligence, Jilin University, 130012, Changchun, China
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR
| | - Xiangtao Li
- School of Artificial Intelligence, Jilin University, 130012, Changchun, China
| |
|
10
|
Ray D, Laverty KU, Jolma A, Nie K, Samson R, Pour SE, Tam CL, von Krosigk N, Nabeel-Shah S, Albu M, Zheng H, Perron G, Lee H, Najafabadi H, Blencowe B, Greenblatt J, Morris Q, Hughes TR. RNA-binding proteins that lack canonical RNA-binding domains are rarely sequence-specific. Sci Rep 2023; 13:5238. [PMID: 37002329 PMCID: PMC10066285 DOI: 10.1038/s41598-023-32245-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 10/28/2022] [Accepted: 03/23/2023] [Indexed: 04/03/2023] Open
Abstract
Thousands of RNA-binding proteins (RBPs) crosslink to cellular mRNA. Among these are numerous unconventional RBPs (ucRBPs), proteins that associate with RNA but lack known RNA-binding domains (RBDs). The vast majority of ucRBPs have uncharacterized RNA-binding specificities. We analyzed 492 human ucRBPs for intrinsic RNA-binding in vitro and identified 23 that bind specific RNA sequences. Most (17/23), including 8 ribosomal proteins, were previously associated with RNA-related function. We identified the RBDs responsible for sequence-specific RNA-binding for several of these 23 ucRBPs and surveyed whether corresponding domains from homologous proteins also display RNA sequence specificity. CCHC-zf domains from seven human proteins recognized specific RNA motifs, indicating that this is a major class of RBD. For Nudix, HABP4, TPR, RanBP2-zf, and L7Ae domains, however, only isolated members or closely related homologs yielded motifs, consistent with RNA-binding as a derived function. The lack of sequence specificity for most ucRBPs is striking, and we suggest that many may function analogously to chromatin factors, which often crosslink efficiently to cellular DNA, presumably via indirect recruitment. Finally, we show that ucRBPs tend to be highly abundant proteins and suggest their identification in RNA interactome capture studies could also result from weak nonspecific interactions with RNA.
Affiliation(s)
- Debashish Ray
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Kaitlin U Laverty
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
| | - Arttu Jolma
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Kate Nie
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
| | - Reuben Samson
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
| | - Sara E Pour
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
| | - Cyrus L Tam
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Niklas von Krosigk
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
| | - Syed Nabeel-Shah
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
| | - Mihai Albu
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Hong Zheng
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Gabrielle Perron
- Department of Human Genetics, McGill University, Montréal, QC, H3A 0C7, Canada
- McGill Genome Centre, Montréal, QC, H3A 0G1, Canada
| | - Hyunmin Lee
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Hamed Najafabadi
- Department of Human Genetics, McGill University, Montréal, QC, H3A 0C7, Canada
- McGill Genome Centre, Montréal, QC, H3A 0G1, Canada
| | - Benjamin Blencowe
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
| | - Jack Greenblatt
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
| | - Quaid Morris
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Timothy R Hughes
- Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
| |
|
11
|
Wu Z, Basu S, Wu X, Kurgan L. qNABpredict: Quick, accurate, and taxonomy-aware sequence-based prediction of content of nucleic acid binding amino acids. Protein Sci 2023; 32:e4544. [PMID: 36519304 PMCID: PMC9798252 DOI: 10.1002/pro.4544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 12/07/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Protein sequence-based predictors of nucleic acid (NA)-binding include methods that predict NA-binding proteins and NA-binding residues. The residue-level tools produce more details but suffer high computational cost since they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts content (fraction) of the NA-binding residues, offering more information than the protein-level prediction and much shorter runtime than the residue-level tools. Our first-of-its-kind content predictor, qNABpredict, relies on a small, rationally designed and fast-to-compute feature set that represents relevant characteristics extracted from the input sequence and a well-parametrized support vector regression model. We provide two versions of qNABpredict, a taxonomy-agnostic model that can be used for proteins of unknown taxonomic origin and more accurate taxonomy-aware models that are tailored to specific taxonomic kingdoms: archaea, bacteria, eukaryota, and viruses. Empirical tests on a low-similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions when compared to the content extracted from results produced by the residue-level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue-level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/. This new tool should be particularly useful to predict details of protein-NA interactions for large protein families and proteomes.
Affiliation(s)
- Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
| | - Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
| | - Xuantai Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
| |
|
12
|
Fan S, Sun W, Fan L, Wu N, Sun W, Ma H, Chen S, Li Z, Li Y, Zhang J, Yan J. The highly conserved RNA-binding specificity of nucleocapsid protein facilitates the identification of drugs with broad anti-coronavirus activity. Comput Struct Biotechnol J 2022; 20:5040-5044. [PMID: 36097552 PMCID: PMC9454191 DOI: 10.1016/j.csbj.2022.09.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Revised: 09/05/2022] [Accepted: 09/05/2022] [Indexed: 11/06/2022] Open
Abstract
The binding of SARS-CoV-2 nucleocapsid (N) protein to both the 5'- and 3'-ends of genomic RNA has different implications arising from its binding to the central region during virion assembly. However, the mechanism underlying selective binding remains unknown. Herein, we performed high-throughput RNA-SELEX (HTR-SELEX) to determine the RNA-binding specificity of the N proteins of various SARS-CoV-2 variants as well as other β-coronaviruses and showed that N proteins could bind two unrelated sequences, both of which were highly conserved across all variants and species. Interestingly, both sequences are virtually absent from the human transcriptome; however, they exhibit a highly enriched, mutually complementary distribution in the coronavirus genome, highlighting their varied functions in genome packaging. Our results provide mechanistic insights into viral genome packaging, thereby increasing the feasibility of developing drugs with broad-spectrum anti-coronavirus activity by targeting RNA binding by N proteins.
Affiliation(s)
- Shaorong Fan
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education and Provincial Key Laboratory of Biotechnology, School of Medicine, Northwest University, Xi’an, China
- Department of Biomedical Sciences, The Tung Biomedical Sciences Centre, City University of Hong Kong, Kowloon Tong, Hong Kong Special Administrative Region
| | - Wenju Sun
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education and Provincial Key Laboratory of Biotechnology, School of Medicine, Northwest University, Xi’an, China
| | - Ligang Fan
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education and Provincial Key Laboratory of Biotechnology, School of Medicine, Northwest University, Xi’an, China
- Department of Biomedical Sciences, The Tung Biomedical Sciences Centre, City University of Hong Kong, Kowloon Tong, Hong Kong Special Administrative Region
- Department of Precision Diagnostic and Therapeutic Technology, The City University of Hong Kong Shenzhen Futian Research Institute, Shenzhen, China
| | - Nan Wu
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education and Provincial Key Laboratory of Biotechnology, School of Medicine, Northwest University, Xi’an, China
- Department of Biomedical Sciences, The Tung Biomedical Sciences Centre, City University of Hong Kong, Kowloon Tong, Hong Kong Special Administrative Region
| | - Wei Sun
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education and Provincial Key Laboratory of Biotechnology, School of Medicine, Northwest University, Xi’an, China
| | - Haiqian Ma
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education and Provincial Key Laboratory of Biotechnology, School of Medicine, Northwest University, Xi’an, China
| | - Siyuan Chen
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong Special Administrative Region
| | - Zitong Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong Special Administrative Region
| | - Yu Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong Special Administrative Region
| | - Jilin Zhang
- Department of Biomedical Sciences, The Tung Biomedical Sciences Centre, City University of Hong Kong, Kowloon Tong, Hong Kong Special Administrative Region
| | - Jian Yan
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education and Provincial Key Laboratory of Biotechnology, School of Medicine, Northwest University, Xi’an, China
- Department of Biomedical Sciences, The Tung Biomedical Sciences Centre, City University of Hong Kong, Kowloon Tong, Hong Kong Special Administrative Region
- Department of Precision Diagnostic and Therapeutic Technology, The City University of Hong Kong Shenzhen Futian Research Institute, Shenzhen, China
| |
|