1
Lécuyer E, Sauvageau M, Kothe U, Unrau PJ, Damha MJ, Perreault J, Abou Elela S, Bayfield MA, Claycomb JM, Scott MS. Canada's contributions to RNA research: past, present, and future perspectives. Biochem Cell Biol 2024; 102:472-491. [PMID: 39320985 DOI: 10.1139/bcb-2024-0176] [Indexed: 09/27/2024] Open
Abstract
The field of RNA research has provided profound insights into the basic mechanisms modulating the function and adaptation of biological systems. RNA has also been at center stage in the development of transformative biotechnological and medical applications, most notably the mRNA vaccines that were critical in helping humanity through the COVID-19 pandemic. Unbeknownst to many, Canada boasts a diverse community of RNA scientists, spanning multiple disciplines and locations, whose cutting-edge research has established a rich track record of contributions across various aspects of RNA science over many decades. Through this position paper, we seek to highlight key contributions made by Canadian investigators to the RNA field, from both thematic and historical viewpoints. We also discuss initiatives underway to organize and enhance the impact of the Canadian RNA research community, particularly focusing on the creation of the not-for-profit organization RNA Canada ARN. Considering the strategic importance of RNA research in biology and medicine, and its considerable potential to help address major challenges facing humanity, sustained support of this sector will be critical to help Canadian scientists play key roles in the ongoing RNA revolution and to realize the many benefits this could bring to Canada.
Affiliation(s)
- Eric Lécuyer
  - Institut de Recherches Cliniques de Montréal (IRCM), Montréal, QC, Canada
  - Département de Biochimie et de Médecine Moléculaire, Université de Montréal, Montréal, QC, Canada
  - Division of Experimental Medicine, McGill University, Montréal, QC, Canada
- Martin Sauvageau
  - Institut de Recherches Cliniques de Montréal (IRCM), Montréal, QC, Canada
  - Département de Biochimie et de Médecine Moléculaire, Université de Montréal, Montréal, QC, Canada
  - Department of Biochemistry, McGill University, Montréal, QC, Canada
- Ute Kothe
  - Department of Chemistry, University of Manitoba, Winnipeg, MB, Canada
- Peter J Unrau
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Masad J Damha
  - Department of Chemistry, McGill University, Montréal, QC, Canada
- Jonathan Perreault
  - Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Laval, QC, Canada
- Sherif Abou Elela
  - Département de Microbiologie et Infectiologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Julie M Claycomb
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Michelle S Scott
  - Département de Biochimie et de Génomique Fonctionnelle, Université de Sherbrooke, Sherbrooke, QC, Canada
2
Yang Y, Li G, Pang K, Cao W, Zhang Z, Li X. Deciphering 3'UTR Mediated Gene Regulation Using Interpretable Deep Representation Learning. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2407013. [PMID: 39159140 PMCID: PMC11497048 DOI: 10.1002/advs.202407013] [Received: 06/24/2024] [Revised: 07/23/2024] [Indexed: 08/21/2024]
Abstract
The 3' untranslated regions (3'UTRs) of messenger RNAs contain many important cis-regulatory elements that are under functional and evolutionary constraints. It is hypothesized that these constraints are similar to the grammar and syntax of human languages and can be modeled by advanced natural language techniques such as Transformers, which have been very effective in modeling complex protein sequences and structures. Here 3UTRBERT is described, which implements an attention-based language model, i.e., Bidirectional Encoder Representations from Transformers (BERT). 3UTRBERT is pre-trained on aggregated 3'UTR sequences of human mRNAs in a task-agnostic manner; the pre-trained model is then fine-tuned for specific downstream tasks such as identifying RBP binding sites and m6A RNA modification sites, and predicting RNA sub-cellular localizations. Benchmark results show that 3UTRBERT generally outperformed other contemporary methods in each of these tasks. More importantly, the self-attention mechanism within 3UTRBERT allows direct visualization of the semantic relationship between sequence elements and effectively identifies regions with important regulatory potential. It is expected that the 3UTRBERT model can serve as a foundational tool for various sequence labeling tasks in the 3'UTR field, thus enhancing the decipherability of post-transcriptional regulatory mechanisms.
Affiliation(s)
- Yuning Yang
  - School of Information Science and Technology, Northeast Normal University, Changchun, Jilin 130117, China
- Gen Li
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
- Kuan Pang
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
- Wuxinhao Cao
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
- Zhaolei Zhang
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Changchun, Jilin 130012, China
3
Li Y, Wang Y, Wang C, Ma A, Ma Q, Liu B. A weighted two-stage sequence alignment framework to identify motifs from ChIP-exo data. Patterns (New York, N.Y.) 2024; 5:100927. [PMID: 38487805 PMCID: PMC10935504 DOI: 10.1016/j.patter.2024.100927] [Received: 06/14/2023] [Revised: 08/18/2023] [Accepted: 01/10/2024] [Indexed: 03/17/2024]
Abstract
In this study, we introduce TESA (weighted two-stage alignment), an innovative motif prediction tool that refines the identification of DNA-binding protein motifs, essential for deciphering transcriptional regulatory mechanisms. Unlike traditional algorithms that rely solely on sequence data, TESA integrates the high-resolution chromatin immunoprecipitation (ChIP) signal, specifically from ChIP-exonuclease (ChIP-exo), by assigning weights to sequence positions, thereby enhancing motif discovery. TESA employs a nuanced approach combining a binomial distribution model with a graph model, further supported by a "bookend" model, to improve the accuracy of predicting motifs of varying lengths. Our evaluation, utilizing an extensive compilation of 90 prokaryotic ChIP-exo datasets from proChIPdb and 167 H. sapiens datasets, compared TESA's performance against seven established tools. The results indicate TESA's improved precision in motif identification, suggesting its valuable contribution to the field of genomic research.
Affiliation(s)
- Yang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Yizhong Wang
  - School of Mathematics, Shandong University, Jinan, Shandong 250100, China
- Cankun Wang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Anjun Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan, Shandong 250100, China
4
Alsenan S, Al-Turaiki I, Aldayel M, Tounsi M. Role of Optimization in RNA-Protein-Binding Prediction. Curr Issues Mol Biol 2024; 46:1360-1373. [PMID: 38392205 PMCID: PMC11154364 DOI: 10.3390/cimb46020087] [Received: 12/07/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 02/24/2024] Open
Abstract
RNA-binding proteins (RBPs) play an important role in regulating biological processes, such as gene regulation. Understanding their behavior, for example their binding sites, can help in understanding RBP-related diseases. Studies have focused on predicting RNA binding by means of machine learning algorithms, including deep convolutional neural network models. An integral part of deep learning modeling is tuning hyperparameters well and minimizing a loss function using optimization algorithms. In this paper, we investigate the role of optimization in the RBP classification problem using the CLIP-Seq 21 dataset. Three optimization methods are applied to the RNA-protein binding CNN prediction model, namely grid search, random search, and Bayesian optimization. The empirical results show an AUC of 94.42%, 93.78%, 93.23% and 92.68% on the ELAVL1C, ELAVL1B, ELAVL1A, and HNRNPC datasets, respectively, and a mean AUC of 85.30% on 24 datasets. This paper's findings provide evidence on the role of optimizers in improving the performance of RNA-protein binding prediction.
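To make the difference between two of the named search strategies concrete, here is a minimal sketch in Python. It is not the authors' pipeline: `toy_validation_score` is a hypothetical stand-in for training the CNN and measuring validation AUC, and the hyperparameter names (`learning_rate`, `dropout`) and their ranges are assumptions chosen only for illustration. Bayesian optimization is omitted because it needs a surrogate model from a third-party library.

```python
import itertools
import random


def toy_validation_score(learning_rate, dropout):
    """Hypothetical stand-in for a validation AUC; peaks at lr=0.01, dropout=0.3."""
    return 1.0 - 100.0 * (learning_rate - 0.01) ** 2 - (dropout - 0.3) ** 2


def grid_search(grid, score_fn):
    """Evaluate every combination of the listed values and keep the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(ranges, score_fn, n_trials, seed=0):
    """Sample each hyperparameter uniformly from its (low, high) range."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    grid = {"learning_rate": [0.001, 0.01, 0.1], "dropout": [0.1, 0.3, 0.5]}
    ranges = {"learning_rate": (0.001, 0.1), "dropout": (0.1, 0.5)}
    print("grid:  ", grid_search(grid, toy_validation_score))
    print("random:", random_search(ranges, toy_validation_score, n_trials=20))
```

Grid search costs one evaluation per combination, so its budget grows multiplicatively with each added hyperparameter, whereas random search fixes the budget up front through `n_trials`.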
Affiliation(s)
- Shrooq Alsenan
  - Information Systems Department, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Isra Al-Turaiki
  - Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11653, Saudi Arabia
- Mashael Aldayel
  - Information Technology Department, College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia
- Mohamed Tounsi
  - Department of Computer Science, College of Computer and Information Sciences, Prince Sultan University, P.O. Box 66833, Riyadh 12435, Saudi Arabia
5
Wang Y, Li Y, Wang C, Lio CWJ, Ma Q, Liu B. CEMIG: prediction of the cis-regulatory motif using the de Bruijn graph from ATAC-seq. Brief Bioinform 2023; 25:bbad505. [PMID: 38189539 PMCID: PMC10772951 DOI: 10.1093/bib/bbad505] [Received: 09/22/2023] [Revised: 11/21/2023] [Accepted: 12/03/2023] [Indexed: 01/09/2024] Open
Abstract
Sequence motif discovery algorithms enhance the identification of novel deoxyribonucleic acid sequences with pivotal biological significance, especially transcription factor (TF)-binding motifs. The advent of assay for transposase-accessible chromatin using sequencing (ATAC-seq) has broadened the toolkit for motif characterization. Nonetheless, prevailing computational approaches have focused on delineating TF-binding footprints, with motif discovery receiving less attention. Herein, we present Cis rEgulatory Motif Influence using de Bruijn Graph (CEMIG), an algorithm leveraging de Bruijn and Hamming distance graph paradigms to predict and map motif sites. Assessment on 129 ATAC-seq datasets from the Cistrome Data Browser demonstrates CEMIG's exceptional performance, surpassing three established methodologies on four evaluative metrics. CEMIG accurately identifies both cell-type-specific and common TF motifs within GM12878 and K562 cell lines, demonstrating its comparative genomic capabilities in the identification of evolutionary conservation and cell-type specificity. In-depth transcriptional and functional genomic studies have validated the functional relevance of CEMIG-identified motifs across various cell types. CEMIG is available at https://github.com/OSU-BMBL/CEMIG, developed in C++ to ensure cross-platform compatibility with Linux, macOS and Windows operating systems.
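The two graph ideas named in this abstract can be illustrated with a short, generic Python sketch. This is not CEMIG's implementation (which is written in C++ and whose details are not given here); the function names, the choice of k, and the distance cutoff are all assumptions for illustration only.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_graph(sequences, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each k-mer adds
    one edge from its prefix to its suffix. Edge values count occurrences."""
    graph = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for kmer in kmers(seq, k):
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def hamming_neighbors(all_kmers, max_dist=1):
    """Edges of a Hamming distance graph: pairs of distinct k-mers that
    differ in at most max_dist positions."""
    unique = sorted(set(all_kmers))
    pairs = []
    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            if hamming(a, b) <= max_dist:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTTC"]  # toy sequences, not real ATAC-seq data
    graph = de_bruijn_graph(reads, k=3)
    for prefix, targets in sorted(graph.items()):
        print(prefix, "->", dict(targets))
    all_kmers = [km for read in reads for km in kmers(read, 3)]
    print(hamming_neighbors(all_kmers, max_dist=1))
```

A de Bruijn graph captures exact overlaps between consecutive k-mers, while a Hamming distance graph links near-identical k-mers, which is one way to tolerate the mismatches typical of degenerate binding motifs.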
Affiliation(s)
- Yizhong Wang
  - School of Mathematics, Shandong University, Jinan, 250100, China
- Yang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Cankun Wang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Chan-Wang Jerry Lio
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan, 250100, China
6
Wu Z, Basu S, Wu X, Kurgan L. qNABpredict: Quick, accurate, and taxonomy-aware sequence-based prediction of content of nucleic acid binding amino acids. Protein Sci 2023; 32:e4544. [PMID: 36519304 PMCID: PMC9798252 DOI: 10.1002/pro.4544] [Received: 09/16/2022] [Revised: 12/07/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Protein sequence-based predictors of nucleic acid (NA)-binding include methods that predict NA-binding proteins and NA-binding residues. The residue-level tools produce more details but suffer high computational cost since they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts content (fraction) of the NA-binding residues, offering more information than the protein-level prediction and much shorter runtime than the residue-level tools. Our first-of-its-kind content predictor, qNABpredict, relies on a small, rationally designed and fast-to-compute feature set that represents relevant characteristics extracted from the input sequence and a well-parametrized support vector regression model. We provide two versions of qNABpredict, a taxonomy-agnostic model that can be used for proteins of unknown taxonomic origin and more accurate taxonomy-aware models that are tailored to specific taxonomic kingdoms: archaea, bacteria, eukaryota, and viruses. Empirical tests on a low-similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions when compared to the content extracted from results produced by the residue-level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue-level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/. This new tool should be particularly useful to predict details of protein-NA interactions for large protein families and proteomes.
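The notion of "content" used in this abstract, and the baseline of deriving it from a residue-level predictor, can be written down in a few lines of Python. This is only a sketch of the definition: the function names and the 0.5 threshold are assumptions, and it does not reproduce qNABpredict's feature set or its support vector regression model.

```python
def binding_content(labels):
    """Content = fraction of residues labeled as NA-binding (1) in a protein."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(labels) / len(labels)


def content_from_residue_scores(scores, threshold=0.5):
    """Derive content from per-residue propensity scores by thresholding
    them into binary labels first (the threshold is an assumed value)."""
    return binding_content([1 if s >= threshold else 0 for s in scores])


if __name__ == "__main__":
    # Toy annotation for an 8-residue protein: 1 = NA-binding, 0 = not binding.
    native = [0, 1, 1, 0, 0, 0, 1, 0]
    print(binding_content(native))
    # Toy per-residue scores, as a residue-level predictor might output.
    scores = [0.1, 0.8, 0.4, 0.2, 0.6, 0.1, 0.9, 0.3]
    print(content_from_residue_scores(scores))
```

A direct content predictor returns this single number per protein without scoring every residue, which is where the runtime advantage described in the abstract comes from.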
Affiliation(s)
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Xuantai Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA