1
Zhao Y, Dhani S, Gogvadze V, Zhivotovsky B. The crosstalk between SND1 and PDCD4 is associated with chemoresistance of non-small cell lung carcinoma cells. Cell Death Discov 2025; 11:34. [PMID: 39885142 PMCID: PMC11782486 DOI: 10.1038/s41420-025-02310-5] [Received: 09/25/2024] [Revised: 12/18/2024] [Accepted: 01/20/2025] [Indexed: 02/01/2025] [Open Access]
Abstract
Lung cancer is the leading cause of cancer-related deaths worldwide. Non-small cell lung cancer (NSCLC) is highly resistant to chemo- or radiation therapy, which poses a huge challenge for treatment of advanced NSCLC. Previously, we demonstrated the oncogenic role of Tudor Staphylococcal nuclease (TSN, also known as Staphylococcal nuclease domain-containing protein 1, SND1), in regulating chemoresistance in NSCLC cells. Here, we showed that silencing of SND1 augmented the sensitivity of NSCLC cells to different chemotherapeutic drugs. Additionally, the expression of PDCD4 (a tumor suppressor highly associated with lung cancer) in NSCLC cells with low endogenous levels was attenuated by SND1 silencing, implying that SND1 might function as a molecular regulator upstream of PDCD4. PDCD4 is differentially expressed in various NSCLC cells. In the NSCLC cells (A549 and H23 cells) with low expression of PDCD4, despite the downregulation of PDCD4, silencing of SND1 still led to sensitization of NSCLC cells to treatment with different chemotherapeutic agents by the inhibition of autophagic activity. Thus, a novel correlation interlinking SND1 and PDCD4 in the regulation of NSCLC cells concerning chemotherapy was revealed, which contributes to understanding the mechanisms of chemoresistance in NSCLC.
Affiliation(s)
- Yun Zhao
- Department of Occupational and Environmental Health, School of Public Health, Medical College of Soochow University, Suzhou, China
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Shanel Dhani
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Vladimir Gogvadze
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Faculty of Medicine, MV Lomonosov Moscow State University, Moscow, Russia
- Boris Zhivotovsky
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden.
- Faculty of Medicine, MV Lomonosov Moscow State University, Moscow, Russia.
- Engelhardt Institute of Molecular Biology, RAS, Moscow, Russia.
2
Zheng D, Persyn L, Wang J, Liu Y, Montoya FU, Cenik C, Agarwal V. Predicting the translation efficiency of messenger RNA in mammalian cells. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.08.11.607362. [PMID: 39149337 PMCID: PMC11326250 DOI: 10.1101/2024.08.11.607362] [Indexed: 08/17/2024]
Abstract
The degree to which translational control is specified by mRNA sequence is poorly understood in mammalian cells. Here, we constructed and leveraged a compendium of 3,819 ribosomal profiling datasets, distilling them into a transcriptome-wide atlas of translation efficiency (TE) measurements encompassing >140 human and mouse cell types. We subsequently developed RiboNN, a multitask deep convolutional neural network, and classic machine learning models to predict TEs in hundreds of cell types from sequence-encoded mRNA features, achieving state-of-the-art performance (r=0.79 in human and r=0.78 in mouse for mean TE across cell types). While the majority of earlier models solely considered 5' UTR sequence, RiboNN integrates contributions from the full-length mRNA sequence, learning that the 5' UTR, CDS, and 3' UTR respectively possess ~67%, 31%, and 2% per-nucleotide information density in the specification of mammalian TEs. Interpretation of RiboNN revealed that the spatial positioning of low-level di- and tri-nucleotide features (i.e., including codons) largely explains model performance, capturing mechanistic principles such as how ribosomal processivity and tRNA abundance control translational output. RiboNN is predictive of the translational behavior of base-modified therapeutic RNA, and can explain evolutionary selection pressures in human 5' UTRs. Finally, it detects a common language governing mRNA regulatory control and highlights the interconnectedness of mRNA translation, stability, and localization in mammalian organisms.
3
Agarwal V, Inoue F, Schubach M, Penzar D, Martin BK, Dash PM, Keukeleire P, Zhang Z, Sohota A, Zhao J, Georgakopoulos-Soares I, Noble WS, Yardımcı GG, Kulakovskiy IV, Kircher M, Shendure J, Ahituv N. Massively parallel characterization of transcriptional regulatory elements. Nature 2025:10.1038/s41586-024-08430-9. [PMID: 39814889 DOI: 10.1038/s41586-024-08430-9] [Received: 03/05/2023] [Accepted: 11/20/2024] [Indexed: 01/18/2025]
Abstract
The human genome contains millions of candidate cis-regulatory elements (cCREs) with cell-type-specific activities that shape both health and many disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these cCREs. Here we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of more than 680,000 sequences, representing an extensive set of annotated cCREs among three cell types (HepG2, K562 and WTC11), and found that 41.7% of these sequences were active. By testing sequences in both orientations, we find promoters to have strand-orientation biases and their 200-nucleotide cores to function as non-cell-type-specific 'on switches' that provide similar expression levels to their associated gene. By contrast, enhancers have weaker orientation biases, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict cCRE function and variant effects with high accuracy, delineate regulatory motifs and model their combinatorial effects. Testing a lentiMPRA library encompassing 60,000 cCREs in all three cell types further identified factors that determine cell-type specificity. Collectively, our work provides an extensive catalogue of functional CREs in three widely used cell lines and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
Affiliation(s)
- Vikram Agarwal
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- mRNA Center of Excellence, Sanofi, Waltham, MA, USA.
- Fumitaka Inoue
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Max Schubach
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Dmitry Penzar
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow, Russia
- Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Pyaree Mohan Dash
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Pia Keukeleire
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Zicong Zhang
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Ajuni Sohota
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Jingjing Zhao
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Galip Gürkan Yardımcı
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA
- Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, USA
- Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
- Life Improvement by Future Technologies (LIFT) Center, Moscow, Russia
- Martin Kircher
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA.
- Seattle Hub for Synthetic Biology, Seattle, Washington, USA.
- Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA.
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
4
Harper NW, Birdsall GA, Honeywell ME, Pai AA, Lee MJ. Pol II degradation activates cell death independently from the loss of transcription. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.12.09.627542. [PMID: 39713309 PMCID: PMC11661175 DOI: 10.1101/2024.12.09.627542] [Indexed: 12/24/2024]
Abstract
Pol II-mediated transcription is essential for eukaryotic life. While loss of transcription is thought to be universally lethal, the associated mechanisms promoting cell death are not yet known. Here, we show that death following loss of Pol II is not caused by dysregulated gene expression. Instead, death occurs in response to the loss of Pol II protein itself, specifically loss of the enzymatic subunit, Rpb1. Loss of Pol II exclusively activates apoptosis, and expression of a transcriptionally inactive version of Rpb1 rescues cell viability. Using functional genomics, we identify a previously uncharacterized mechanism that regulates lethality following loss of Pol II, which we call the Pol II Degradation-dependent Apoptotic Response (PDAR). Using the genetic dependencies of PDAR, we identify clinically used drugs that owe their efficacy to a PDAR-dependent mechanism. Our findings unveil a novel apoptotic signaling response that contributes to the efficacy of a wide array of anti-cancer therapies.
Affiliation(s)
- Nicholas W. Harper
- Department of Systems Biology, UMass Chan Medical School; Worcester, MA, USA
- Gavin A. Birdsall
- Department of Systems Biology, UMass Chan Medical School; Worcester, MA, USA
- Megan E. Honeywell
- Department of Systems Biology, UMass Chan Medical School; Worcester, MA, USA
- Athma A. Pai
- RNA Therapeutics Institute, UMass Chan Medical School; Worcester, MA, USA
- Michael J. Lee
- Department of Systems Biology, UMass Chan Medical School; Worcester, MA, USA
- Program in Molecular Medicine, and Department of Molecular, Cell and Cancer Biology, UMass Chan Medical School; Worcester, MA, USA
5
Linder J, Srivastava D, Yuan H, Agarwal V, Kelley DR. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat Genet 2025:10.1038/s41588-024-02053-6. [PMID: 39779956 DOI: 10.1038/s41588-024-02053-6] [Received: 08/28/2023] [Accepted: 12/04/2024] [Indexed: 01/11/2025]
Abstract
Sequence-based machine-learning models trained on genomics data improve genetic variant interpretation by providing functional predictions describing their impact on the cis-regulatory code. However, current tools do not predict RNA-seq expression profiles because of modeling challenges. Here, we introduce Borzoi, a model that learns to predict cell-type-specific and tissue-specific RNA-seq coverage from DNA sequence. Using statistics derived from Borzoi's predicted coverage, we isolate and accurately score DNA variant effects across multiple layers of regulation, including transcription, splicing and polyadenylation. Evaluated on quantitative trait loci, Borzoi is competitive with and often outperforms state-of-the-art models trained on individual regulatory functions. By applying attribution methods to the derived statistics, we extract cis-regulatory motifs driving RNA expression and post-transcriptional regulation in normal tissues. The wide availability of RNA-seq data across species, conditions and assays profiling specific aspects of regulation emphasizes the potential of this approach to decipher the mapping from DNA sequence to regulatory function.
Affiliation(s)
- Han Yuan
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Vikram Agarwal
- mRNA Center of Excellence, Sanofi Pasteur Inc., Cambridge, MA, USA
6
Fradkin P, Shi R, Isaev K, Frey BJ, Morris Q, Lee LJ, Wang B. Orthrus: Towards Evolutionary and Functional RNA Foundation Models. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.10.617658. [PMID: 39416135 PMCID: PMC11482885 DOI: 10.1101/2024.10.10.617658] [Indexed: 10/19/2024]
Abstract
In the face of rapidly accumulating genomic data, our ability to accurately predict key mature RNA properties that underlie transcript function and regulation remains limited. Pre-trained genomic foundation models offer an avenue to adapt learned RNA representations to biological prediction tasks. However, existing genomic foundation models are trained using strategies borrowed from textual or visual domains that do not leverage biological domain knowledge. Here, we introduce Orthrus, a Mamba-based mature RNA foundation model pre-trained using a novel self-supervised contrastive learning objective with biological augmentations. Orthrus is trained by maximizing embedding similarity between curated pairs of RNA transcripts, where pairs are formed from splice isoforms of 10 model organisms and transcripts from orthologous genes in 400+ mammalian species from the Zoonomia Project. This training objective results in a latent representation that clusters RNA sequences with functional and evolutionary similarities. We find that the generalized mature RNA isoform representations learned by Orthrus significantly outperform existing genomic foundation models on five mRNA property prediction tasks, and require only a fraction of fine-tuning data to do so. Finally, we show that Orthrus is capable of capturing divergent biological function of individual transcript isoforms.
Affiliation(s)
- Philip Fradkin
- Vector Institute, Ontario, Canada
- Computer Science, University of Toronto, Ontario, Canada
- Ruian Shi
- Vector Institute, Ontario, Canada
- Computer Science, University of Toronto, Ontario, Canada
- Computational and Systems Biology Program, Sloan Kettering Institute, New York, United States
- Keren Isaev
- New York Genome Center, New York, United States
- Systems Biology, Columbia University, New York, United States
- Brendan J Frey
- Vector Institute, Ontario, Canada
- Computer Science, University of Toronto, Ontario, Canada
- Electrical and Computer Engineering, University of Toronto, Ontario, Canada
- Quaid Morris
- Computational and Systems Biology Program, Sloan Kettering Institute, New York, United States
- Leo J Lee
- Vector Institute, Ontario, Canada
- Electrical and Computer Engineering, University of Toronto, Ontario, Canada
- Bo Wang
- Vector Institute, Ontario, Canada
- Computer Science, University of Toronto, Ontario, Canada
- Peter Munk Cardiac Center, University Health Network, Ontario, Canada
7
Guo X, Guo M, Cai R, Hu M, Rao L, Su W, Liu H, Gao F, Zhang X, Liu J, Chen C. mRNA compartmentalization via multimodule DNA nanostructure assembly augments the immunogenicity and efficacy of cancer mRNA vaccine. SCIENCE ADVANCES 2024; 10:eadp3680. [PMID: 39576858 PMCID: PMC11584007 DOI: 10.1126/sciadv.adp3680] [Received: 03/25/2024] [Accepted: 10/23/2024] [Indexed: 11/24/2024]
Abstract
Messenger RNA (mRNA) vaccines have fueled great hope for cancer immunotherapy. However, low immunogenicity, caused by inefficient mRNA expression and weak immune stimulation, hampers the efficacy of mRNA vaccines. Here, we present an mRNA compartmentalization-based cancer vaccine, comprising a multimodule DNA nanostructure (MMDNS)-assembled compartment for efficient mRNA translation via in situ localizing mRNA concentration and relevant reaction molecules. The MMDNS is constructed via a programmable DNA hybridization chain reaction (HCR)-based strategy, integrating antigen-coded mRNA, CpG oligodeoxynucleotides (ODNs), an acidic-responsive DNA sequence, and a dendritic cell-targeting aptamer. MMDNS undergoes in situ assembly in acidic lysosomes to form a micro-sized aggregate, inducing an enhanced CpG ODN adjuvant efficacy. Subsequently, the aggregates escape into cytoplasm, providing a moderate compartment which supports the efficient translation of spatially proximal mRNA transcripts via localizing relevant reaction molecules. The mRNA compartmentalization-based vaccine boosts a strong immune response and effectively inhibits tumor growth and metastasis, offering a robust strategy for cancer immunotherapy.
Affiliation(s)
- Xiaocui Guo
- New Cornerstone Science Laboratory, CAS Key Laboratory of Biomedical Effects of Nanomaterials and Nanosafety and CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology of China, Beijing, 100190, China
- Mengyu Guo
- New Cornerstone Science Laboratory, CAS Key Laboratory of Biomedical Effects of Nanomaterials and Nanosafety and CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology of China, Beijing, 100190, China
- Rong Cai
- New Cornerstone Science Laboratory, CAS Key Laboratory of Biomedical Effects of Nanomaterials and Nanosafety and CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology of China, Beijing, 100190, China
- Mingdi Hu
- New Cornerstone Science Laboratory, CAS Key Laboratory of Biomedical Effects of Nanomaterials and Nanosafety and CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology of China, Beijing, 100190, China
- Le Rao
- Health Management Institute, The Second Medical Center, Chinese PLA General Hospital, Beijing 100853, China
- Wen Su
- New Cornerstone Science Laboratory, CAS Key Laboratory of Biomedical Effects of Nanomaterials and Nanosafety and CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology of China, Beijing, 100190, China
- He Liu
- New Cornerstone Science Laboratory, CAS Key Laboratory of Biomedical Effects of Nanomaterials and Nanosafety and CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology of China, Beijing, 100190, China
- Fene Gao
- New Cornerstone Science Laboratory, CAS Key Laboratory of Biomedical Effects of Nanomaterials and Nanosafety and CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology of China, Beijing, 100190, China
- Xiaoyu Zhang
- New Cornerstone Science Laboratory, CAS Key Laboratory of Biomedical Effects of Nanomaterials and Nanosafety and CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology of China, Beijing, 100190, China
- Jing Liu
- New Cornerstone Science Laboratory, CAS Key Laboratory of Biomedical Effects of Nanomaterials and Nanosafety and CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology of China, Beijing, 100190, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Chunying Chen
- New Cornerstone Science Laboratory, CAS Key Laboratory of Biomedical Effects of Nanomaterials and Nanosafety and CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology of China, Beijing, 100190, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- GBA National Institute for Nanotechnology Innovation, Guangzhou, 510700, China
8
Oesinghaus L, Castillo-Hair S, Ludwig N, Keller A, Seelig G. Quantitative design of cell type-specific mRNA stability from microRNA expression data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.28.620728. [PMID: 39554011 PMCID: PMC11565874 DOI: 10.1101/2024.10.28.620728] [Indexed: 11/19/2024]
Abstract
Limiting expression to target cell types is a longstanding goal in gene therapy, which could be met by sensing endogenous microRNA. However, an unclear association between microRNA expression and activity currently hampers such an approach. Here, we probe this relationship by measuring the stability of synthetic microRNA-responsive 3'UTRs across 10 cell lines in a library format. By systematically addressing biases in microRNA expression data and confounding factors such as microRNA crosstalk, we demonstrate that a straightforward model can quantitatively predict reporter stability purely from expression data. We use this model to design constructs with previously unattainable response patterns across our cell lines. The rules we derive for microRNA expression data selection and processing should apply to microRNA-responsive devices for any environment with available expression data.
9
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
10
Naghipourfar M, Chen S, Howard MK, Macdonald CB, Saberi A, Hagen T, Mofrad MRK, Coyote-Maestas W, Goodarzi H. A Suite of Foundation Models Captures the Contextual Interplay Between Codons. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.10.617568. [PMID: 39416097 PMCID: PMC11482952 DOI: 10.1101/2024.10.10.617568] [Indexed: 10/19/2024]
Abstract
In the canonical genetic code, many amino acids are assigned more than one codon. Work by us and others has shown that the choice of these synonymous codons is not random, and carries regulatory and functional consequences. Existing protein foundation models ignore this context-dependent role of coding sequence in shaping the protein landscape of the cell. To address this gap, we introduce cdsFM, a suite of codon-resolution large language models, including both EnCodon and DeCodon models, with up to 1B parameters. Pre-trained on 60 million protein-coding sequences from more than 5,000 species, our models effectively learn the relationship between codons and amino acids, recapitulating the overall structure of the genetic code. In addition to outperforming state-of-the-art genomic foundation models in a variety of zero-shot and few-shot learning tasks, the larger pre-trained models were superior in predicting the choice of synonymous codons. To systematically assess the impact of synonymous codon choices on protein expression and our models' ability to capture these effects, we generated a large dataset measuring overall and surface expression levels of three proteins as a function of changes in their synonymous codons. We showed that our EnCodon models could be readily fine-tuned to predict the contextual consequences of synonymous codon choices. Armed with this knowledge, we applied EnCodon to existing clinical datasets of synonymous variants, and we identified a large number of synonymous codons that are likely pathogenic, several of which we experimentally confirmed in a cell-based model. Together, our findings establish the cdsFM suite as a powerful tool for decoding the complex functional grammar underlying the choice of synonymous codons.
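The opening premise of this abstract, that many amino acids are assigned more than one codon, can be made concrete with a short sketch. It builds the standard genetic code from its usual compact encoding (bases in T, C, A, G order) and groups codons by the amino acid they encode. The function names are illustrative only; this shows codon degeneracy and is not part of the cdsFM models.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in its usual compact form: codons are enumerated
# with each position cycling through T, C, A, G (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def standard_codon_table():
    """Map each of the 64 DNA codons to a one-letter amino acid ('*' = stop)."""
    codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
    return dict(zip(codons, AMINO_ACIDS))


def synonymous_codons(table):
    """Group codons by the amino acid (or stop signal) they encode."""
    groups = defaultdict(list)
    for codon, amino_acid in table.items():
        groups[amino_acid].append(codon)
    return dict(groups)


if __name__ == "__main__":
    groups = synonymous_codons(standard_codon_table())
    # List amino acids from most to least degenerate.
    for amino_acid in sorted(groups, key=lambda a: (-len(groups[a]), a)):
        print(amino_acid, len(groups[amino_acid]), " ".join(groups[amino_acid]))
```

Only methionine and tryptophan have a single codon in this table, so every other amino acid offers a choice among synonymous codons.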
Affiliation(s)
- Mohsen Naghipourfar
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, Berkeley, CA, USA
- Arc Institute, Palo Alto, CA, USA
- Mathew K. Howard
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Tetrad Graduate Program, UCSF, San Francisco, CA, USA
- Christian B. Macdonald
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Ali Saberi
- Department of Electrical and Computer Engineering, McGill University, Montreal, Canada
- Victor P. Dahdaleh Institute of Genomic Medicine, Montreal, QC, Canada
- Mohammad R. K. Mofrad
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, Berkeley, CA, USA
- Willow Coyote-Maestas
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California, San Francisco, USA
- Hani Goodarzi
- Arc Institute, Palo Alto, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
11
Capitanchik C, Wilkins OG, Wagner N, Gagneur J, Ule J. From computational models of the splicing code to regulatory mechanisms and therapeutic implications. Nat Rev Genet 2024:10.1038/s41576-024-00774-2. [PMID: 39358547 DOI: 10.1038/s41576-024-00774-2] [Accepted: 08/27/2024] [Indexed: 10/04/2024]
Abstract
Since the discovery of RNA splicing and its role in gene expression, researchers have sought a set of rules, an algorithm or a computational model that could predict the splice isoforms, and their frequencies, produced from any transcribed gene in a specific cellular context. Over the past 30 years, these models have evolved from simple position weight matrices to deep-learning models capable of integrating sequence data across vast genomic distances. Most recently, new model architectures are moving the field closer to context-specific alternative splicing predictions, and advances in sequencing technologies are expanding the type of data that can be used to inform and interpret such models. Together, these developments are driving improved understanding of splicing regulatory mechanisms and emerging applications of the splicing code to the rational design of RNA- and splicing-based therapeutics.
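For readers unfamiliar with the "simple position weight matrices" this abstract names as the starting point of the field, here is a minimal sketch of log-odds PWM scoring. The aligned example sites are invented for illustration (loosely shaped like 5' splice sites) and are not data from the review. The uniform background of 0.25 and the pseudocount of 1 are assumptions too.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds for a candidate sequence."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM length")
    return sum(column[base] for column, base in zip(pwm, seq))


if __name__ == "__main__":
    # Hypothetical aligned sites, for illustration only.
    toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT"]
    pwm = build_pwm(toy_sites)
    for candidate in ("CAGGTAAGT", "CAGCCAAGT"):
        print(candidate, round(score(pwm, candidate), 3))
```

A PWM treats every position independently. That limitation is one reason later models, as the abstract describes, moved toward architectures that integrate longer sequence context.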
Affiliation(s)
- Charlotte Capitanchik
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- Oscar G Wilkins
- The Francis Crick Institute, London, UK
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Nils Wagner
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany.
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany.
- Jernej Ule
- The Francis Crick Institute, London, UK.
- UK Dementia Research Institute at King's College London, London, UK.
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK.
- National Institute of Chemistry, Ljubljana, Slovenia.
12
Sasse A, Chikina M, Mostafavi S. Quick and effective approximation of in silico saturation mutagenesis experiments with first-order taylor expansion. iScience 2024; 27:110807. [PMID: 39286491 PMCID: PMC11404212 DOI: 10.1016/j.isci.2024.110807] [Received: 07/19/2024] [Revised: 08/08/2024] [Accepted: 08/20/2024] [Indexed: 09/19/2024] [Open Access]
Abstract
To understand the decision process of genomic sequence-to-function models, explainable AI algorithms determine the importance of each nucleotide in a given input sequence to the model's predictions and enable discovery of cis-regulatory motifs for gene regulation. The most commonly applied method is in silico saturation mutagenesis (ISM) because its per-nucleotide importance scores can be intuitively understood as the computational counterpart to in vivo saturation mutagenesis experiments. While ISM is highly interpretable, it is computationally challenging to perform for many sequences, and becomes prohibitive as the length of the input sequences and size of the model grows. Here, we use the first-order Taylor approximation to approximate ISM values from the model's gradient, which reduces its computation cost to a single forward pass for an input sequence. We show that the Taylor ISM (TISM) approximation is robust across different model ablations, random initializations, training parameters, and dataset sizes.
Affiliation(s)
- Alexander Sasse
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Maria Chikina
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 16354, USA
- Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Canadian Institute for Advanced Research, Toronto, ON MG5 1ZB, Canada
13
Sun H, Vargas-Blanco D, Zhou Y, Masiello C, Kelly J, Moy J, Korkin D, Shell S. Diverse intrinsic properties shape transcript stability and stabilization in Mycolicibacterium smegmatis. NAR Genom Bioinform 2024; 6:lqae147. [PMID: 39498432 PMCID: PMC11532794 DOI: 10.1093/nargab/lqae147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 08/23/2024] [Accepted: 10/17/2024] [Indexed: 11/07/2024] Open
Abstract
Mycobacteria regulate transcript degradation to facilitate adaptation to environmental stress. However, the mechanisms underlying this regulation are unknown. Here we sought to gain understanding of the mechanisms controlling mRNA stability by investigating the transcript properties associated with variance in transcript stability and stress-induced transcript stabilization. We measured mRNA half-lives transcriptome-wide in Mycolicibacterium smegmatis in log phase growth and hypoxia-induced growth arrest. The transcriptome was globally stabilized in response to hypoxia, but transcripts of essential genes were generally stabilized more than those of non-essential genes. We then developed machine learning models that enabled us to identify the non-linear collective effect of a compendium of transcript properties on transcript stability and stabilization. We identified properties that were more predictive of half-life in log phase as well as properties that were more predictive in hypoxia, and many of these varied between leadered and leaderless transcripts. In summary, we found that transcript properties are differentially associated with transcript stability depending on both the transcript type and the growth condition. Our results reveal the complex interplay between transcript features and microenvironment that shapes transcript stability in mycobacteria.
Affiliation(s)
- Huaming Sun
- Program in Bioinformatics and Computational Biology, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Diego A Vargas-Blanco
- Department of Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Ying Zhou
- Department of Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Catherine S Masiello
- Department of Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Jessica M Kelly
- Program in Bioinformatics and Computational Biology, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Department of Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Justin K Moy
- Program in Bioinformatics and Computational Biology, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Dmitry Korkin
- Program in Bioinformatics and Computational Biology, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Scarlet S Shell
- Program in Bioinformatics and Computational Biology, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Department of Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA
14
Modic M, Kuret K, Steinhauser S, Faraway R, van Genderen E, Ruiz de Los Mozos I, Novljan J, Vičič Ž, Lee FCY, Ten Berge D, Luscombe NM, Ule J. Poised PABP-RNA hubs implement signal-dependent mRNA decay in development. Nat Struct Mol Biol 2024; 31:1439-1447. [PMID: 39054355 PMCID: PMC11402784 DOI: 10.1038/s41594-024-01363-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 06/28/2024] [Indexed: 07/27/2024]
Abstract
Signaling pathways drive cell fate transitions largely by changing gene expression. However, the mechanisms for rapid and selective transcriptome rewiring in response to signaling cues remain elusive. Here we use deep learning to deconvolve both the sequence determinants and the trans-acting regulators that trigger extracellular signal-regulated kinase (ERK)-mitogen-activated protein kinase kinase (MEK)-induced decay of the naive pluripotency mRNAs. Timing of decay is coupled to embryo implantation through ERK-MEK phosphorylation of LIN28A, which repositions pLIN28A to the highly A+U-rich 3' untranslated region (3'UTR) termini of naive pluripotency mRNAs. Interestingly, these A+U-rich 3'UTR termini serve as poly(A)-binding protein (PABP)-binding hubs, poised for signal-induced convergence with LIN28A. The multivalency of AUU motifs determines the efficacy of pLIN28A-PABP convergence, which enhances PABP 3'UTR binding, decreases the protection of poly(A) tails and activates mRNA decay to enable progression toward primed pluripotency. Thus, the signal-induced convergence of LIN28A with PABP-RNA hubs drives the rapid selection of naive mRNAs for decay, enabling the transcriptome remodeling that ensures swift developmental progression.
Affiliation(s)
- Miha Modic
- The Francis Crick Institute, London, UK.
- UK Dementia Research Institute at King's College London, London, UK.
- National Institute of Chemistry, Ljubljana, Slovenia.
- Klara Kuret
- The Francis Crick Institute, London, UK
- National Institute of Chemistry, Ljubljana, Slovenia
- Jozef Stefan International Postgraduate School, Ljubljana, Slovenia
- Rupert Faraway
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Emiel van Genderen
- Department of Cell Biology, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
- Igor Ruiz de Los Mozos
- The Francis Crick Institute, London, UK
- Department of Gene Therapy and Regulation of Gene Expression, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
- Jona Novljan
- National Institute of Chemistry, Ljubljana, Slovenia
- Žiga Vičič
- National Institute of Chemistry, Ljubljana, Slovenia
- Flora C Y Lee
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Centre for Developmental Neurobiology, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Derk Ten Berge
- Department of Cell Biology, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
- Nicholas M Luscombe
- The Francis Crick Institute, London, UK
- Okinawa Institute of Science and Technology, Okinawa, Japan
- Jernej Ule
- The Francis Crick Institute, London, UK.
- UK Dementia Research Institute at King's College London, London, UK.
- National Institute of Chemistry, Ljubljana, Slovenia.
15
Li S, Moayedpour S, Li R, Bailey M, Riahi S, Kogler-Anele L, Miladi M, Miner J, Pertuy F, Zheng D, Wang J, Balsubramani A, Tran K, Zacharia M, Wu M, Gu X, Clinton R, Asquith C, Skaleski J, Boeglin L, Chivukula S, Dias A, Strugnell T, Montoya FU, Agarwal V, Bar-Joseph Z, Jager S. CodonBERT large language model for mRNA vaccines. Genome Res 2024; 34:1027-1035. [PMID: 38951026 PMCID: PMC11368176 DOI: 10.1101/gr.278870.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 06/25/2024] [Indexed: 07/03/2024]
Abstract
mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties, including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons as inputs, which enables it to learn better representations. CodonBERT was trained using more than 10 million mRNA sequences from a diverse set of organisms. The resulting model captures important biological concepts. CodonBERT can also be extended to perform prediction tasks for various mRNA properties. CodonBERT outperforms previous mRNA prediction methods, including on a new flu vaccine data set.
Affiliation(s)
- Sizhen Li
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Ruijiang Li
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Michael Bailey
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Saleh Riahi
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Milad Miladi
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Jacob Miner
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Fabien Pertuy
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Dinghai Zheng
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Jun Wang
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Khang Tran
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Minnie Zacharia
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Monica Wu
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Xiaobo Gu
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Ryan Clinton
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Carla Asquith
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Joseph Skaleski
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Lianne Boeglin
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Sudha Chivukula
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Anusha Dias
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Tod Strugnell
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Vikram Agarwal
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Ziv Bar-Joseph
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Sven Jager
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
16
Ietswaart R, Smalec BM, Xu A, Choquet K, McShane E, Jowhar ZM, Guegler CK, Baxter-Koenigs AR, West ER, Fu BXH, Gilbert L, Floor SN, Churchman LS. Genome-wide quantification of RNA flow across subcellular compartments reveals determinants of the mammalian transcript life cycle. Mol Cell 2024; 84:2765-2784.e16. [PMID: 38964322 PMCID: PMC11315470 DOI: 10.1016/j.molcel.2024.06.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 05/15/2024] [Accepted: 06/11/2024] [Indexed: 07/06/2024]
Abstract
Dissecting the regulatory mechanisms controlling mammalian transcripts from production to degradation requires quantitative measurements of mRNA flow across the cell. We developed subcellular TimeLapse-seq to measure the rates at which RNAs are released from chromatin, exported from the nucleus, loaded onto polysomes, and degraded within the nucleus and cytoplasm in human and mouse cells. These rates varied substantially, yet transcripts from genes with related functions or targeted by the same transcription factors and RNA-binding proteins flowed across subcellular compartments with similar kinetics. Verifying these associations uncovered a link between DDX3X and nuclear export. For hundreds of RNA metabolism genes, most transcripts with retained introns were degraded by the nuclear exosome, while the remaining molecules were exported with stable cytoplasmic lifespans. Transcripts residing on chromatin for longer had extended poly(A) tails, whereas the reverse was observed for cytoplasmic mRNAs. Finally, machine learning identified molecular features that predicted the diverse life cycles of mRNAs.
Affiliation(s)
- Robert Ietswaart
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA.
- Brendan M Smalec
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Albert Xu
- Department of Cell and Tissue Biology, University of California, San Francisco, San Francisco, CA 94143, USA
- Karine Choquet
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Erik McShane
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Ziad Mohamoud Jowhar
- Department of Cell and Tissue Biology, University of California, San Francisco, San Francisco, CA 94143, USA
- Chantal K Guegler
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Autum R Baxter-Koenigs
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Emma R West
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Luke Gilbert
- Arc Institute, Palo Alto, CA 94305, USA; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Urology, University of California, San Francisco, San Francisco, CA 94518, USA
- Stephen N Floor
- Department of Cell and Tissue Biology, University of California, San Francisco, San Francisco, CA 94143, USA; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94143, USA.
- L Stirling Churchman
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA.
17
Zeng X, Wei Z, Du Q, Li J, Xie Z, Wang X. Unveil cis-acting combinatorial mRNA motifs by interpreting deep neural network. Bioinformatics 2024; 40:i381-i389. [PMID: 38940172 PMCID: PMC11211823 DOI: 10.1093/bioinformatics/btae262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
SUMMARY: Cis-acting mRNA elements play a key role in the regulation of mRNA stability and translation efficiency. Revealing the interactions of these elements and their impact plays a crucial role in understanding the regulation of the mRNA translation process, which supports the development of mRNA-based medicine or vaccines. Deep neural networks (DNN) can learn complex cis-regulatory codes from RNA sequences. However, extracting these cis-regulatory codes efficiently from DNN remains a significant challenge. Here, we propose a method based on our toolkit NeuronMotif and motif mutagenesis, which not only enables the discovery of diverse and high-quality motifs but also efficiently reveals motif interactions. By interpreting deep-learning models, we have discovered several crucial motifs that impact mRNA translation efficiency and stability, as well as some unknown motifs or motif syntax, offering novel insights for biologists. Furthermore, we note that it is challenging to enrich motif syntax in datasets composed of randomly generated sequences, and they may not contain sufficient biological signals.
AVAILABILITY AND IMPLEMENTATION: The source code and data used to produce the results and analyses presented in this manuscript are available from GitHub (https://github.com/WangLabTHU/combmotif).
Affiliation(s)
- Xiaocheng Zeng
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China
- Zheng Wei
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China
- Qixiu Du
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China
- Jiaqi Li
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China
- Zhen Xie
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China
- Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China
18
Popitsch N, Neumann T, von Haeseler A, Ameres SL. Splice_sim: a nucleotide conversion-enabled RNA-seq simulation and evaluation framework. Genome Biol 2024; 25:166. [PMID: 38918865 PMCID: PMC11514792 DOI: 10.1186/s13059-024-03313-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Accepted: 06/17/2024] [Indexed: 06/27/2024] Open
Abstract
Nucleotide conversion RNA sequencing techniques interrogate chemical RNA modifications in cellular transcripts, resulting in mismatch-containing reads. Biases in mapping the resulting reads to reference genomes remain poorly understood. We present splice_sim, a splice-aware RNA-seq simulation and evaluation pipeline that introduces user-defined nucleotide conversions at set frequencies, creates mixture models of converted and unconverted reads, and calculates mapping accuracies per genomic annotation. By simulating nucleotide conversion RNA-seq datasets under realistic experimental conditions, including metabolic RNA labeling and RNA bisulfite sequencing, we measure mapping accuracies of state-of-the-art spliced-read mappers for mouse and human transcripts and derive strategies to prevent biases in the data interpretation.
Affiliation(s)
- Niko Popitsch
- Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna, A-1030, Austria.
- Max Perutz Labs, Department of Biochemistry and Cell Biology, University of Vienna, Vienna, A-1030, Austria.
- Tobias Neumann
- Quantro Therapeutics, Vienna, A-1030, Austria
- Vienna Biocenter PhD Program, a Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, A-1030, Austria
- Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Vienna, A-1030, Austria
- Arndt von Haeseler
- Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Vienna, A-1030, Austria
- Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, A-1090, Austria
- Stefan L Ameres
- Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna, A-1030, Austria
- Max Perutz Labs, Department of Biochemistry and Cell Biology, University of Vienna, Vienna, A-1030, Austria
- Institute of Molecular Biotechnology, IMBA, Vienna Biocenter Campus (VBC), Vienna, A-1030, Austria
19
Castillo-Hair S, Fedak S, Wang B, Linder J, Havens K, Certo M, Seelig G. Optimizing 5'UTRs for mRNA-delivered gene editing using deep learning. Nat Commun 2024; 15:5284. [PMID: 38902240 PMCID: PMC11189900 DOI: 10.1038/s41467-024-49508-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 06/07/2024] [Indexed: 06/22/2024] Open
Abstract
mRNA therapeutics are revolutionizing the pharmaceutical industry, but methods to optimize the primary sequence for increased expression are still lacking. Here, we design 5'UTRs for efficient mRNA translation using deep learning. We perform polysome profiling of fully or partially randomized 5'UTR libraries in three cell types and find that UTR performance is highly correlated across cell types. We train models on our datasets and use them to guide the design of high-performing 5'UTRs using gradient descent and generative neural networks. We experimentally test designed 5'UTRs with mRNA encoding megaTAL™ gene editing enzymes for two different gene targets and in two different cell lines. We find that the designed 5'UTRs support strong gene editing activity. Editing efficiency is correlated between cell types and gene targets, although the best performing UTR was specific to one cargo and cell type. Our results highlight the potential of model-based sequence design for mRNA therapeutics.
Affiliation(s)
- Sebastian Castillo-Hair
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA, USA
- eScience Institute, University of Washington, Seattle, WA, USA
- Ban Wang
- Department of Biology, Stanford University, Stanford, CA, USA
- Johannes Linder
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Georg Seelig
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA, USA.
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA.
20
Li Y, Yi Y, Gao X, Wang X, Zhao D, Wang R, Zhang LS, Gao B, Zhang Y, Zhang L, Cao Q, Chen K. 2'-O-methylation at internal sites on mRNA promotes mRNA stability. Mol Cell 2024; 84:2320-2336.e6. [PMID: 38906115 PMCID: PMC11196006 DOI: 10.1016/j.molcel.2024.04.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/24/2022] [Revised: 02/13/2024] [Accepted: 04/17/2024] [Indexed: 06/23/2024]
Abstract
2'-O-methylation (Nm) is a prominent RNA modification well known in noncoding RNAs and more recently also found at many mRNA internal sites. However, their function and base-resolution stoichiometry remain underexplored. Here, we investigate the transcriptome-wide effect of internal site Nm on mRNA stability. Combining nanopore sequencing with our developed machine learning method, NanoNm, we identify thousands of Nm sites on mRNAs with a single-base resolution. We observe a positive effect of FBL-mediated Nm modification on mRNA stability and expression level. Elevated FBL expression in cancer cells is associated with increased expression levels for 2'-O-methylated mRNAs of cancer pathways, implying the role of FBL in post-transcriptional regulation. Lastly, we find that FBL-mediated 2'-O-methylation connects to widespread 3' UTR shortening, a mechanism that globally increases RNA stability. Collectively, we demonstrate that FBL-mediated Nm modifications at mRNA internal sites regulate gene expression by enhancing mRNA stability.
Affiliation(s)
- Yanqiang Li
- Basic and Translational Research Division, Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Yang Yi
- Department of Urology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA; Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Xinlei Gao
- Basic and Translational Research Division, Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Xin Wang
- Basic and Translational Research Division, Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Dongyu Zhao
- Basic and Translational Research Division, Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Rui Wang
- Department of Urology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA; Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Li-Sheng Zhang
- Department of Chemistry, Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong SAR, China; Department of Chemistry and Institute for Biophysical Dynamics, University of Chicago, Chicago, IL, USA; Howard Hughes Medical Institute, Chicago, IL, USA
- Boyang Gao
- Department of Chemistry and Institute for Biophysical Dynamics, University of Chicago, Chicago, IL, USA; Howard Hughes Medical Institute, Chicago, IL, USA
- Yadong Zhang
- Basic and Translational Research Division, Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Lili Zhang
- Basic and Translational Research Division, Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Qi Cao
- Department of Urology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA; Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.
- Kaifu Chen
- Basic and Translational Research Division, Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Boston, MA, USA; Prostate Cancer Program, Dana-Farber/Harvard Cancer Center, Boston, MA, USA.
21
Munro V, Kelly V, Messner CB, Kustatscher G. Cellular control of protein levels: A systems biology perspective. Proteomics 2024; 24:e2200220. [PMID: 38012370 DOI: 10.1002/pmic.202200220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Revised: 11/13/2023] [Accepted: 11/15/2023] [Indexed: 11/29/2023]
Abstract
How cells regulate protein levels is a central question of biology. Over the past decades, molecular biology research has provided profound insights into the mechanisms and the molecular machinery governing each step of the gene expression process, from transcription to protein degradation. Recent advances in transcriptomics and proteomics have complemented our understanding of these fundamental cellular processes with a quantitative, systems-level perspective. Multi-omic studies revealed significant quantitative, kinetic and functional differences between the genome, transcriptome and proteome. While protein levels often correlate with mRNA levels, quantitative investigations have demonstrated a substantial impact of translation and protein degradation on protein expression control. In addition, protein-level regulation appears to play a crucial role in buffering protein abundances against undesirable mRNA expression variation. These findings have practical implications for many fields, including gene function prediction and precision medicine.
Affiliation(s)
- Victoria Munro
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
- Van Kelly
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
- Christoph B Messner
- Precision Proteomics Center, Swiss Institute of Allergy and Asthma Research (SIAF), University of Zurich, Davos, Switzerland
- Georg Kustatscher
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
22
Geraud M, Cristini A, Salimbeni S, Bery N, Jouffret V, Russo M, Ajello AC, Fernandez Martinez L, Marinello J, Cordelier P, Trouche D, Favre G, Nicolas E, Capranico G, Sordet O. TDP1 mutation causing SCAN1 neurodegenerative syndrome hampers the repair of transcriptional DNA double-strand breaks. Cell Rep 2024; 43:114214. [PMID: 38761375 DOI: 10.1016/j.celrep.2024.114214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 03/05/2024] [Accepted: 04/24/2024] [Indexed: 05/20/2024] Open
Abstract
TDP1 removes transcription-blocking topoisomerase I cleavage complexes (TOP1ccs), and its inactivating H493R mutation causes the neurodegenerative syndrome SCAN1. However, the molecular mechanism underlying the SCAN1 phenotype is unclear. Here, we generate human SCAN1 cell models using CRISPR-Cas9 and show that they accumulate TOP1ccs along with changes in gene expression and genomic distribution of R-loops. SCAN1 cells also accumulate transcriptional DNA double-strand breaks (DSBs) specifically in the G1 cell population due to increased DSB formation and lack of repair, both resulting from abortive removal of transcription-blocking TOP1ccs. Deficient TDP1 activity causes increased DSB production, and the presence of mutated TDP1 protein hampers DSB repair by a TDP2-dependent backup pathway. This study provides powerful models to study TDP1 functions under physiological and pathological conditions and unravels that a gain of function of the mutated TDP1 protein, which prevents DSB repair, rather than a loss of TDP1 activity itself, could contribute to SCAN1 pathogenesis.
Affiliation(s)
- Mathéa Geraud
- Cancer Research Center of Toulouse (CRCT), INSERM, Université de Toulouse, Université Toulouse III Paul Sabatier, CNRS, 31037 Toulouse, France
| | - Agnese Cristini
- Cancer Research Center of Toulouse (CRCT), INSERM, Université de Toulouse, Université Toulouse III Paul Sabatier, CNRS, 31037 Toulouse, France
| | - Simona Salimbeni
- Cancer Research Center of Toulouse (CRCT), INSERM, Université de Toulouse, Université Toulouse III Paul Sabatier, CNRS, 31037 Toulouse, France; Department of Pharmacy and Biotechnology, Alma Mater Studiorum, University of Bologna, 40126 Bologna, Italy
| | - Nicolas Bery
- Cancer Research Center of Toulouse (CRCT), INSERM, Université de Toulouse, Université Toulouse III Paul Sabatier, CNRS, 31037 Toulouse, France
| | - Virginie Jouffret
- MCD, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France; BigA Core Facility, Centre de Biologie Intégrative (CBI), Université de Toulouse, 31062 Toulouse, France
| | - Marco Russo
- Department of Pharmacy and Biotechnology, Alma Mater Studiorum, University of Bologna, 40126 Bologna, Italy
| | - Andrea Carla Ajello
- Cancer Research Center of Toulouse (CRCT), INSERM, Université de Toulouse, Université Toulouse III Paul Sabatier, CNRS, 31037 Toulouse, France
| | - Lara Fernandez Martinez
- Cancer Research Center of Toulouse (CRCT), INSERM, Université de Toulouse, Université Toulouse III Paul Sabatier, CNRS, 31037 Toulouse, France
| | - Jessica Marinello
- Department of Pharmacy and Biotechnology, Alma Mater Studiorum, University of Bologna, 40126 Bologna, Italy
| | - Pierre Cordelier
- Cancer Research Center of Toulouse (CRCT), INSERM, Université de Toulouse, Université Toulouse III Paul Sabatier, CNRS, 31037 Toulouse, France
| | - Didier Trouche
- MCD, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
| | - Gilles Favre
- Cancer Research Center of Toulouse (CRCT), INSERM, Université de Toulouse, Université Toulouse III Paul Sabatier, CNRS, 31037 Toulouse, France
| | - Estelle Nicolas
- MCD, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
| | - Giovanni Capranico
- Department of Pharmacy and Biotechnology, Alma Mater Studiorum, University of Bologna, 40126 Bologna, Italy.
| | - Olivier Sordet
- Cancer Research Center of Toulouse (CRCT), INSERM, Université de Toulouse, Université Toulouse III Paul Sabatier, CNRS, 31037 Toulouse, France.
| |
|
23
|
Ginley-Hidinger M, Abewe H, Osborne K, Richey A, Kitchen N, Mortenson KL, Wissink EM, Lis J, Zhang X, Gertz J. Cis-regulatory control of transcriptional timing and noise in response to estrogen. Cell Genom 2024; 4:100542. [PMID: 38663407 PMCID: PMC11099348 DOI: 10.1016/j.xgen.2024.100542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 10/26/2023] [Accepted: 03/27/2024] [Indexed: 05/07/2024]
Abstract
Cis-regulatory elements control transcription levels, temporal dynamics, and cell-cell variation or transcriptional noise. However, the combination of regulatory features that control these different attributes is not fully understood. Here, we used single-cell RNA-seq during an estrogen treatment time course and machine learning to identify predictors of expression timing and noise. We found that genes with multiple active enhancers exhibit faster temporal responses. We verified this finding by showing that manipulation of enhancer activity changes the temporal response of estrogen target genes. Analysis of transcriptional noise uncovered a relationship between promoter and enhancer activity, with active promoters associated with low noise and active enhancers linked to high noise. Finally, we observed that co-expression across single cells is an emergent property associated with chromatin looping, timing, and noise. Overall, our results indicate a fundamental tradeoff between a gene's ability to quickly respond to incoming signals and maintain low variation across cells.
Affiliation(s)
- Matthew Ginley-Hidinger
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA; Department of Biomedical Engineering, University of Utah, Salt Lake City, UT 84112, USA
| | - Hosiana Abewe
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA; Department of Oncological Sciences, University of Utah, Salt Lake City, UT 84112, USA
| | - Kyle Osborne
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA; Department of Oncological Sciences, University of Utah, Salt Lake City, UT 84112, USA
| | - Alexandra Richey
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA; Department of Biomedical Engineering, University of Utah, Salt Lake City, UT 84112, USA
| | - Noel Kitchen
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA; Department of Oncological Sciences, University of Utah, Salt Lake City, UT 84112, USA
| | - Katelyn L Mortenson
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA; Department of Oncological Sciences, University of Utah, Salt Lake City, UT 84112, USA
| | - Erin M Wissink
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
| | - John Lis
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
| | - Xiaoyang Zhang
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA; Department of Oncological Sciences, University of Utah, Salt Lake City, UT 84112, USA
| | - Jason Gertz
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA; Department of Biomedical Engineering, University of Utah, Salt Lake City, UT 84112, USA; Department of Oncological Sciences, University of Utah, Salt Lake City, UT 84112, USA.
| |
|
24
|
Mercier BC, Labaronne E, Cluet D, Guiguettaz L, Fontrodona N, Bicknell A, Corbin A, Wencker M, Aube F, Modolo L, Jouravleva K, Auboeuf D, Moore MJ, Ricci EP. Translation-dependent and -independent mRNA decay occur through mutually exclusive pathways defined by ribosome density during T cell activation. Genome Res 2024; 34:394-409. [PMID: 38508694 PMCID: PMC11067875 DOI: 10.1101/gr.277863.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 03/09/2024] [Indexed: 03/22/2024]
Abstract
mRNA translation and decay are tightly interconnected processes both in the context of mRNA quality-control pathways and for the degradation of functional mRNAs. Cotranslational mRNA degradation through codon usage, ribosome collisions, and the recruitment of specific proteins to ribosomes is an important determinant of mRNA turnover. However, the extent to which translation-dependent mRNA decay (TDD) and translation-independent mRNA decay (TID) pathways participate in the degradation of mRNAs has not been studied yet. Here we describe a comprehensive analysis of basal and signal-induced TDD and TID in mouse primary CD4+ T cells. Our results indicate that most cellular transcripts are decayed to some extent in a translation-dependent manner. Our analysis further identifies the length of untranslated regions, the density of ribosomes, and GC3 content as important determinants of TDD magnitude. Consistently, all transcripts that undergo changes in ribosome density within their coding sequence upon T cell activation display a corresponding change in their TDD level. Moreover, we reveal a dynamic modulation in the relationship between GC3 content and TDD upon T cell activation, with a reversal in the impact of GC3- and AU3-rich codons. Altogether, our data show a strong and dynamic interconnection between mRNA translation and decay in mammalian primary cells.
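As an illustration of one determinant named in this abstract, the sketch below computes GC3 content, taken here to mean the fraction of codons whose third nucleotide is G or C. The function name `gc3_content` and the choice to count every complete codon (including a stop codon, if present) are assumptions made for this example, not details taken from the paper.

```python
def gc3_content(cds: str) -> float:
    """Fraction of codons whose third nucleotide is G or C (assumed definition)."""
    cds = cds.upper().replace("U", "T")
    # Count only complete codons; a trailing partial codon is ignored.
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    # Third codon positions sit at indices 2, 5, 8, ...
    third_positions = cds[2 : 3 * n_codons : 3]
    return sum(base in "GC" for base in third_positions) / n_codons
```

For the toy coding sequence `ATGGCCAAATAA` the third positions are G, C, A and A, so the function returns 0.5.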
Affiliation(s)
- Blandine C Mercier
- RNA Therapeutics Institute, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Emmanuel Labaronne
- Laboratory of Biology and Modeling of the Cell (LBMC), Université de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, Inserm U1293, 69007 Lyon, France
- ADLIN Science, 9100 Evry-Courcouronnes, France
| | - David Cluet
- Laboratory of Biology and Modeling of the Cell (LBMC), Université de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, Inserm U1293, 69007 Lyon, France
| | - Laura Guiguettaz
- Laboratory of Biology and Modeling of the Cell (LBMC), Université de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, Inserm U1293, 69007 Lyon, France
| | - Nicolas Fontrodona
- Laboratory of Biology and Modeling of the Cell (LBMC), Université de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, Inserm U1293, 69007 Lyon, France
| | - Alicia Bicknell
- RNA Therapeutics Institute, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Antoine Corbin
- Centre International de Recherche en Infectiologie Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-69007 Lyon, France
| | - Mélanie Wencker
- Centre International de Recherche en Infectiologie Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-69007 Lyon, France
| | - Fabien Aube
- Laboratory of Biology and Modeling of the Cell (LBMC), Université de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, Inserm U1293, 69007 Lyon, France
| | - Laurent Modolo
- Laboratory of Biology and Modeling of the Cell (LBMC), Université de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, Inserm U1293, 69007 Lyon, France
| | - Karina Jouravleva
- Laboratory of Biology and Modeling of the Cell (LBMC), Université de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, Inserm U1293, 69007 Lyon, France
| | - Didier Auboeuf
- Laboratory of Biology and Modeling of the Cell (LBMC), Université de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, Inserm U1293, 69007 Lyon, France
| | - Melissa J Moore
- RNA Therapeutics Institute, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Emiliano P Ricci
- Laboratory of Biology and Modeling of the Cell (LBMC), Université de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, Inserm U1293, 69007 Lyon, France
| |
|
25
|
Lorenzo-Orts L, Pauli A. The molecular mechanisms underpinning maternal mRNA dormancy. Biochem Soc Trans 2024; 52:861-871. [PMID: 38477334 PMCID: PMC11088918 DOI: 10.1042/bst20231122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 02/28/2024] [Accepted: 03/04/2024] [Indexed: 03/14/2024]
Abstract
A large number of mRNAs of maternal origin are produced during oogenesis and deposited in the oocyte. Since transcription stops at the onset of meiosis during oogenesis and does not resume until later in embryogenesis, maternal mRNAs are the only templates for protein synthesis during this period. To ensure that a protein is made in the right place at the right time, the translation of maternal mRNAs must be activated at a specific stage of development. Here we summarize our current understanding of the sophisticated mechanisms that contribute to the temporal repression of maternal mRNAs, termed maternal mRNA dormancy. We discuss mechanisms at the level of the RNA itself, such as the regulation of polyadenine tail length and RNA modifications, as well as at the level of RNA-binding proteins, which often block the assembly of translation initiation complexes at the 5' end of an mRNA or recruit mRNAs to specific subcellular compartments. We also review microRNAs and other mechanisms that contribute to repressing translation, such as ribosome dormancy. Importantly, the mechanisms responsible for mRNA dormancy during the oocyte-to-embryo transition are also relevant to cellular quiescence in other biological contexts.
Affiliation(s)
- Laura Lorenzo-Orts
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
| | - Andrea Pauli
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
| |
|
26
|
Robson ES, Ioannidis NM. GUANinE v1.0: Benchmark Datasets for Genomic AI Sequence-to-Function Models. bioRxiv 2024:2023.10.12.562113. [PMID: 37904945 PMCID: PMC10614795 DOI: 10.1101/2023.10.12.562113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
Computational genomics increasingly relies on machine learning methods for genome interpretation, and the recent adoption of neural sequence-to-function models highlights the need for rigorous model specification and controlled evaluation, problems familiar to other fields of AI. Research strategies that have greatly benefited other fields - including benchmarking, auditing, and algorithmic fairness - are also needed to advance the field of genomic AI and to facilitate model development. Here we propose a genomic AI benchmark, GUANinE, for evaluating model generalization across a number of distinct genomic tasks. Compared to existing task formulations in computational genomics, GUANinE is large-scale, de-noised, and suitable for evaluating pretrained models. GUANinE v1.0 primarily focuses on functional genomics tasks such as functional element annotation and gene expression prediction, and it also draws upon connections to evolutionary biology through sequence conservation tasks. The current GUANinE tasks provide insight into the performance of existing genomic AI models and non-neural baselines, with opportunities to be refined, revisited, and broadened as the field matures. Finally, the GUANinE benchmark allows us to evaluate new self-supervised T5 models and explore the tradeoffs between tokenization and model performance, while showcasing the potential for self-supervision to complement existing pretraining procedures.
Affiliation(s)
- Eyes S Robson
- Center for Computational Biology, UC Berkeley, Berkeley, CA 94720
| | - Nilah M Ioannidis
- Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, CA 94720
| |
|
27
|
Michielsen L, Reinders MJT, Mahfouz A. Predicting cell population-specific gene expression from genomic sequence. Front Bioinform 2024; 4:1347276. [PMID: 38501113 PMCID: PMC10944912 DOI: 10.3389/fbinf.2024.1347276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 01/23/2024] [Indexed: 03/20/2024] Open
Abstract
Most regulatory elements, especially enhancer sequences, are cell population-specific. One could even argue that a distinct set of regulatory elements is what defines a cell population. However, discovering which non-coding regions of the DNA are essential in which context, and as a result, which genes are expressed, is a difficult task. Some computational models tackle this problem by predicting gene expression directly from the genomic sequence. These models are currently limited to predicting bulk measurements and mainly make tissue-specific predictions. Here, we present a model that leverages single-cell RNA-sequencing data to predict gene expression. We show that cell population-specific models outperform tissue-specific models, especially when the expression profile of a cell population and the corresponding tissue are dissimilar. Further, we show that our model can prioritize GWAS variants and learn motifs of transcription factor binding sites. We envision that our model can be useful for delineating cell population-specific regulatory elements.
Affiliation(s)
- Lieke Michielsen
- Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
| | - Marcel J. T. Reinders
- Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
| | - Ahmed Mahfouz
- Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
| |
|
28
|
Dos Santos OAL, Carneiro RL, Requião RD, Ribeiro-Alves M, Domitrovic T, Palhano FL. Transcriptional profile of ribosome-associated quality control components and their associated phenotypes in mammalian cells. Sci Rep 2024; 14:1439. [PMID: 38228636 PMCID: PMC10792078 DOI: 10.1038/s41598-023-50811-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 12/26/2023] [Indexed: 01/18/2024] Open
Abstract
During protein synthesis, organisms detect translation defects that induce ribosome stalling and result in protein aggregation. The Ribosome-associated Quality Control (RQC) complex, comprising TCF25, LTN1, and NEMF, is responsible for identifying incomplete protein products from unproductive translation events, targeting them for degradation. Although RQC disruption causes adverse effects on vertebrate neurons, data regarding mRNA/protein expression and regulation across tissues are lacking. Employing high-throughput methods, we analyzed public datasets to explore RQC gene expression and phenotypes. Our findings revealed widespread expression of RQC components in human tissues; however, silencing of RQC yielded only mild negative effects on cell growth. Notably, TCF25 exhibited elevated mRNA levels that were not reflected in the protein content. We experimentally demonstrated that this disparity arose from post-translational protein degradation by the proteasome. Additionally, we observed that cellular aging marginally influenced RQC expression, leading to reduced mRNA levels in specific tissues. Our results suggest the necessity of RQC expression in all mammalian tissues. Nevertheless, when RQC falters, alternative mechanisms seem to compensate, ensuring cell survival under nonstress conditions.
Affiliation(s)
- Otávio Augusto Leitão Dos Santos
- Programa de Biologia Estrutural, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, 21941-902, Brazil
| | - Rodolfo L Carneiro
- Programa de Biologia Estrutural, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, 21941-902, Brazil
| | - Rodrigo D Requião
- Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
| | - Marcelo Ribeiro-Alves
- Fundação Oswaldo Cruz, Instituto Nacional de Infectologia Evandro Chagas, Rio de Janeiro, 21040-900, Brazil
| | - Tatiana Domitrovic
- Departamento de Virologia, Instituto de Microbiologia Paulo de Góes, Universidade Federal do Rio de Janeiro, Rio de Janeiro, 21941-902, Brazil
| | - Fernando L Palhano
- Programa de Biologia Estrutural, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, 21941-902, Brazil.
| |
|
29
|
Kishimoto T, Nishimura K, Morishita K, Fukuda A, Miyamae Y, Kumagai Y, Sumaru K, Nakanishi M, Hisatake K, Sano M. An engineered ligand-responsive Csy4 endoribonuclease controls transgene expression from Sendai virus vectors. J Biol Eng 2024; 18:9. [PMID: 38229076 DOI: 10.1186/s13036-024-00404-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 01/04/2024] [Indexed: 01/18/2024] Open
Abstract
BACKGROUND: Viral vectors are attractive gene delivery vehicles because of their broad tropism, high transduction efficiency, and durable expression. With no risk of integration into the host genome, the vectors developed from RNA viruses such as Sendai virus (SeV) are especially promising. However, RNA-based vectors have limited applicability because they lack a convenient method to control transgene expression by an external inducer. RESULTS: We engineered a Csy4 switch in Sendai virus-based vectors by combining Csy4 endoribonuclease with mutant FKBP12 (DD: destabilizing domain) that becomes stabilized when a small chemical Shield1 is supplied. In this Shield1-responsive Csy4 (SrC) switch, Shield1 increases Csy4 fused with DD (DD-Csy4), which then cleaves and downregulates the transgene mRNA containing the Csy4 recognition sequence (Csy4RS). Moreover, when Csy4RS is inserted in the viral L gene, the SrC switch suppresses replication and transcription of the SeV vector in infected cells in a Shield1-dependent manner, thus enabling complete elimination of the vector from the cells. By temporally controlling BRN4 expression, a BRN4-expressing SeV vector equipped with the SrC switch achieves efficient, stepwise differentiation of embryonic stem cells into neural stem cells, and then into astrocytes. CONCLUSION: SeV-based vectors with the SrC switch should find wide applications in stem cell research, regenerative medicine, and gene therapy, especially when precise control of reprogramming factor expression is desirable.
Grants
- JP19H03203, JP19K22945, JP19K07343, JP21H02678, JP19K06501 Japan Society for the Promotion of Science
Affiliation(s)
- Takumi Kishimoto
- Laboratory of Gene Regulation, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8575, Japan
- Department of Clinical Application, Center for iPS Cell Research and Application (CiRA), Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto, 606-8507, Japan
| | - Ken Nishimura
- Laboratory of Gene Regulation, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8575, Japan.
| | - Kana Morishita
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Central 5, 1-1-1 Higashi, Tsukuba, Ibaraki, 305-8565, Japan
| | - Aya Fukuda
- Laboratory of Gene Regulation, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8575, Japan
| | - Yusaku Miyamae
- Institute of Life and Environment Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8572, Japan
| | - Yutaro Kumagai
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Central 5, 1-1-1 Higashi, Tsukuba, Ibaraki, 305-8565, Japan
| | - Kimio Sumaru
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Central 5, 1-1-1 Higashi, Tsukuba, Ibaraki, 305-8565, Japan
| | - Mahito Nakanishi
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Central 5, 1-1-1 Higashi, Tsukuba, Ibaraki, 305-8565, Japan
- TOKIWA-Bio, Inc, 2-1-6 Sengen, Tsukuba, Ibaraki, 305-0047, Japan
| | - Koji Hisatake
- Laboratory of Gene Regulation, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8575, Japan
| | - Masayuki Sano
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Central 5, 1-1-1 Higashi, Tsukuba, Ibaraki, 305-8565, Japan.
| |
|
30
|
Palazzo AF, Qiu Y, Kang YM. mRNA nuclear export: how mRNA identity features distinguish functional RNAs from junk transcripts. RNA Biol 2024; 21:1-12. [PMID: 38091265 PMCID: PMC10732640 DOI: 10.1080/15476286.2023.2293339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 11/27/2023] [Accepted: 12/05/2023] [Indexed: 12/18/2023] Open
Abstract
The division of the cellular space into nucleoplasm and cytoplasm promotes quality control mechanisms that prevent misprocessed mRNAs and junk RNAs from gaining access to the translational machinery. Here, we explore how properly processed mRNAs are distinguished from both misprocessed mRNAs and junk RNAs by the presence or absence of various 'identity features'.
Affiliation(s)
| | - Yi Qiu
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada
| | - Yoon Mo Kang
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada
| |
|
31
|
Valencia JD, Hendrix DA. Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task. PLoS Comput Biol 2023; 19:e1011526. [PMID: 37824580 PMCID: PMC10597526 DOI: 10.1371/journal.pcbi.1011526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 10/24/2023] [Accepted: 09/18/2023] [Indexed: 10/14/2023] Open
Abstract
Ribosomes are information-processing macromolecular machines that integrate complex sequence patterns in messenger RNA (mRNA) transcripts to synthesize proteins. Studies of the sequence features that distinguish mRNAs from long noncoding RNAs (lncRNAs) may yield insight into the information that directs and regulates translation. Computational methods for calculating protein-coding potential are important for distinguishing mRNAs from lncRNAs during genome annotation, but most machine learning methods for this task rely on previously known rules to define features. Sequence-to-sequence (seq2seq) models, particularly ones using transformer networks, have proven capable of learning complex grammatical relationships between words to perform natural language translation. Seeking to leverage these advancements in the biological domain, we present a seq2seq formulation for predicting protein-coding potential with deep neural networks and demonstrate that simultaneously learning translation from RNA to protein improves classification performance relative to a classification-only training objective. Inspired by classical signal processing methods for gene discovery and Fourier-based image-processing neural networks, we introduce LocalFilterNet (LFNet). LFNet is a network architecture with an inductive bias for modeling the three-nucleotide periodicity apparent in coding sequences. We incorporate LFNet within an encoder-decoder framework to test whether the translation task improves the classification of transcripts and the interpretation of their sequence features. We use the resulting model to compute nucleotide-resolution importance scores, revealing sequence patterns that could assist the cellular machinery in distinguishing mRNAs and lncRNAs. Finally, we develop a novel approach for estimating mutation effects from Integrated Gradients, a backpropagation-based feature attribution, and characterize the difficulty of efficient approximations in this setting.
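The three-nucleotide periodicity mentioned in this abstract can be illustrated with a classical Fourier measure. The sketch below is not LFNet; it is a minimal example, written for this listing, that sums the spectral power at frequency 1/3 over the four base-indicator tracks of a sequence. The function name `period3_power` and the normalization by sequence length are assumptions of this illustration.

```python
import cmath


def period3_power(seq: str) -> float:
    """Power at frequency 1/3, summed over A/C/G/T indicator tracks, divided by length."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient at frequency 1/3 of the 0/1 indicator track for this base.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2
    return total / n
```

A sequence built from one repeated codon gives a large value, whereas a strictly period-2 sequence such as a repeated `AT` of length 30 gives a value near zero, which is the contrast that period-3 gene-finding heuristics rely on.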
Affiliation(s)
- Joseph D. Valencia
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
| | - David A. Hendrix
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, United States of America
| |
|
32
|
Schertzer MD, Stirn A, Isaev K, Pereira L, Das A, Harbison C, Park SH, Wessels HH, Sanjana NE, Knowles DA. Cas13d-mediated isoform-specific RNA knockdown with a unified computational and experimental toolbox. bioRxiv 2023:2023.09.12.557474. [PMID: 37745416 PMCID: PMC10515814 DOI: 10.1101/2023.09.12.557474] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 09/26/2023]
Abstract
Alternative splicing is an essential mechanism for diversifying proteins, in which mature RNA isoforms produce proteins with potentially distinct functions. Two major challenges in characterizing the cellular function of isoforms are the lack of experimental methods to specifically and efficiently modulate isoform expression and computational tools for complex experimental design. To address these gaps, we developed and methodically tested a strategy which pairs the RNA-targeting CRISPR/Cas13d system with guide RNAs that span exon-exon junctions in the mature RNA. We performed a high-throughput essentiality screen, quantitative RT-PCR assays, and PacBio long read sequencing to affirm our ability to specifically target and robustly knockdown individual RNA isoforms. In parallel, we provide computational tools for experimental design and screen analysis. Considering all possible splice junctions annotated in GENCODE for multi-isoform genes and our gRNA efficacy predictions, we estimate that our junction-centric strategy can uniquely target up to 89% of human RNA isoforms, including 50,066 protein-coding and 11,415 lncRNA isoforms. Importantly, this specificity spans all splicing and transcriptional events, including exon skipping and inclusion, alternative 5' and 3' splice sites, and alternative starts and ends.
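The junction-centric idea in this abstract can be sketched as a simple window enumeration over the spliced transcript. The code below is an assumed illustration, not the authors' toolbox: the function name `junction_spanning_targets`, the 23-nt default window, and the 5-nt minimum overhang on each side of the junction are hypothetical parameters chosen for the example. A guide spacer would be the reverse complement of each returned target window.

```python
def junction_spanning_targets(upstream_exon, downstream_exon,
                              guide_len=23, min_overhang=5):
    """Return (start, target) windows of the spliced RNA that cross the exon-exon junction."""
    up = upstream_exon.upper()
    down = downstream_exon.upper()
    mature = up + down
    junction = len(up)  # index of the first downstream-exon nucleotide
    # Earliest start that still leaves min_overhang nucleotides in the downstream exon.
    first = max(0, junction - guide_len + min_overhang)
    # Latest start that keeps min_overhang upstream nucleotides and fits in the transcript.
    last = min(junction - min_overhang, len(mature) - guide_len)
    return [(s, mature[s:s + guide_len]) for s in range(first, last + 1)]
```

With two 30-nt exons and the default parameters, this yields 14 candidate windows, each containing at least 5 nucleotides from both exons, so none of them is fully present in a transcript lacking that junction.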
Affiliation(s)
- Megan D Schertzer
- New York Genome Center, New York, NY
- Department of Computer Science, Columbia University, New York, NY
| | - Andrew Stirn
- New York Genome Center, New York, NY
- Department of Computer Science, Columbia University, New York, NY
| | - Keren Isaev
- New York Genome Center, New York, NY
- Department of Systems Biology, Columbia University, New York, NY
| | | | - Anjali Das
- New York Genome Center, New York, NY
- Department of Computer Science, Columbia University, New York, NY
| | | | - Stella H Park
- New York Genome Center, New York, NY
- Department of Biomedical Engineering, Columbia University, New York, NY
| | - Hans-Hermann Wessels
- New York Genome Center, New York, NY
- Department of Biology, New York University, New York, NY
| | - Neville E Sanjana
- New York Genome Center, New York, NY
- Department of Biology, New York University, New York, NY
| | - David A Knowles
- New York Genome Center, New York, NY
- Department of Computer Science, Columbia University, New York, NY
- Department of Systems Biology, Columbia University, New York, NY
- Data Science Institute, Columbia University, New York, NY
| |
|
33
|
Bohn E, Lau TTY, Wagih O, Masud T, Merico D. A curated census of pathogenic and likely pathogenic UTR variants and evaluation of deep learning models for variant effect prediction. Front Mol Biosci 2023; 10:1257550. [PMID: 37745687 PMCID: PMC10517338 DOI: 10.3389/fmolb.2023.1257550] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/12/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023] Open
Abstract
Introduction: Variants in 5' and 3' untranslated regions (UTR) contribute to rare disease. While predictive algorithms to assist in classifying pathogenicity can potentially be highly valuable, the utility of these tools is often unclear, as it depends on carefully selected training and validation conditions. To address this, we developed a high-confidence set of pathogenic (P) and likely pathogenic (LP) variants and assessed deep learning (DL) models for predicting their molecular effects. Methods: 3' and 5' UTR variants documented as P or LP (P/LP) were obtained from ClinVar and refined by reviewing the annotated variant effect and reassessing evidence of pathogenicity following published guidelines. Prediction scores from sequence-based DL models were compared between three groups: P/LP variants acting through the mechanism for which the model was designed (model-matched), those operating through other mechanisms (model-mismatched), and putative benign variants. PhyloP was used to compare conservation scores between P/LP and putative benign variants. Results: 295 3' and 188 5' UTR variants were obtained from ClinVar, of which 26 3' and 68 5' UTR variants were classified as P/LP. Predictions by DL models achieved statistically significant differences when comparing model-matched P/LP variants to both putative benign variants and model-mismatched P/LP variants, as well as when comparing all P/LP variants to putative benign variants. PhyloP conservation scores were significantly higher among P/LP compared to putative benign variants for both the 3' and 5' UTR. Discussion: In conclusion, we present a high-confidence set of P/LP 3' and 5' UTR variants spanning a range of mechanisms and supported by detailed pathogenicity and molecular mechanism evidence curation. Predictions from DL models further substantiate these classifications. These datasets will support further development and validation of DL algorithms designed to predict the functional impact of variants that may be implicated in rare disease.
Affiliation(s)
- Emma Bohn
- Deep Genomics Inc., Toronto, ON, Canada
- Daniele Merico
- Deep Genomics Inc., Toronto, ON, Canada
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, ON, Canada
|
34
|
Sapir T, Reiner O. HNRNPU's multi-tasking is essential for proper cortical development. Bioessays 2023; 45:e2300039. [PMID: 37439444 DOI: 10.1002/bies.202300039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Revised: 05/27/2023] [Accepted: 06/12/2023] [Indexed: 07/14/2023]
Abstract
Heterogeneous nuclear ribonucleoprotein U (HNRNPU) is a nuclear protein that plays a crucial role in various biological functions, such as RNA splicing and chromatin organization. HNRNPU/scaffold attachment factor A (SAF-A) activities are essential for regulating gene expression, DNA replication, genome integrity, and mitotic fidelity. These functions are critical to ensure the robustness of developmental processes, particularly those involved in shaping the human brain. As a result, HNRNPU is associated with various neurodevelopmental disorders (HNRNPU-related neurodevelopmental disorder, HNRNPU-NDD) characterized by developmental delay and intellectual disability. Our research demonstrates that the loss of HNRNPU function results in the death of both neural progenitor cells and post-mitotic neurons, with a higher sensitivity observed in the former. We reported that HNRNPU truncation leads to the dysregulation of gene expression and alternative splicing of genes that converge on several signaling pathways, some of which are likely to be involved in the pathology of HNRNPU-related NDD.
Affiliation(s)
- Tamar Sapir
- Weizmann Institute of Science, Molecular Genetics and Molecular Neuroscience, Rehovot, Central, Israel
- Orly Reiner
- Weizmann Institute of Science, Molecular Genetics and Molecular Neuroscience, Rehovot, Central, Israel
|
35
|
Kuśnierczyk P. Anti-SARS-CoV-2 mRNA vaccines, their efficiency, side effects and controversies. Scand J Immunol 2023; 98:e13310. [PMID: 38441312 DOI: 10.1111/sji.13310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 06/20/2023] [Accepted: 06/22/2023] [Indexed: 03/07/2024]
Affiliation(s)
- Piotr Kuśnierczyk
- Laboratory of Immunogenetics and Tissue Immunology, Hirszfeld Institute of Immunology and Experimental Therapy, Polish Academy of Sciences, Wrocław, Poland
|
36
|
McBeath E, Fujiwara K, Hofmann MC. Evidence-Based Guide to Using Artificial Introns for Tissue-Specific Knockout in Mice. Int J Mol Sci 2023; 24:10258. [PMID: 37373404 PMCID: PMC10299402 DOI: 10.3390/ijms241210258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 06/09/2023] [Accepted: 06/10/2023] [Indexed: 06/29/2023] Open
Abstract
Until recently, methods for generating floxed mice either conventionally or by CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-Cas9 (CRISPR-associated protein 9) editing have been technically challenging, expensive and error-prone, or time-consuming. To circumvent these issues, several labs have started successfully using a small artificial intron to conditionally knock out (KO) a gene of interest in mice. However, many other labs are having difficulty getting the technique to work. The key problem appears to be either a failure in achieving correct splicing after the introduction of the artificial intron into the gene or, just as crucial, insufficient functional KO of the gene's protein after Cre-induced removal of the intron's branchpoint. Presented here is a guide on how to choose an appropriate exon and where to place the recombinase-regulated artificial intron (rAI) in that exon to prevent disrupting normal gene splicing while maximizing mRNA degradation after recombinase treatment. The reasoning behind each step in the guide is also discussed. Following these recommendations should increase the success rate of this easy, new, and alternative technique for producing tissue-specific KO mice.
Affiliation(s)
- Elena McBeath
- Department of Endocrine Neoplasia & Hormonal Disorders, MD Anderson Cancer Center, Houston, TX 77030, USA
- Keigi Fujiwara
- National Coalition of Independent Scholars, Brattleboro, VT 05301, USA
- Marie-Claude Hofmann
- Department of Endocrine Neoplasia & Hormonal Disorders, MD Anderson Cancer Center, Houston, TX 77030, USA
|
37
|
Valencia JD, Hendrix DA. Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task. bioRxiv 2023:2023.04.03.535488. [PMID: 37066250 PMCID: PMC10104019 DOI: 10.1101/2023.04.03.535488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
Ribosomes are information-processing macromolecular machines that integrate complex sequence patterns in messenger RNA (mRNA) transcripts to synthesize proteins. Studies of the sequence features that distinguish mRNAs from long noncoding RNAs (lncRNAs) may yield insight into the information that directs and regulates translation. Computational methods for calculating protein-coding potential are important for distinguishing mRNAs from lncRNAs during genome annotation, but most machine learning methods for this task rely on previously known rules to define features. Sequence-to-sequence (seq2seq) models, particularly ones using transformer networks, have proven capable of learning complex grammatical relationships between words to perform natural language translation. Seeking to leverage these advancements in the biological domain, we present a seq2seq formulation for predicting protein-coding potential with deep neural networks and demonstrate that simultaneously learning translation from RNA to protein improves classification performance relative to a classification-only training objective. Inspired by classical signal processing methods for gene discovery and Fourier-based image-processing neural networks, we introduce LocalFilterNet (LFNet). LFNet is a network architecture with an inductive bias for modeling the three-nucleotide periodicity apparent in coding sequences. We incorporate LFNet within an encoder-decoder framework to test whether the translation task improves the classification of transcripts and the interpretation of their sequence features. We use the resulting model to compute nucleotide-resolution importance scores, revealing sequence patterns that could assist the cellular machinery in distinguishing mRNAs and lncRNAs. Finally, we develop a novel approach for estimating mutation effects from Integrated Gradients, a backpropagation-based feature attribution, and characterize the difficulty of efficient approximations in this setting.
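The three-nucleotide periodicity mentioned in this abstract is the classical signal used by Fourier-based gene-finding methods. As a rough illustration only (this is not LFNet, and the function name, indicator-sequence encoding, and normalization are assumptions made here), the sketch below sums the spectral power at frequency 1/3 over the four nucleotide indicator sequences.

```python
import cmath


def period3_power(seq: str) -> float:
    """Spectral power at frequency 1/3, summed over A/C/G/U indicator sequences.

    Each nucleotide is encoded as a binary indicator sequence; the
    discrete Fourier coefficient at frequency 1/3 is computed for each,
    and the squared magnitudes are summed and divided by the sequence
    length (an arbitrary normalization chosen for this sketch).
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGU":
        coefficient = sum(
            cmath.exp(-2j * cmath.pi * position / 3)
            for position, nucleotide in enumerate(seq)
            if nucleotide == base
        )
        total += abs(coefficient) ** 2
    return total / n


if __name__ == "__main__":
    # A codon repeat carries strong period-3 signal; a homopolymer carries none.
    print(period3_power("AUG" * 10))
    print(period3_power("A" * 30))
```

A sliding-window version of the same score is the usual way to localize candidate coding regions; a learned architecture such as the one the abstract describes would replace this fixed filter with trainable ones.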
Affiliation(s)
- Joseph D. Valencia
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- David A. Hendrix
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR, USA
|
38
|
Agarwal V, Inoue F, Schubach M, Martin BK, Dash PM, Zhang Z, Sohota A, Noble WS, Yardimci GG, Kircher M, Shendure J, Ahituv N. Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types. bioRxiv 2023:2023.03.05.531189. [PMID: 36945371 PMCID: PMC10028905 DOI: 10.1101/2023.03.05.531189] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Indexed: 03/11/2023]
Abstract
The human genome contains millions of candidate cis-regulatory elements (CREs) with cell-type-specific activities that shape both health and myriad disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these CREs. Here, we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of over 680,000 sequences, representing a nearly comprehensive set of all annotated CREs among three cell types (HepG2, K562, and WTC11), finding 41.7% to be functional. By testing sequences in both orientations, we find promoters to have significant strand orientation effects. We also observe that their 200 nucleotide cores function as non-cell-type-specific 'on switches' providing similar expression levels to their associated gene. In contrast, enhancers have weaker orientation effects, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict CRE function with high accuracy and delineate regulatory motifs. Testing an additional lentiMPRA library encompassing 60,000 CREs in all three cell types, we further identified factors that determine cell-type specificity. Collectively, our work provides an exhaustive catalog of functional CREs in three widely used cell lines, and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
Affiliation(s)
- Vikram Agarwal
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- mRNA Center of Excellence, Sanofi Pasteur Inc., Waltham, MA 02451, USA
- Fumitaka Inoue
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Max Schubach
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10178, Berlin, Germany
- Beth K. Martin
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Pyaree Mohan Dash
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10178, Berlin, Germany
- Zicong Zhang
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Ajuni Sohota
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Galip Gürkan Yardimci
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA
- Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, USA
- Martin Kircher
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10178, Berlin, Germany
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, Seattle, WA 98195, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA
- Allen Center for Cell Lineage Tracing, University of Washington, Seattle, WA 98195, USA
- Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
|
39
|
Crowdsourcing to predict RNA degradation and secondary structure. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00615-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
|
40
|
Uzonyi A, Dierks D, Nir R, Kwon OS, Toth U, Barbosa I, Burel C, Brandis A, Rossmanith W, Le Hir H, Slobodin B, Schwartz S. Exclusion of m6A from splice-site proximal regions by the exon junction complex dictates m6A topologies and mRNA stability. Mol Cell 2023; 83:237-251.e7. [PMID: 36599352 DOI: 10.1016/j.molcel.2022.12.026] [Citation(s) in RCA: 82] [Impact Index Per Article: 41.0] [Received: 05/26/2022] [Revised: 11/04/2022] [Accepted: 12/21/2022] [Indexed: 01/05/2023]
Abstract
N6-methyladenosine (m6A), a widespread destabilizing mark on mRNA, is non-uniformly distributed across the transcriptome, yet the basis for its selective deposition is unknown. Here, we propose that m6A deposition is not selective. Instead, it is exclusion based: m6A consensus motifs are methylated by default, unless they are within a window of ∼100 nt from a splice junction. A simple model which we extensively validate, relying exclusively on presence of m6A motifs and exon-intron architecture, allows in silico recapitulation of experimentally measured m6A profiles. We provide evidence that exclusion from splice junctions is mediated by the exon junction complex (EJC), potentially via physical occlusion, and that previously observed associations between exon-intron architecture and mRNA decay are mechanistically mediated via m6A. Our findings establish a mechanism coupling nuclear mRNA splicing and packaging with the covalent installation of m6A, in turn controlling cytoplasmic decay.
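The exclusion-based rule stated in this abstract is simple enough to write down directly. The following is a minimal sketch, not the authors' model: it assumes the commonly cited DRACH consensus (D = A/G/U, R = A/G, H = A/C/U) as the m6A motif, takes splice-junction positions as 0-based coordinates in the spliced transcript, and applies a hard 100-nt cutoff where the abstract says only "∼100 nt". The function name and parameters are hypothetical.

```python
import re
from typing import List

# DRACH consensus; the lookahead lets overlapping motifs be reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def predict_m6a_sites(
    transcript: str,
    junctions: List[int],
    exclusion: int = 100,  # assumed hard cutoff for the ~100-nt window
) -> List[int]:
    """Return 0-based positions of motif adenosines predicted to be methylated.

    Every DRACH motif is methylated by default; a site is excluded only
    if its central A lies within `exclusion` nt of any splice junction.
    """
    seq = transcript.upper().replace("T", "U")
    sites = []
    for match in DRACH.finditer(seq):
        a_position = match.start() + 2  # the methylated A is the third base of DRACH
        if all(abs(a_position - junction) > exclusion for junction in junctions):
            sites.append(a_position)
    return sites


if __name__ == "__main__":
    toy = "C" * 150 + "GGACU" + "C" * 150 + "GGACU" + "C" * 20
    # The second motif sits 13 nt from the junction and is excluded.
    print(predict_m6a_sites(toy, junctions=[320]))
```

Under this rule, long internal exons and long 3' terminal exons retain most of their motif sites, while transcripts made of many short exons lose nearly all of them, which is the link between exon-intron architecture and stability that the abstract describes.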
Affiliation(s)
- Anna Uzonyi
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7630031, Israel
- David Dierks
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7630031, Israel
- Ronit Nir
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7630031, Israel
- Oh Sung Kwon
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Ursula Toth
- Center for Anatomy & Cell Biology, Medical University of Vienna, 1090 Vienna, Austria
- Isabelle Barbosa
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Cindy Burel
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Alexander Brandis
- Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 7630031, Israel
- Walter Rossmanith
- Center for Anatomy & Cell Biology, Medical University of Vienna, 1090 Vienna, Austria
- Hervé Le Hir
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Boris Slobodin
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7630031, Israel; Department of Biochemistry, Rappaport Faculty of Medicine, Technion - Israel Institute of Technology, Haifa 31096, Israel
- Schraga Schwartz
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7630031, Israel
|