1
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 DOI: 10.1101/gad.351800.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
  - Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
2
Stroup EK, Ji Z. Delineating yeast cleavage and polyadenylation signals using deep learning. Genome Res 2024; 34:1066-1080. [PMID: 38914436 PMCID: PMC11368178 DOI: 10.1101/gr.278606.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 06/17/2024] [Indexed: 06/26/2024]
Abstract
3'-end cleavage and polyadenylation is an essential process for eukaryotic mRNA maturation. In yeast species, the polyadenylation signals that recruit the processing machinery are degenerate and remain poorly characterized compared with the well-defined regulatory elements in mammals. Here we address this issue by developing deep learning models to deconvolute degenerate cis-regulatory elements and quantify their positional importance in mediating yeast poly(A) site formation, cleavage heterogeneity, and strength. In S. cerevisiae, cleavage heterogeneity is promoted by the depletion of U-rich elements around poly(A) sites as well as multiple occurrences of upstream UA-rich elements. Sites with high cleavage heterogeneity show overall lower strength. The site strength and tandem site distances modulate alternative polyadenylation (APA) under diauxic stress. Finally, we develop a deep learning model to reveal the distinct motif configuration of S. pombe poly(A) sites, which show more precise cleavage than those of S. cerevisiae. Altogether, our deep learning models provide unprecedented insights into poly(A) site formation of yeast species, and our results highlight divergent poly(A) signals across distantly related species.
Affiliation(s)
- Emily Kunce Stroup
  - Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60611, USA
- Zhe Ji
  - Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60611, USA
  - Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, Illinois 60628, USA
3
Cautereels C, Smets J, Bircham P, De Ruysscher D, Zimmermann A, De Rijk P, Steensels J, Gorkovskiy A, Masschelein J, Verstrepen KJ. Combinatorial optimization of gene expression through recombinase-mediated promoter and terminator shuffling in yeast. Nat Commun 2024; 15:1112. [PMID: 38326309 PMCID: PMC10850122 DOI: 10.1038/s41467-024-44997-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 01/12/2024] [Indexed: 02/09/2024] Open
Abstract
Microbes are increasingly employed as cell factories to produce biomolecules. This often involves the expression of complex heterologous biosynthesis pathways in host strains. Achieving maximal product yields and avoiding build-up of (toxic) intermediates requires balanced expression of every pathway gene. However, despite progress in metabolic modeling, the optimization of gene expression still heavily relies on trial-and-error. Here, we report an approach for in vivo, multiplexed Gene Expression Modification by LoxPsym-Cre Recombination (GEMbLeR). GEMbLeR exploits orthogonal LoxPsym sites to independently shuffle promoter and terminator modules at distinct genomic loci. This approach facilitates creation of large strain libraries, in which expression of every pathway gene ranges over 120-fold and each strain harbors a unique expression profile. When applied to the biosynthetic pathway of astaxanthin, an industrially relevant antioxidant, a single round of GEMbLeR improved pathway flux and doubled production titers. Together, this shows that GEMbLeR allows rapid and efficient gene expression optimization in heterologous biosynthetic pathways, offering possibilities for enhancing the performance of microbial cell factories.
Affiliation(s)
- Charlotte Cautereels
  - VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
  - Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium
- Jolien Smets
  - VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
  - Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium
- Peter Bircham
  - VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
  - Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium
- Dries De Ruysscher
  - Molecular Biotechnology of Plants and Micro-organisms, Department of Biology, KU Leuven, Kasteelpark Arenberg 31, box 2438, Leuven, 3001, Belgium
  - Laboratory for Biomolecular Discovery & Engineering, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
- Anna Zimmermann
  - VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
  - Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium
- Peter De Rijk
  - Neuromics Support Facility, VIB Center for Molecular Neurology, VIB, Antwerp, 2610, Belgium
  - Neuromics Support Facility, Department of Biomedical Sciences, University of Antwerp, Antwerp, 2610, Belgium
- Jan Steensels
  - VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
  - Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium
- Anton Gorkovskiy
  - VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
  - Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium
- Joleen Masschelein
  - Molecular Biotechnology of Plants and Micro-organisms, Department of Biology, KU Leuven, Kasteelpark Arenberg 31, box 2438, Leuven, 3001, Belgium
  - Laboratory for Biomolecular Discovery & Engineering, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
- Kevin J Verstrepen
  - VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
  - Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium
4
Reis-Claro I, Silva MI, Moutinho A, Garcia BC, Pereira-Castro I, Moreira A. Application of the iPLUS non-coding sequence in improving biopharmaceuticals production. Front Bioeng Biotechnol 2024; 12:1355957. [PMID: 38380261 PMCID: PMC10876878 DOI: 10.3389/fbioe.2024.1355957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 01/25/2024] [Indexed: 02/22/2024] Open
Abstract
The biotechnological landscape has witnessed significant growth in biological therapeutics, particularly in the field of recombinant protein production. Here we investigate the function of 3'UTR cis-regulatory elements in increasing mRNA and protein levels in different biological therapeutics and model systems, spanning from monoclonal antibodies to mRNA vaccines. We explore the regulatory function of iPLUS, a universal sequence capable of consistently augmenting recombinant protein levels. By incorporating iPLUS in a vector to express a monoclonal antibody used in immunotherapy, in a mammalian cell line used by the industry (ExpiCHO), trastuzumab production increases by 2-fold. As the yeast Pichia pastoris is widely used in the manufacture of industrial enzymes and pharmaceuticals, we then used iPLUS in tandem (3x) and iPLUSv2 (a variant of iPLUS) to provide proof-of-concept data that it increases the production of a reporter protein more than 100-fold. As iPLUS functions by also increasing mRNA levels, we hypothesize that these sequences could be used as an asset in the mRNA vaccine industry. In fact, by including iPLUSv2 downstream of Spike we were able to double its production. Moreover, the same effect was observed when we introduced iPLUSv2 downstream of MAGEC2, a tumor-specific antigen tested for cancer mRNA vaccines. Taken together, our study provides data (TLR4) showing that iPLUS may be used as a valuable asset in a variety of systems used by the biotech and biopharmaceutical industry. Our results underscore the critical role of non-coding sequences in controlling gene expression, offering a promising avenue to accelerate, enhance, and cost-effectively optimize biopharmaceutical production processes.
Affiliation(s)
- Inês Reis-Claro
  - Gene Regulation, i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- Maria Inês Silva
  - Gene Regulation, i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- Ana Moutinho
  - Gene Regulation, i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- Beatriz C. Garcia
  - Gene Regulation, i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- Isabel Pereira-Castro
  - Gene Regulation, i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
  - IBMC—Instituto de Biologia Molecular e Celular, Universidade do Porto, Porto, Portugal
- Alexandra Moreira
  - Gene Regulation, i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
  - IBMC—Instituto de Biologia Molecular e Celular, Universidade do Porto, Porto, Portugal
  - ICBAS—Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, Porto, Portugal
5
Perchlik M, Sasse A, Mostafavi S, Fields S, Cuperus JT. Impact on splicing in Saccharomyces cerevisiae of random 50-base sequences inserted into an intron. RNA (New York, N.Y.) 2023; 30:52-67. [PMID: 37879864 PMCID: PMC10726166 DOI: 10.1261/rna.079752.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 10/18/2023] [Indexed: 10/27/2023]
Abstract
Intron splicing is a key regulatory step in gene expression in eukaryotes. Three sequence elements required for splicing (5' and 3' splice sites and a branchpoint) are especially well-characterized in Saccharomyces cerevisiae, but our understanding of additional intron features that impact splicing in this organism is incomplete, due largely to its small number of introns. To overcome this limitation, we constructed a library in S. cerevisiae of random 50-nt (N50) elements individually inserted into the intron of a reporter gene and quantified canonical splicing and the use of cryptic splice sites by sequencing analysis. More than 70% of approximately 140,000 N50 elements reduced splicing by at least 20%. N50 features, including higher GC content, presence of GU repeats, and stronger predicted secondary structure of its pre-mRNA, correlated with reduced splicing efficiency. A likely basis for the reduced splicing of such a large proportion of variants is the formation of RNA structures that pair N50 bases, such as the GU repeats, with other bases specifically within the reporter pre-mRNA analyzed. However, multiple models were unable to explain more than a small fraction of the variance in splicing efficiency across the library, suggesting that complex nonlinear interactions in RNA structures are not accurately captured by RNA structure prediction methods. Our results imply that the specific context of a pre-mRNA may determine the bases allowable in an intron to prevent secondary structures that reduce splicing. This large data set can serve as a resource for further exploration of splicing mechanisms.
Affiliation(s)
- Molly Perchlik
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Alexander Sasse
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Sara Mostafavi
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Stanley Fields
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
  - Department of Medicine, University of Washington, Seattle, Washington 98195, USA
- Josh T Cuperus
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
6
Controlling gene expression with deep generative design of regulatory DNA. Nat Commun 2022; 13:5099. [PMID: 36042233 PMCID: PMC9427793 DOI: 10.1038/s41467-022-32818-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 01/27/2022] [Accepted: 08/18/2022] [Indexed: 11/25/2022] Open
Abstract
Design of de novo synthetic regulatory DNA is a promising avenue to control gene expression in biotechnology and medicine. Using mutagenesis typically requires screening sizable random DNA libraries, which limits the designs to span merely a short section of the promoter and restricts their control of gene expression. Here, we prototype a deep learning strategy based on generative adversarial networks (GAN) by learning directly from genomic and transcriptomic data. Our ExpressionGAN can traverse the entire regulatory sequence-expression landscape in a gene-specific manner, generating regulatory DNA with prespecified target mRNA levels spanning the whole gene regulatory structure including coding and adjacent non-coding regions. Despite high sequence divergence from natural DNA, in vivo measurements show that 57% of the highly-expressed synthetic sequences surpass the expression levels of highly-expressed natural controls. This demonstrates the applicability and relevance of deep generative design to expand our knowledge and control of gene expression regulation in any desired organism, condition or tissue. Here the authors present ExpressionGAN, a generative adversarial network that uses genomic and transcriptomic data to generate regulatory sequences.