1
Gilliot PA, Gorochowski TE. Transfer learning for cross-context prediction of protein expression from 5'UTR sequence. Nucleic Acids Res 2024:gkae491. [PMID: 38864396 DOI: 10.1093/nar/gkae491] [Received: 06/10/2023] [Revised: 04/28/2024] [Accepted: 05/28/2024] [Indexed: 06/13/2024]
Abstract
Model-guided DNA sequence design can accelerate the reprogramming of living cells. It allows us to engineer more complex biological systems by removing the need to physically assemble and test each potential design. While mechanistic models of gene expression have seen some success in supporting this goal, data-centric, deep learning-based approaches often provide more accurate predictions. This accuracy, however, comes at a cost - a lack of generalization across genetic and experimental contexts that has limited their wider use outside the context in which they were trained. Here, we address this issue by demonstrating how a simple transfer learning procedure can effectively tune a pre-trained deep learning model to predict protein translation rate from 5' untranslated region (5'UTR) sequence for diverse contexts in Escherichia coli using a small number of new measurements. This allows for important model features learnt from expensive massively parallel reporter assays to be easily transferred to new settings. By releasing our trained deep learning model and complementary calibration procedure, this study acts as a starting point for continually refined model-based sequence design that builds on previous knowledge and future experimental efforts.
Affiliation(s)
- Pierre-Aurélien Gilliot
  - School of Biological Sciences, University of Bristol, 24 Tyndall Avenue, Bristol BS8 1TQ, UK
- Thomas E Gorochowski
  - School of Biological Sciences, University of Bristol, 24 Tyndall Avenue, Bristol BS8 1TQ, UK
  - BrisEngBio, School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK
2
Liu J, Liu J, Li J, Zhao X, Sun G, Qiao Q, Shi T, Che B, Chen J, Zhuang Q, Wang Y, Sun J, Zhu D, Zheng P. Reconstruction the feedback regulation of amino acid metabolism to develop a non-auxotrophic L-threonine producing Corynebacterium glutamicum. Bioresour Bioprocess 2024; 11:43. [PMID: 38664309 PMCID: PMC11045695 DOI: 10.1186/s40643-024-00753-9] [Received: 12/06/2023] [Accepted: 03/28/2024] [Indexed: 04/28/2024]
Abstract
L-Threonine is an important feed additive with the third largest market size among the amino acids produced by microbial fermentation. The GRAS (generally regarded as safe) industrial workhorse Corynebacterium glutamicum is an attractive chassis for L-threonine production. However, current L-threonine production in C. glutamicum cannot meet the requirements of industrialization due to the relatively low production level of L-threonine and the accumulation of large amounts of by-products (such as L-lysine, L-isoleucine, and glycine). Herein, to enhance L-threonine biosynthesis in C. glutamicum, aspartate kinase (LysC) and homoserine dehydrogenase (Hom) were released from feedback inhibition by L-lysine and L-threonine, respectively, and four flux-control genes were overexpressed. Next, to reduce the formation of the by-products L-lysine and L-isoleucine without causing an auxotrophic phenotype, the feedback regulation of dihydrodipicolinate synthase (DapA) and threonine dehydratase (IlvA) was strengthened by replacing the native enzymes with heterologous analogues that are more sensitive to feedback inhibition by L-lysine and L-isoleucine, respectively. The resulting strain retained the capability of synthesizing sufficient L-lysine and L-isoleucine for cell biomass formation but exhibited almost no extracellular accumulation of these two amino acids. To further enhance L-threonine production and reduce the by-product glycine, the L-threonine exporter and homoserine kinase were overexpressed. Finally, the rationally engineered non-auxotrophic strain ZcglT9 produced 67.63 g/L (17.2% higher) L-threonine with a productivity of 1.20 g/L/h (108.0% higher) in fed-batch fermentation, along with significantly reduced by-product accumulation, representing the record for L-threonine production in C. glutamicum.
In this study, we developed a strategy of reconstructing the feedback regulation of amino acid metabolism and successfully applied this strategy to de novo construct a non-auxotrophic L-threonine producing C. glutamicum. The main end by-products including L-lysine, L-isoleucine, and glycine were almost eliminated in fed-batch fermentation of the engineered C. glutamicum strain. This strategy can also be used for engineering producing strains for other amino acids and derivatives.
Affiliation(s)
- Jianhang Liu
  - State Key Laboratory of Biobased Material and Green Papermaking, Qilu University of Technology, Shandong Academy of Sciences, Jinan, 250353, China
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
  - Shandong Provincial Key Laboratory of Microbial Engineering, School of Bioengineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, 250353, China
- Jiao Liu
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Jiajun Li
  - State Key Laboratory of Biobased Material and Green Papermaking, Qilu University of Technology, Shandong Academy of Sciences, Jinan, 250353, China
  - Shandong Provincial Key Laboratory of Microbial Engineering, School of Bioengineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, 250353, China
- Xiaojia Zhao
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Guannan Sun
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Qianqian Qiao
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Tuo Shi
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Bin Che
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Jiuzhou Chen
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Qianqian Zhuang
  - State Key Laboratory of Biobased Material and Green Papermaking, Qilu University of Technology, Shandong Academy of Sciences, Jinan, 250353, China
  - Shandong Provincial Key Laboratory of Microbial Engineering, School of Bioengineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, 250353, China
  - Shandong University of Traditional Chinese Medicine, Jinan, 250355, China
- Yu Wang
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Jibin Sun
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Deqiang Zhu
  - State Key Laboratory of Biobased Material and Green Papermaking, Qilu University of Technology, Shandong Academy of Sciences, Jinan, 250353, China
  - Shandong Provincial Key Laboratory of Microbial Engineering, School of Bioengineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, 250353, China
- Ping Zheng
  - Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
3
Höllerer S, Jeschek M. Ultradeep characterisation of translational sequence determinants refutes rare-codon hypothesis and unveils quadruplet base pairing of initiator tRNA and transcript. Nucleic Acids Res 2023; 51:2377-2396. [PMID: 36727459 PMCID: PMC10018350 DOI: 10.1093/nar/gkad040] [Received: 06/17/2022] [Revised: 12/05/2022] [Accepted: 01/13/2023] [Indexed: 02/03/2023]
Abstract
Translation is a key determinant of gene expression and an important biotechnological engineering target. In bacteria, 5'-untranslated region (5'-UTR) and coding sequence (CDS) are well-known mRNA parts controlling translation and thus cellular protein levels. However, the complex interaction of 5'-UTR and CDS has so far only been studied for few sequences leading to non-generalisable and partly contradictory conclusions. Herein, we systematically assess the dynamic translation from over 1.2 million 5'-UTR-CDS pairs in Escherichia coli to investigate their collective effect using a new method for ultradeep sequence-function mapping. This allows us to disentangle and precisely quantify effects of various sequence determinants of translation. We find that 5'-UTR and CDS individually account for 53% and 20% of variance in translation, respectively, and show conclusively that, contrary to a common hypothesis, tRNA abundance does not explain expression changes between CDSs with different synonymous codons. Moreover, the obtained large-scale data provide clear experimental evidence for a base-pairing interaction between initiator tRNA and mRNA beyond the anticodon-codon interaction, an effect that is often masked for individual sequences and therefore inaccessible to low-throughput approaches. Our study highlights the indispensability of ultradeep sequence-function mapping to accurately determine the contribution of parts and phenomena involved in gene regulation.
Affiliation(s)
- Simon Höllerer
  - Department of Biosystems Science and Engineering, Swiss Federal Institute of Technology – ETH Zurich, Basel CH-4058, Switzerland
- Markus Jeschek
  - To whom correspondence should be addressed. Tel: +49 941 943 3161; Fax: +49 941 943 2403;
4
Fages-Lartaud M, Mueller Y, Elie F, Courtade G, Hohmann-Marriott MF. Standard Intein Gene Expression Ramps (SIGER) for Protein-Independent Expression Control. ACS Synth Biol 2023; 12:1058-1071. [PMID: 36920366 PMCID: PMC10127266 DOI: 10.1021/acssynbio.2c00530] [Indexed: 03/16/2023]
Abstract
Coordination of multigene expression is one of the key challenges of metabolic engineering for the development of cell factories. Constraints on translation initiation and early ribosome kinetics of mRNA are imposed by features of the 5'UTR in combination with the start of the gene, referred to as the "gene ramp", such as rare codons and mRNA secondary structures. These features strongly influence the translation yield and protein quality by regulating the ribosome distribution on mRNA strands. The utilization of genetic expression sequences, such as promoters and 5'UTRs in combination with different target genes, leads to a wide variety of gene ramp compositions with irregular translation rates, leading to unpredictable levels of protein yield and quality. Here, we present the Standard Intein Gene Expression Ramp (SIGER) system for controlling protein expression. The SIGER system makes use of inteins to decouple the translation initiation features from the gene of a target protein. We generated sequence-specific gene expression sequences for two inteins (DnaB and DnaX) that display defined levels of protein expression. Additionally, we used inteins that possess the ability to release the C-terminal fusion protein in vivo to avoid the impairment of protein functionality by the fused intein. Overall, our results show that SIGER systems are unique tools to mitigate the undesirable effects of gene ramp variation and to control the relative ratios of enzymes involved in molecular pathways. As a proof of concept of the potential of the system, we also used a SIGER system to express two difficult-to-produce proteins, GumM and CBM73.
Affiliation(s)
- Maxime Fages-Lartaud
  - Department of Biotechnology and Food Science, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Yasmin Mueller
  - Department of Biotechnology and Food Science, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Florence Elie
  - Department of Biotechnology and Food Science, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Gaston Courtade
  - Department of Biotechnology and Food Science, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Martin Frank Hohmann-Marriott
  - Department of Biotechnology and Food Science, Norwegian University of Science and Technology, Trondheim N-7491, Norway
  - United Scientists CORE (Limited), Dunedin 9016, Aotearoa, New Zealand
5
Casas A, Bultelle M, Motraghi C, Kitney R. PASIV: A Pooled Approach-Based Workflow to Overcome Toxicity-Induced Design of Experiments Failures and Inefficiencies. ACS Synth Biol 2022; 11:1272-1291. [PMID: 35261238 PMCID: PMC8938949 DOI: 10.1021/acssynbio.1c00562] [Indexed: 11/29/2022]
Abstract
We present here a newly developed workflow, which we have called PASIV, designed to provide a solution to a practical problem with design of experiments (DoE) methodology: i.e., what can be done if the scoping phase of the DoE cycle is severely hampered by burden and toxicity issues (caused by either the metabolite or an intermediary), making it unreliable or impossible to proceed to the screening phase? PASIV (standing for pooled approach, screening, identification, and visualization) was designed so the (viable) region of interest can be made to appear through an interplay between biology and software. This was achieved by combining multiplex construction in a pooled approach (one-pot reaction) with a viability assay and with a range of bioinformatics tools (including a novel construct matching tool). PASIV was tested on the exemplar of the lycopene pathway, under stressful constitutive expression, yielding a region of interest with comparatively stronger producers.
Affiliation(s)
- Alexis Casas
  - Department of Bioengineering, Imperial College London, Exhibition Road, London SW7 2BX, United Kingdom
- Matthieu Bultelle
  - Department of Bioengineering, Imperial College London, Exhibition Road, London SW7 2BX, United Kingdom
- Charles Motraghi
  - Department of Bioengineering, Imperial College London, Exhibition Road, London SW7 2BX, United Kingdom
- Richard Kitney
  - Department of Bioengineering, Imperial College London, Exhibition Road, London SW7 2BX, United Kingdom
6
Casas A, Bultelle M, Motraghi C, Kitney R. Removing the Bottleneck: Introducing cMatch - A Lightweight Tool for Construct-Matching in Synthetic Biology. Front Bioeng Biotechnol 2022; 9:785131. [PMID: 35083201 PMCID: PMC8784771 DOI: 10.3389/fbioe.2021.785131] [Received: 09/28/2021] [Accepted: 12/14/2021] [Indexed: 11/30/2022]
Abstract
We present a software tool, called cMatch, to reconstruct and identify synthetic genetic constructs from their sequences, or a set of sub-sequences, based on two practical pieces of information: their modular structure, and libraries of components. Although developed for combinatorial pathway engineering problems and addressing their quality control (QC) bottleneck, cMatch is not restricted to these applications. QC takes place post assembly, transformation and growth. It has a simple goal: to verify that the genetic material contained in a cell matches what was intended to be built and, when it does not, to locate the discrepancies and estimate their severity. In terms of reproducibility/reliability, the QC step is crucial. Failure at this step requires repetition of the construction and/or sequencing steps. When performed manually or semi-manually, QC is an extremely time-consuming, error-prone process, which scales very poorly with the number of constructs and their complexity. To make QC frictionless and more reliable, cMatch performs an operation we have called “construct-matching” and automates it. Construct-matching is more thorough than simple sequence-matching, as it matches at the functional level and quantifies the matching at the individual component level and across the whole construct. Two algorithms (called CM_1 and CM_2) are presented. They differ according to the nature of their inputs. CM_1 is the core algorithm for construct-matching and is to be used when input sequences are long enough to cover constructs in their entirety (e.g., obtained with methods such as next generation sequencing). CM_2 is an extension designed to deal with shorter data (e.g., obtained with Sanger sequencing) that need recombining.
Both algorithms are shown to yield accurate construct-matching in a few minutes (even on hardware with limited processing power), together with a set of metrics that can be used to improve the robustness of the decision-making process. To ensure reliability and reproducibility, cMatch builds on the highly validated pairwise-matching Smith-Waterman algorithm. All the tests presented have been conducted on synthetic data for challenging, yet realistic constructs - and on real data gathered during studies on a metabolic engineering example (lycopene production).
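For readers unfamiliar with the Smith-Waterman algorithm mentioned in this abstract, the following is a minimal sketch of local pairwise alignment in Python. It is not cMatch's code and does not reproduce CM_1 or CM_2; the function name `smith_waterman`, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -2) are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Scoring values and the linear gap penalty are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # match or mismatch
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # A short component embedded in a longer read (toy sequences).
    print(smith_waterman("TTTACGTAAA", "GGACGTCC"))
```

Because scores are floored at zero, the alignment is free to start and end anywhere in either sequence, which is what allows a library component to be located inside a longer read.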
Affiliation(s)
- Alexis Casas
  - Department of Bioengineering, Imperial College London, London, United Kingdom
- Matthieu Bultelle
  - Department of Bioengineering, Imperial College London, London, United Kingdom
- Charles Motraghi
  - Department of Bioengineering, Imperial College London, London, United Kingdom
- Richard Kitney
  - Department of Bioengineering, Imperial College London, London, United Kingdom