1. Xiao Z, Li W, Moon H, Roell GW, Chen Y, Tang YJ. Generative Artificial Intelligence GPT-4 Accelerates Knowledge Mining and Machine Learning for Synthetic Biology. ACS Synth Biol 2023; 12:2973-2982. [PMID: 37682043 DOI: 10.1021/acssynbio.3c00310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Knowledge mining from synthetic biology journal articles for machine learning (ML) applications is a labor-intensive process. The development of natural language processing (NLP) tools, such as GPT-4, can accelerate the extraction of published information related to microbial performance under complex strain engineering and bioreactor conditions. As a proof of concept, we proposed prompt engineering for a GPT-4 workflow pipeline to extract knowledge from 176 publications on two oleaginous yeasts (Yarrowia lipolytica and Rhodosporidium toruloides). After human intervention, the pipeline obtained a total of 2037 data instances. The structured data sets and feature selections enabled ML approaches (e.g., a random forest model) to predict Yarrowia fermentation titers with decent accuracy (R² of 0.86 for unseen test data). Via transfer learning, the trained model could assess the production potential of the engineered nonconventional yeast, R. toruloides, for which there are fewer published reports. This work demonstrated the potential of generative artificial intelligence to streamline information extraction from research articles, thereby facilitating fermentation predictions and biomanufacturing development.
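The R² of 0.86 quoted above is the coefficient of determination on held-out data, R² = 1 - SS_res/SS_tot. As a minimal sketch (not the authors' pipeline; the titer values below are hypothetical and chosen only for illustration), it can be computed in plain Python as:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted titers (g/L) for an unseen test split.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(measured, predicted), 3))  # → 0.98
```

On unseen data R² is not bounded below by zero: a model that does worse than simply predicting the test-set mean yields a negative value. In practice, scikit-learn's `r2_score` computes the same quantity.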
Affiliation(s)
- Zhengyang Xiao
- Department of Energy, Environmental, and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Wenyu Li
- Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Hannah Moon
- ImpactDB LLC, St. Louis, Missouri 63105, United States
- Clayton High School, 1 Mark Twain Cir, Clayton, Missouri 63105, United States
- Garrett W Roell
- ImpactDB LLC, St. Louis, Missouri 63105, United States
- Department of Molecular Biosciences & Bioengineering, University of Hawaii at Manoa, Honolulu, Hawaii 96822, United States
- Yixin Chen
- Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Yinjie J Tang
- Department of Energy, Environmental, and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States

2. Volk MJ, Lourentzou I, Mishra S, Vo LT, Zhai C, Zhao H. Biosystems Design by Machine Learning. ACS Synth Biol 2020; 9:1514-1533. [PMID: 32485108 DOI: 10.1021/acssynbio.0c00129] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Indexed: 12/16/2022]
Abstract
Biosystems such as enzymes, pathways, and whole cells have been increasingly explored for biotechnological applications. However, the intricate connectivity and resulting complexity of biosystems pose a major hurdle in designing biosystems with desirable features. As -omics and other high-throughput technologies have been rapidly developed, the promise of applying machine learning (ML) techniques in biosystems design has started to become a reality. ML models enable the identification of patterns within complicated biological data across multiple scales of analysis and can augment biosystems design applications by predicting new candidates for optimized performance. ML is being used at every stage of biosystems design to help find nonobvious engineering solutions with fewer design iterations. In this review, we first describe commonly used models and modeling paradigms within ML. We then discuss applications of these models that have already proven successful in biotechnology. Moreover, we discuss successful applications at all scales of biosystems design, including nucleic acids, genetic circuits, proteins, pathways, genomes, and bioprocesses. Finally, we discuss some limitations of these methods and potential solutions as well as prospects of the combination of ML and biosystems design.

3. Wehrs M, Tanjore D, Eng T, Lievense J, Pray TR, Mukhopadhyay A. Engineering Robust Production Microbes for Large-Scale Cultivation. Trends Microbiol 2019; 27:524-537. [PMID: 30819548 DOI: 10.1016/j.tim.2019.01.006] [Citation(s) in RCA: 119] [Impact Index Per Article: 23.8] [Received: 10/11/2018] [Revised: 01/11/2019] [Accepted: 01/23/2019] [Indexed: 11/27/2022]
Abstract
Systems biology and synthetic biology are increasingly used to examine and modulate complex biological systems. As such, many issues that arise during the scale-up of microbial production processes can be addressed using these approaches. We review differences between laboratory-scale cultures and larger-scale processes to provide a perspective on the strain characteristics that are especially important during scale-up. Systems biology has been used to examine a range of microbial systems for their response in bioreactors to fluctuations in nutrients, dissolved gases, and other stresses. Synthetic biology has been used both to assess and modulate strain response, and to engineer strains to improve production. We discuss these approaches and tools in the context of their use in engineering robust microbes for applications in large-scale production.
Affiliation(s)
- Maren Wehrs
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Institut für Genetik, Technische Universität Braunschweig, Braunschweig, Germany; Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA 94608, USA
- Deepti Tanjore
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Advanced Biofuels and Bioproducts Process Development Unit, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Thomas Eng
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA 94608, USA
- Todd R Pray
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Advanced Biofuels and Bioproducts Process Development Unit, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Aindrila Mukhopadhyay
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

4. Walker RSK, Pretorius IS. Applications of Yeast Synthetic Biology Geared towards the Production of Biopharmaceuticals. Genes (Basel) 2018; 9:E340. [PMID: 29986380 PMCID: PMC6070867 DOI: 10.3390/genes9070340] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 06/04/2018] [Revised: 07/01/2018] [Accepted: 07/02/2018] [Indexed: 12/18/2022]
Abstract
Engineered yeast are an important production platform for the biosynthesis of high-value compounds with medical applications. Recent years have witnessed several new developments in this area, largely spurred by advances in the field of synthetic biology and the elucidation of natural metabolic pathways. This minireview presents an overview of synthetic biology applications for the heterologous biosynthesis of biopharmaceuticals in yeast and demonstrates the power and potential of yeast cell factories by highlighting several recent examples. In addition, it outlines emerging trends in this rapidly developing area, hinting at what the state of the art may look like in the years ahead.
Affiliation(s)
- Roy S K Walker
- Department of Molecular Sciences, Macquarie University, Sydney 2109, Australia.

5. Oyetunde T, Bao FS, Chen JW, Martin HG, Tang YJ. Leveraging knowledge engineering and machine learning for microbial bio-manufacturing. Biotechnol Adv 2018; 36:1308-1315. [DOI: 10.1016/j.biotechadv.2018.04.008] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 10/03/2017] [Revised: 02/27/2018] [Accepted: 04/26/2018] [Indexed: 12/21/2022]

6. Ding S, Liao X, Tu W, Wu L, Tian Y, Sun Q, Chen J, Hu QN. EcoSynther: A Customized Platform To Explore the Biosynthetic Potential in E. coli. ACS Chem Biol 2017; 12:2823-2829. [PMID: 28952720 DOI: 10.1021/acschembio.7b00605] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/16/2022]
Abstract
Computational tools for chassis-centered biosynthetic pathway design are important for building productive heterologous biosynthesis systems, given the enormous number of foreign biosynthetic reactions to consider. In many cases, when a microbial organism is used as the host, a pathway to produce a target molecule consists of both native and heterologous reactions. Because tens of thousands of biosynthetic reactions exist in nature, it is not trivial to identify which of them could serve as heterologous reactions to produce the target molecule in a specific organism. In the present work, we integrate more than 10,000 E. coli non-native reactions and use a probability-based algorithm to search pathways. We also built a user-friendly Web server named EcoSynther. It explores the precursors and heterologous reactions needed to produce a target molecule in Escherichia coli K12 MG1655 and then applies flux balance analysis to calculate the theoretical yield of each candidate pathway. Compared with other chassis-centered biosynthetic pathway design tools, EcoSynther has two unique features: (1) it allows automatic search without a known precursor in E. coli, and (2) it evaluates candidate pathways under constraints from E. coli physiological states and growth conditions. EcoSynther is available at http://www.rxnfinder.org/ecosynther/.
Affiliation(s)
- Shaozhen Ding
- Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200333, People’s Republic of China
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, People’s Republic of China
- Xiaoping Liao
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, People’s Republic of China
- Weizhong Tu
- Wuhan LifeSynther Science and Technology Co. Limited, Wuhan, 430070, People’s Republic of China
- Ling Wu
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, People’s Republic of China
- Yu Tian
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, People’s Republic of China
- University of Chinese Academy of Sciences, Beijing 100864, People’s Republic of China
- Qiuping Sun
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, People’s Republic of China
- Junni Chen
- Wuhan LifeSynther Science and Technology Co. Limited, Wuhan, 430070, People’s Republic of China
- Qian-Nan Hu
- Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200333, People’s Republic of China
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, People’s Republic of China

7. Paramasivan K, Mutturi S. Progress in terpene synthesis strategies through engineering of Saccharomyces cerevisiae. Crit Rev Biotechnol 2017; 37:974-989. [DOI: 10.1080/07388551.2017.1299679] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Indexed: 10/19/2022]
Affiliation(s)
- Sarma Mutturi
- CSIR-Central Food Technological Research Institute, Mysore, India

8. Wu G, Yan Q, Jones JA, Tang YJ, Fong SS, Koffas MA. Metabolic Burden: Cornerstones in Synthetic Biology and Metabolic Engineering Applications. Trends Biotechnol 2016; 34:652-664. [DOI: 10.1016/j.tibtech.2016.02.010] [Citation(s) in RCA: 365] [Impact Index Per Article: 45.6] [Received: 01/08/2016] [Revised: 02/18/2016] [Accepted: 02/19/2016] [Indexed: 01/23/2023]

9. Wu SG, Shimizu K, Tang JKH, Tang YJ. Facilitate Collaborations among Synthetic Biology, Metabolic Engineering and Machine Learning. ChemBioEng Reviews 2016. [DOI: 10.1002/cben.201500024] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]

10. Stanford NJ, Millard P, Swainston N. RobOKoD: microbial strain design for (over)production of target compounds. Front Cell Dev Biol 2015; 3:17. [PMID: 25853130 PMCID: PMC4371745 DOI: 10.3389/fcell.2015.00017] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 11/03/2014] [Accepted: 02/25/2015] [Indexed: 11/16/2022]
Abstract
Sustainable production of target compounds such as biofuels and high-value chemicals for the pharmaceutical, agrochemical, and chemical industries is becoming an increasing priority given their current dependence on diminishing petrochemical resources. Designing production strains is difficult: current methods focus primarily on knocking out genes and neglect other vital steps of strain design, including the overexpression and dampening of genes. The design predictions from current methods also do not translate well into successful strains in the laboratory. Here, we introduce RobOKoD (Robust, Overexpression, Knockout and Dampening), a method for predicting strain designs for the overproduction of targets. The method uses flux variability analysis to profile each reaction within the system under differing production percentages of target compound and biomass. Using these profiles, reactions are identified as potential knockout, overexpression, or dampening targets. The identified reactions are ranked according to their suitability, providing flexibility in strain design for users. The software was tested by designing a butanol-producing Escherichia coli strain and was compared against the popular OptKnock and RobustKnock methods. RobOKoD's design predictions compare favorably when the predictions of these methods are checked against an experimentally validated, successful butanol-producing strain. Overall, RobOKoD provides users with rankings of predicted beneficial genetic interventions with which to support optimized strain design.
Affiliation(s)
- Natalie J Stanford
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK; School of Computer Science, University of Manchester, Manchester, UK
- Pierre Millard
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK; School of Computer Science, University of Manchester, Manchester, UK; INSA, UPS, INP, LISBP, Université de Toulouse, Toulouse, France; INRA, UMR792, Ingénierie des Systèmes Biologiques et des Procédés, Toulouse, France; Centre National de la Recherche Scientifique, UMR5504, Toulouse, France
- Neil Swainston
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK; School of Computer Science, University of Manchester, Manchester, UK

11. Dai Z, Liu Y, Guo J, Huang L, Zhang X. Yeast synthetic biology for high-value metabolites. FEMS Yeast Res 2014; 15:1-11. [DOI: 10.1111/1567-1364.12187] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Revised: 04/30/2014] [Accepted: 07/15/2014] [Indexed: 01/08/2023]
Affiliation(s)
- Zhubo Dai
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Yi Liu
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Juan Guo
- National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Luqi Huang
- National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Xueli Zhang
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China

12. Elucidation of intrinsic biosynthesis yields using 13C-based metabolism analysis. Microb Cell Fact 2014; 13:42. [PMID: 24642094 PMCID: PMC3994946 DOI: 10.1186/1475-2859-13-42] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 12/23/2013] [Accepted: 03/12/2014] [Indexed: 11/10/2022]
Abstract
This paper discusses the use of 13C-based metabolism analysis to assess intrinsic product yields (the actual carbon contribution from a single carbon substrate to the final product via a specific biosynthesis route) in the following four cases. First, undefined nutrients (such as yeast extract) in fermentation may contribute significantly to product synthesis, which can be quantified through an isotopic dilution method. Second, product and biomass synthesis may depend on the co-metabolism of multiple carbon sources; 13C labeling experiments can track the fate of each carbon substrate in cell metabolism and identify which substrate plays the main role in product synthesis. Third, 13C labeling can validate and quantify the contribution of the engineered pathway (versus the native pathway) to product synthesis. Fourth, the loss of catabolic energy to cell maintenance (energy used for functions other than the production of new cell components) and a low P/O ratio (phosphate/oxygen ratio) significantly reduce product yields. Therefore, 13C-metabolic flux analysis is needed to assess the influence of suboptimal energy metabolism on microbial productivity and to determine how ATP and NAD(P)H are partitioned among various cellular functions. Since product yield is a major determining factor in the commercialization of a microbial cell factory, we foresee that 13C-isotopic labeling experiments, even without extensive flux calculations, can play a valuable role in the development and verification of microbial cell factories.
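The isotopic dilution idea in the first case can be illustrated with a simplified two-source mixing balance. The sketch below assumes a single 13C-labeled substrate, treats all other carbon (e.g., yeast extract) as being at natural 13C abundance, and ignores isotope effects and label loss; it illustrates the principle and is not necessarily the paper's exact procedure. The input enrichments are hypothetical.

```python
NATURAL_13C_FRACTION = 0.0107  # approximate natural 13C abundance (~1.07 %)


def labeled_carbon_fraction(x_product, x_substrate, x_unlabeled=NATURAL_13C_FRACTION):
    """Fraction of product carbon derived from the 13C-labeled substrate.

    Two-source mixing balance (assumption: only two carbon pools):
        x_product = f * x_substrate + (1 - f) * x_unlabeled
    solved for f. All arguments are 13C atom fractions between 0 and 1.
    """
    if x_substrate == x_unlabeled:
        raise ValueError("labeled and unlabeled pools must differ in 13C enrichment")
    f = (x_product - x_unlabeled) / (x_substrate - x_unlabeled)
    # Clamp small measurement noise into the physical range [0, 1].
    return min(1.0, max(0.0, f))


# Hypothetical example: 99 % 13C glucose plus unlabeled yeast extract;
# the product's average 13C atom fraction is measured as 0.80.
f = labeled_carbon_fraction(0.80, 0.99)
print(f"{f:.1%} of product carbon from glucose, {1 - f:.1%} from other nutrients")
```

The clamp is a design convenience for noisy data; a value far outside [0, 1] before clamping would more likely signal a violated assumption (for example, a third unaccounted carbon source) than noise.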

13. Misra A, Conway MF, Johnnie J, Qureshi TM, Lige B, Derrick AM, Agbo EC, Sriram G. Metabolic analyses elucidate non-trivial gene targets for amplifying dihydroartemisinic acid production in yeast. Front Microbiol 2013; 4:200. [PMID: 23898325 PMCID: PMC3724057 DOI: 10.3389/fmicb.2013.00200] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/25/2013] [Accepted: 06/25/2013] [Indexed: 11/13/2022]
Abstract
Synthetic biology enables metabolic engineering of industrial microbes to synthesize value-added molecules. Here, a major challenge is the efficient redirection of carbon to the desired metabolic pathways. Pinpointing strategies toward this goal requires an in-depth investigation of the metabolic landscape of the organism, particularly primary metabolism, to identify precursor and cofactor availability for the target compound. The potent antimalarial therapeutic artemisinin and its precursors are promising candidate molecules for production in microbial hosts. Recent advances have demonstrated the production of artemisinin precursors in engineered yeast strains as an alternative to extraction from plants. We report the application of in silico and in vivo metabolic pathway analyses to identify metabolic engineering targets to improve the yield of the direct artemisinin precursor dihydroartemisinic acid (DHA) in yeast. First, in silico extreme pathway (ExPa) analysis identified NADPH-malic enzyme and the oxidative pentose phosphate pathway (PPP) as mechanisms to meet NADPH demand for DHA synthesis. Next, we compared key DHA-synthesizing ExPas to the metabolic flux distributions obtained from in vivo 13C metabolic flux analysis of a DHA-synthesizing strain. This comparison revealed that knocking out ethanol synthesis and overexpressing glucose-6-phosphate dehydrogenase in the oxidative PPP (gene YNL241C) or the NADPH-malic enzyme ME2 (YKL029C) are vital steps toward overproducing DHA. Finally, we employed in silico flux balance analysis and minimization of metabolic adjustment on a yeast genome-scale model to identify gene knockouts for improving DHA yields. The best strategy involved knockout of an oxaloacetate transporter (YKL120W) and an aspartate aminotransferase (YKL106W), and was predicted to improve DHA yields by 70-fold. Collectively, our work elucidates multiple non-trivial metabolic engineering strategies for improving DHA yield in yeast.
Affiliation(s)
- Ashish Misra
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, USA
- Matthew F. Conway
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, USA
- Joseph Johnnie
- Institute for Systems Engineering, University of Maryland, College Park, MD, USA
- Tabish M. Qureshi
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, USA
- Bao Lige
- Fyodor Biotechnologies, Baltimore, MD, USA
- Ganesh Sriram
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, USA

14. Metabolic engineering of Synechocystis sp. strain PCC 6803 for isobutanol production. Appl Environ Microbiol 2012. [PMID: 23183979 DOI: 10.1128/aem.02827-12] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.7] [Indexed: 02/08/2023]
Abstract
Global warming and decreasing fossil fuel reserves have prompted great interest in the synthesis of advanced biofuels from renewable resources. In an effort to address these concerns, we performed metabolic engineering of the cyanobacterium Synechocystis sp. strain PCC 6803 to develop a strain that can synthesize isobutanol under both autotrophic and mixotrophic conditions. With the expression of two heterologous genes from the Ehrlich pathway, the engineered strain can accumulate 90 mg/liter of isobutanol from 50 mM bicarbonate in a gas-tight shaking flask. The strain does not require any inducer (e.g., isopropyl β-d-1-thiogalactopyranoside [IPTG]) or antibiotics to maintain its isobutanol production. In the presence of glucose, isobutanol synthesis is only moderately promoted (titer = 114 mg/liter). Based on isotopomer analysis, we found that, compared to the wild-type strain, the mutant significantly reduced its glucose utilization and mainly employed autotrophic metabolism for biomass growth and isobutanol production. Since isobutanol is toxic to the cells and may also be degraded photochemically by hydroxyl radicals during the cultivation process, we employed in situ removal of the isobutanol using oleyl alcohol as a solvent trap. This resulted in a final net concentration of 298 mg/liter of isobutanol under mixotrophic culture conditions.
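A quick worked conversion puts the autotrophic titer in carbon terms. The sketch below converts 90 mg/liter of isobutanol (C4H10O) into millimoles of carbon and compares it with the 50 mM bicarbonate supplied. It rests on simplifying assumptions that are ours, not the abstract's: the bicarbonate is treated as the only inorganic carbon source (headspace CO2 and carbon fixed into biomass are ignored), and each HCO3- contributes one carbon.

```python
# Standard atomic masses (g/mol), rounded.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(formula):
    """Molar mass from an element-count dict, e.g. {"C": 4, "H": 10, "O": 1}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def carbon_recovery(titer_mg_per_l, formula, substrate_c_mM):
    """Fraction of substrate carbon (mM C) recovered as product carbon."""
    product_mM = titer_mg_per_l / molar_mass(formula)  # mg/L divided by g/mol gives mmol/L
    product_c_mM = product_mM * formula["C"]           # carbons per product molecule
    return product_c_mM / substrate_c_mM


ISOBUTANOL = {"C": 4, "H": 10, "O": 1}  # C4H10O
# 90 mg/L isobutanol from 50 mM bicarbonate (assumed: one carbon per HCO3-).
print(f"{carbon_recovery(90, ISOBUTANOL, 50.0):.1%}")  # → 9.7%
```

Under these assumptions, roughly a tenth of the supplied bicarbonate carbon ends up in isobutanol. The same arithmetic does not isolate bicarbonate for the 114 and 298 mg/liter mixotrophic titers, because glucose can also contribute carbon there.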