1
Jin J, Wu Y, Cao P, Zheng X, Zhang Q, Chen Y. Potential and challenge in accelerating high-value conversion of CO2 in microbial electrosynthesis system via data-driven approach. BIORESOURCE TECHNOLOGY 2024; 412:131380. [PMID: 39214179 DOI: 10.1016/j.biortech.2024.131380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 08/26/2024] [Accepted: 08/27/2024] [Indexed: 09/04/2024]
Abstract
Microbial electrosynthesis for CO2 utilization (MESCU) producing valuable chemicals with high energy density has garnered attention due to its long-term stability and high coulombic efficiency. The data-driven approaches offer a promising avenue by leveraging existing data to uncover the underlying patterns. This comprehensive review firstly uncovered the potentials of utilizing data-driven approaches to enhance high-value conversion of CO2 via MESCU. Firstly, critical challenges of MESCU advancing have been identified, including reactor configuration, cathode design, and microbial analysis. Subsequently, the potential of data-driven approaches to tackle the corresponding challenges, encompassing the identification of pivotal parameters governing reactor setup and cathode design, alongside the decipheration of omics data derived from microbial communities, have been discussed. Correspondingly, the future direction of data-driven approaches in assisting the application of MESCU has been addressed. This review offers guidance and theoretical support for future data-driven applications to accelerate MESCU research and potential industrialization.
Affiliation(s)
- Jiasheng Jin
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
- Yang Wu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
- Peiyu Cao
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
- Xiong Zheng
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Key Laboratory of Yangtze River Water Environment, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China
- Qingran Zhang
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
- Yinguang Chen
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China

2
Berruto CA, Demirer GS. Engineering agricultural soil microbiomes and predicting plant phenotypes. Trends Microbiol 2024; 32:858-873. [PMID: 38429182 DOI: 10.1016/j.tim.2024.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 02/02/2024] [Accepted: 02/06/2024] [Indexed: 03/03/2024]
Abstract
Plant growth-promoting rhizobacteria (PGPR) can improve crop yields, nutrient use efficiency, plant tolerance to stressors, and confer benefits to future generations of crops grown in the same soil. Unlocking the potential of microbial communities in the rhizosphere and endosphere is therefore of great interest for sustainable agriculture advancements. Before plant microbiomes can be engineered to confer desirable phenotypic effects on their plant hosts, a deeper understanding of the interacting factors influencing rhizosphere community structure and function is needed. Dealing with this complexity is becoming more feasible using computational approaches. In this review, we discuss recent advances at the intersection of experimental and computational strategies for the investigation of plant-microbiome interactions and the engineering of desirable soil microbiomes.
Affiliation(s)
- Chiara A Berruto
  - Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Gozde S Demirer
  - Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA

3
Kundu P, Beura S, Mondal S, Das AK, Ghosh A. Machine learning for the advancement of genome-scale metabolic modeling. Biotechnol Adv 2024; 74:108400. [PMID: 38944218 DOI: 10.1016/j.biotechadv.2024.108400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 05/13/2024] [Accepted: 06/23/2024] [Indexed: 07/01/2024]
Abstract
Constraint-based modeling (CBM) has evolved as the core systems biology tool to map the interrelations between genotype, phenotype, and external environment. The recent advancement of high-throughput experimental approaches and multi-omics strategies has generated a plethora of new and precise information from wide-ranging biological domains. On the other hand, the continuously growing field of machine learning (ML) and its specialized branch of deep learning (DL) provide essential computational architectures for decoding complex and heterogeneous biological data. In recent years, both multi-omics and ML have assisted in the escalation of CBM. Condition-specific omics data, such as transcriptomics and proteomics, helped contextualize the model prediction while analyzing a particular phenotypic signature. At the same time, the advanced ML tools have eased the model reconstruction and analysis to increase the accuracy and prediction power. However, the development of these multi-disciplinary methodological frameworks mainly occurs independently, which limits the concatenation of biological knowledge from different domains. Hence, we have reviewed the potential of integrating multi-disciplinary tools and strategies from various fields, such as synthetic biology, CBM, omics, and ML, to explore the biochemical phenomenon beyond the conventional biological dogma. How the integrative knowledge of these intersected domains has improved bioengineering and biomedical applications has also been highlighted. We categorically explained the conventional genome-scale metabolic model (GEM) reconstruction tools and their improvement strategies through ML paradigms. Further, the crucial role of ML and DL in omics data restructuring for GEM development has also been briefly discussed. Finally, the case-study-based assessment of the state-of-the-art method for improving biomedical and metabolic engineering strategies has been elaborated. 
Therefore, this review demonstrates how integrating experimental and in silico strategies can help map the ever-expanding knowledge of biological systems driven by condition-specific cellular information. This multiview approach will elevate the application of ML-based CBM in the biomedical and bioengineering fields for the betterment of society and the environment.
Affiliation(s)
- Pritam Kundu
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Satyajit Beura
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Suman Mondal
  - P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Kumar Das
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Amit Ghosh
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India

4
Turanli B, Gulfidan G, Aydogan OO, Kula C, Selvaraj G, Arga KY. Genome-scale metabolic models in translational medicine: the current status and potential of machine learning in improving the effectiveness of the models. Mol Omics 2024; 20:234-247. [PMID: 38444371 DOI: 10.1039/d3mo00152k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
The genome-scale metabolic model (GEM) has emerged as one of the leading modeling approaches for systems-level metabolic studies and has been widely explored for a broad range of organisms and applications. Owing to the development of genome sequencing technologies and available biochemical data, it is possible to reconstruct GEMs for model and non-model microorganisms as well as for multicellular organisms such as humans and animal models. GEMs will evolve in parallel with the availability of biological data, new mathematical modeling techniques and the development of automated GEM reconstruction tools. The use of high-quality, context-specific GEMs, a subset of the original GEM in which inactive reactions are removed while maintaining metabolic functions in the extracted model, for model organisms along with machine learning (ML) techniques could increase their applications and effectiveness in translational research in the near future. Here, we briefly review the current state of GEMs, discuss the potential contributions of ML approaches for more efficient and frequent application of these models in translational research, and explore the extension of GEMs to integrative cellular models.
Affiliation(s)
- Beste Turanli
  - Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey
  - Health Biotechnology Joint Research and Application Center of Excellence, Istanbul, Turkey
- Gizem Gulfidan
  - Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey
- Ozge Onluturk Aydogan
  - Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey
- Ceyda Kula
  - Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey
  - Health Biotechnology Joint Research and Application Center of Excellence, Istanbul, Turkey
- Gurudeeban Selvaraj
  - Concordia University, Centre for Research in Molecular Modeling & Department of Chemistry and Biochemistry, Quebec, Canada
  - Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha Dental College and Hospital, Department of Biomaterials, Bioinformatics Unit, Chennai, India
- Kazim Yalcin Arga
  - Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey
  - Health Biotechnology Joint Research and Application Center of Excellence, Istanbul, Turkey
  - Marmara University, Genetic and Metabolic Diseases Research and Investigation Center, Istanbul, Turkey

5
Goshisht MK. Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges. ACS OMEGA 2024; 9:9921-9945. [PMID: 38463314 PMCID: PMC10918679 DOI: 10.1021/acsomega.3c05913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 01/19/2024] [Accepted: 01/30/2024] [Indexed: 03/12/2024]
Abstract
Machine learning (ML), particularly deep learning (DL), has made rapid and substantial progress in synthetic biology in recent years. Biotechnological applications of biosystems, including pathways, enzymes, and whole cells, are being probed frequently with time. The intricacy and interconnectedness of biosystems make it challenging to design them with the desired properties. ML and DL have a synergy with synthetic biology. Synthetic biology can be employed to produce large data sets for training models (for instance, by utilizing DNA synthesis), and ML/DL models can be employed to inform design (for example, by generating new parts or advising unrivaled experiments to perform). This potential has recently been brought to light by research at the intersection of engineering biology and ML/DL through achievements like the design of novel biological components, best experimental design, automated analysis of microscopy data, protein structure prediction, and biomolecular implementations of ANNs (Artificial Neural Networks). I have divided this review into three sections. In the first section, I describe predictive potential and basics of ML along with myriad applications in synthetic biology, especially in engineering cells, activity of proteins, and metabolic pathways. In the second section, I describe fundamental DL architectures and their applications in synthetic biology. Finally, I describe different challenges causing hurdles in the progress of ML/DL and synthetic biology along with their solutions.
Affiliation(s)
- Manoj Kumar Goshisht
  - Department of Chemistry, Natural and Applied Sciences, University of Wisconsin-Green Bay, Green Bay, Wisconsin 54311-7001, United States

6
Hanke P, Parrello B, Vasieva O, Akins C, Chlenski P, Babnigg G, Henry C, Foflonker F, Brettin T, Antonopoulos D, Stevens R, Fonstein M. Engineering of increased L-Threonine production in bacteria by combinatorial cloning and machine learning. Metab Eng Commun 2023; 17:e00225. [PMID: 37435441 PMCID: PMC10331477 DOI: 10.1016/j.mec.2023.e00225] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/07/2023] [Revised: 06/02/2023] [Accepted: 06/03/2023] [Indexed: 07/13/2023] Open
Abstract
The goal of this study is to develop a general strategy for bacterial engineering using an integrated synthetic biology and machine learning (ML) approach. This strategy was developed in the context of increasing L-threonine production in Escherichia coli ATCC 21277. A set of 16 genes was initially selected based on metabolic pathway relevance to threonine biosynthesis and used for combinatorial cloning to construct a set of 385 strains to generate training data (i.e., a range of L-threonine titers linked to each of the specific gene combinations). Hybrid (regression/classification) deep learning (DL) models were developed and used to predict additional gene combinations in subsequent rounds of combinatorial cloning for increased L-threonine production based on the training data. As a result, E. coli strains built after just three rounds of iterative combinatorial cloning and model prediction generated higher L-threonine titers (from 2.7 g/L to 8.4 g/L) than those of patented L-threonine strains being used as controls (4-5 g/L). Interesting combinations of genes in L-threonine production included deletions of the tdh, metL, dapA, and dhaM genes as well as overexpression of the pntAB, ppc, and aspC genes. Mechanistic analysis of the metabolic system constraints for the best performing constructs offers ways to improve the models by adjusting weights for specific gene combinations. Graph theory analysis of pairwise gene modifications and corresponding levels of L-threonine production also suggests additional rules that can be incorporated into future ML models.
Affiliation(s)
- Paul Hanke
  - Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439, USA
- Bruce Parrello
  - University of Chicago, 5801 S. Ellis Ave, Chicago, IL, 60637, USA
- Olga Vasieva
  - BSMI, 1818 Skokie Blvd., #201, Northbrook, IL, 60062, USA
- Chase Akins
  - Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439, USA
- Philippe Chlenski
  - Department of Computer Science, Columbia University, New York, NY, 10027, USA
- Gyorgy Babnigg
  - Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439, USA
- Chris Henry
  - Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439, USA
- Fatima Foflonker
  - Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439, USA
- Thomas Brettin
  - Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439, USA
- Rick Stevens
  - Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439, USA
  - University of Chicago, 5801 S. Ellis Ave, Chicago, IL, 60637, USA
- Michael Fonstein
  - Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439, USA

7
Cai P, Liu S, Zhang D, Hu QN. MCF2Chem: A manually curated knowledge base of biosynthetic compound production. BIOTECHNOLOGY FOR BIOFUELS AND BIOPRODUCTS 2023; 16:167. [PMID: 37925500 PMCID: PMC10625697 DOI: 10.1186/s13068-023-02419-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 10/23/2023] [Indexed: 11/06/2023]
Abstract
BACKGROUND: Microbes have been used as cell factories to synthesize various chemical compounds. Recent advances in synthetic biological technologies have accelerated the increase in the number and capacity of microbial cell factories; the variety and number of synthetic compounds produced via these cell factories have also grown substantially. However, no database is available that provides detailed information on the microbial cell factories and the synthesized compounds.
RESULTS: In this study, we established MCF2Chem, a manually curated knowledge base on the production of biosynthetic compounds using microbial cell factories. It contains 8888 items of production records related to 1231 compounds that were synthesizable by 590 microbial cell factories, including the production data of compounds (titer, yield, productivity, and content), strain culture information (culture medium, carbon source/precursor/substrate), fermentation information (mode, vessel, scale, and condition), and other information (e.g., strain modification method). The database contains statistical analyses data of compounds and microbial species. The data statistics of MCF2Chem showed that bacteria accounted for 60% of the species and that "fatty acids", "terpenoids", and "shikimates and phenylpropanoids" accounted for the top three chemical products. Escherichia coli, Saccharomyces cerevisiae, Yarrowia lipolytica, and Corynebacterium glutamicum synthesized 78% of these chemical compounds. Furthermore, we constructed a system to recommend microbial cell factories suitable for synthesizing target compounds and vice versa by combining MCF2Chem data, additional strain- and compound-related data, the phylogenetic relationships between strains, and compound similarities.
CONCLUSIONS: MCF2Chem provides a user-friendly interface for querying, browsing, and visualizing detailed statistical information on microbial cell factories and their synthesizable compounds. It is publicly available at https://mcf.lifesynther.com. This database may serve as a useful resource for synthetic biologists.
Affiliation(s)
- Pengli Cai
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Sheng Liu
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Dachuan Zhang
  - Ecological Systems Design, Institute of Environmental Engineering, ETH Zurich, 8093, Zurich, Switzerland
- Qian-Nan Hu
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China

8
Khamwachirapithak P, Sae-Tang K, Mhuantong W, Tanapongpipat S, Zhao XQ, Liu CG, Wei DQ, Champreda V, Runguphan W. Optimizing Ethanol Production in Saccharomyces cerevisiae at Ambient and Elevated Temperatures through Machine Learning-Guided Combinatorial Promoter Modifications. ACS Synth Biol 2023; 12:2897-2908. [PMID: 37681736 PMCID: PMC10594650 DOI: 10.1021/acssynbio.3c00199] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/31/2023] [Indexed: 09/09/2023]
Abstract
Bioethanol has gained popularity in recent decades as an ecofriendly alternative to fossil fuels due to increasing concerns about global climate change. However, economically viable ethanol fermentation remains a challenge. High-temperature fermentation can reduce production costs, but Saccharomyces cerevisiae yeast strains normally ferment poorly under high temperatures. In this study, we present a machine learning (ML) approach to optimize bioethanol production in S. cerevisiae by fine-tuning the promoter activities of three endogenous genes. We created 216 combinatorial strains of S. cerevisiae by replacing native promoters with five promoters of varying strengths to regulate ethanol production. Promoter replacement resulted in a 63% improvement in ethanol production at 30 °C. We created an ML-guided workflow by utilizing XGBoost to train high-performance models based on promoter strengths and cellular metabolite concentrations obtained from ethanol production of 216 combinatorial strains at 30 °C. This strategy was then applied to optimize ethanol production at 40 °C, where we selected 31 strains for experimental fermentation. This reduced experimental load led to a 7.4% increase in ethanol production in the second round of the ML-guided workflow. Our study offers a comprehensive library of promoter strength modifications for key ethanol production enzymes, showcasing how machine learning can guide yeast strain optimization and make bioethanol production more cost-effective and efficient. Furthermore, we demonstrate that metabolic engineering processes can be accelerated and optimized through this approach.
Affiliation(s)
- Peerapat Khamwachirapithak
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), 111 Thailand Science Park, Phahonyothin Road, Khlong Nueng, Khlong Luang, Pathum Thani 12120, Thailand
- Kittapong Sae-Tang
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), 111 Thailand Science Park, Phahonyothin Road, Khlong Nueng, Khlong Luang, Pathum Thani 12120, Thailand
- Wuttichai Mhuantong
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), 111 Thailand Science Park, Phahonyothin Road, Khlong Nueng, Khlong Luang, Pathum Thani 12120, Thailand
- Sutipa Tanapongpipat
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), 111 Thailand Science Park, Phahonyothin Road, Khlong Nueng, Khlong Luang, Pathum Thani 12120, Thailand
- Xin-Qing Zhao
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
- Chen-Guang Liu
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
- Dong-Qing Wei
  - Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
- Verawat Champreda
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), 111 Thailand Science Park, Phahonyothin Road, Khlong Nueng, Khlong Luang, Pathum Thani 12120, Thailand
- Weerawat Runguphan
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), 111 Thailand Science Park, Phahonyothin Road, Khlong Nueng, Khlong Luang, Pathum Thani 12120, Thailand

9
Xiao Z, Li W, Moon H, Roell GW, Chen Y, Tang YJ. Generative Artificial Intelligence GPT-4 Accelerates Knowledge Mining and Machine Learning for Synthetic Biology. ACS Synth Biol 2023; 12:2973-2982. [PMID: 37682043 DOI: 10.1021/acssynbio.3c00310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Knowledge mining from synthetic biology journal articles for machine learning (ML) applications is a labor-intensive process. The development of natural language processing (NLP) tools, such as GPT-4, can accelerate the extraction of published information related to microbial performance under complex strain engineering and bioreactor conditions. As a proof of concept, we proposed prompt engineering for a GPT-4 workflow pipeline to extract knowledge from 176 publications on two oleaginous yeasts (Yarrowia lipolytica and Rhodosporidium toruloides). After human intervention, the pipeline obtained a total of 2037 data instances. The structured data sets and feature selections enabled ML approaches (e.g., a random forest model) to predict Yarrowia fermentation titers with decent accuracy (R² of 0.86 for unseen test data). Via transfer learning, the trained model could assess the production potential of the engineered nonconventional yeast, R. toruloides, for which there are fewer published reports. This work demonstrated the potential of generative artificial intelligence to streamline information extraction from research articles, thereby facilitating fermentation predictions and biomanufacturing development.
Affiliation(s)
- Zhengyang Xiao
  - Department of Energy, Environmental, and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Wenyu Li
  - Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Hannah Moon
  - ImpactDB LLC, St. Louis, Missouri 63105, United States
  - Clayton High School, 1 Mark Twain Cir, Clayton, Missouri 63105, United States
- Garrett W Roell
  - ImpactDB LLC, St. Louis, Missouri 63105, United States
  - Department of Molecular Biosciences & Bioengineering, University of Hawaii at Manoa, Honolulu, Hawaii 96822, United States
- Yixin Chen
  - Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Yinjie J Tang
  - Department of Energy, Environmental, and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States

10
Gonçalves DM, Henriques R, Costa RS. Predicting metabolic fluxes from omics data via machine learning: Moving from knowledge-driven towards data-driven approaches. Comput Struct Biotechnol J 2023; 21:4960-4973. [PMID: 37876626 PMCID: PMC10590844 DOI: 10.1016/j.csbj.2023.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 10/01/2023] [Accepted: 10/01/2023] [Indexed: 10/26/2023] Open
Abstract
The accurate prediction of phenotypes in microorganisms is a main challenge for systems biology. Genome-scale models (GEMs) are a widely used mathematical formalism for predicting metabolic fluxes using constraint-based modeling methods such as flux balance analysis (FBA). However, they require prior knowledge of the metabolic network of an organism and appropriate objective functions, often hampering the prediction of metabolic fluxes under different conditions. Moreover, the integration of omics data to improve the accuracy of phenotype predictions in different physiological states is still in its infancy. Here, we present a novel approach for predicting fluxes under various conditions. We explore the use of supervised machine learning (ML) models using transcriptomics and/or proteomics data and compare their performance against the standard parsimonious FBA (pFBA) approach using case studies of Escherichia coli organism as an example. Our results show that the proposed omics-based ML approach is promising to predict both internal and external metabolic fluxes with smaller prediction errors in comparison to the pFBA approach. The code, data, and detailed results are available at the project's repository[1].
Affiliation(s)
- Daniel M. Gonçalves
  - INESC-ID, Rua Alves Redol, 9, Lisbon, 1000-029, Portugal
  - Instituto Superior Técnico, Av. Rovisco Pais, 1, Lisbon, 1049-001, Portugal
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica, 2829-516, Portugal
- Rui Henriques
  - INESC-ID, Rua Alves Redol, 9, Lisbon, 1000-029, Portugal
  - Instituto Superior Técnico, Av. Rovisco Pais, 1, Lisbon, 1049-001, Portugal
- Rafael S. Costa
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica, 2829-516, Portugal

11
Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023; 47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/09/2023] Open
Abstract
When selecting microbial strains for the production of fermented foods, various microbial phenotypes need to be taken into account to achieve target product characteristics, such as biosafety, flavor, texture, and health-promoting effects. Through continuous advances in sequencing technologies, microbial whole-genome sequences of increasing quality can now be obtained both cheaper and faster, which increases the relevance of genome-based characterization of microbial phenotypes. Prediction of microbial phenotypes from genome sequences makes it possible to quickly screen large strain collections in silico to identify candidates with desirable traits. Several microbial phenotypes relevant to the production of fermented foods can be predicted using knowledge-based approaches, leveraging our existing understanding of the genetic and molecular mechanisms underlying those phenotypes. In the absence of this knowledge, data-driven approaches can be applied to estimate genotype-phenotype relationships based on large experimental datasets. Here, we review computational methods that implement knowledge- and data-driven approaches for phenotype prediction, as well as methods that combine elements from both approaches. Furthermore, we provide examples of how these methods have been applied in industrial biotechnology, with special focus on the fermented food industry.
Affiliation(s)
- Signe T Karlsen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Martin H Rau
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Benjamín J Sánchez
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Kristian Jensen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Ahmad A Zeidan
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
12
Wang X, Mohsin A, Sun Y, Li C, Zhuang Y, Wang G. From Spatial-Temporal Multiscale Modeling to Application: Bridging the Valley of Death in Industrial Biotechnology. Bioengineering (Basel) 2023; 10:744. [PMID: 37370675 DOI: 10.3390/bioengineering10060744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/13/2023] [Accepted: 06/19/2023] [Indexed: 06/29/2023] Open
Abstract
The Valley of Death poses a significant challenge to the commercialization of industrial biotechnology products. Fortunately, with the integration of computation, automation and artificial intelligence (AI) technology, industrial biotechnology is crossing the Valley of Death more quickly. The Fourth Industrial Revolution (Industry 4.0) has spurred the development of intelligent biomanufacturing, which has reshaped industrial structures in line with the worldwide trend. To achieve this, intelligent biomanufacturing can be structured into three main parts: digitalization, modeling and intellectualization, with modeling forming a crucial link between the other two components. This paper provides an overview of mechanistic models, data-driven models and their applications in bioprocess development. We provide a detailed elaboration of the hybrid model and its applications in bioprocess engineering, including strain design, process control and optimization, as well as bioreactor scale-up. Finally, the challenges and opportunities of biomanufacturing towards Industry 4.0 are also discussed.
Affiliation(s)
- Xueting Wang
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology (ECUST), Shanghai 200237, China
- Ali Mohsin
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology (ECUST), Shanghai 200237, China
- Yifei Sun
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology (ECUST), Shanghai 200237, China
- Chao Li
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology (ECUST), Shanghai 200237, China
- Yingping Zhuang
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology (ECUST), Shanghai 200237, China
- Guan Wang
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology (ECUST), Shanghai 200237, China
13
Helleckes LM, Hemmerich J, Wiechert W, von Lieres E, Grünberger A. Machine learning in bioprocess development: from promise to practice. Trends Biotechnol 2023; 41:817-835. [PMID: 36456404 DOI: 10.1016/j.tibtech.2022.10.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 08/30/2022] [Revised: 10/20/2022] [Accepted: 10/27/2022] [Indexed: 11/30/2022]
Abstract
Fostered by novel analytical techniques, digitalization, and automation, modern bioprocess development provides large amounts of heterogeneous experimental data, containing valuable process information. In this context, data-driven methods like machine learning (ML) approaches have great potential to rationally explore large design spaces while exploiting experimental facilities most efficiently. Herein we demonstrate how ML methods have been applied so far in bioprocess development, especially in strain engineering and selection, bioprocess optimization, scale-up, monitoring, and control of bioprocesses. For each topic, we will highlight successful application cases, current challenges, and point out domains that can potentially benefit from technology transfer and further progress in the field of ML.
Affiliation(s)
- Laura M Helleckes
- Institute for Bio- and Geosciences (IBG-1), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany; RWTH Aachen University, Templergraben 55, 52062 Aachen, Germany
- Johannes Hemmerich
- Institute for Bio- and Geosciences (IBG-1), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Wolfgang Wiechert
- Institute for Bio- and Geosciences (IBG-1), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany; RWTH Aachen University, Templergraben 55, 52062 Aachen, Germany
- Eric von Lieres
- Institute for Bio- and Geosciences (IBG-1), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany; RWTH Aachen University, Templergraben 55, 52062 Aachen, Germany
- Alexander Grünberger
- Multiscale Bioengineering, Technical Faculty, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany; Center for Biotechnology (CeBiTec), Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany; Institute of Process Engineering in Life Sciences, Section III: Microsystems in Bioprocess Engineering, Karlsruhe Institute of Technology, Fritz-Haber-Weg 2, 76131, Karlsruhe, Germany.
14
Patra P, B R D, Kundu P, Das M, Ghosh A. Recent advances in machine learning applications in metabolic engineering. Biotechnol Adv 2023; 62:108069. [PMID: 36442697 DOI: 10.1016/j.biotechadv.2022.108069] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/17/2022] [Revised: 10/18/2022] [Accepted: 11/22/2022] [Indexed: 11/27/2022]
Abstract
Metabolic engineering encompasses several widely-used strategies, which currently hold a high seat in the field of biotechnology when its potential is manifesting through a plethora of research and commercial products with a strong societal impact. The genomic revolution that occurred almost three decades ago has initiated the generation of large omics-datasets which has helped in gaining a better understanding of cellular behavior. The itinerary of metabolic engineering that has occurred based on these large datasets has allowed researchers to gain detailed insights and a reasonable understanding of the intricacies of biosystems. However, the existing trial-and-error approaches for metabolic engineering are laborious and time-intensive when it comes to the production of target compounds with high yields through genetic manipulations in host organisms. Machine learning (ML) coupled with the available metabolic engineering test instances and omics data brings a comprehensive and multidisciplinary approach that enables scientists to evaluate various parameters for effective strain design. This vast amount of biological data should be standardized through knowledge engineering to train different ML models for providing accurate predictions in gene circuit design, modification of proteins, optimization of bioprocess parameters for scaling up, and screening of hyper-producing robust cell factories. This review briefs on the premise of ML, followed by mentioning various ML methods and algorithms alongside the numerous omics datasets available to train ML models for predicting metabolic outcomes with high accuracy. The combinative interplay between the ML algorithms and biological datasets through knowledge engineering has guided the recent advancements in applications such as CRISPR/Cas systems, gene circuits, protein engineering, metabolic pathway reconstruction, and bioprocess engineering. Finally, this review addresses the probable challenges of applying ML in metabolic engineering which will guide the researchers toward novel techniques to overcome the limitations.
Affiliation(s)
- Pradipta Patra
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Disha B R
- B.M.S College of Engineering, Basavanagudi, Bengaluru, Karnataka 560019, India
- Pritam Kundu
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Manali Das
- School of Bioscience, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Ghosh
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India.
15
Volk MJ, Tran VG, Tan SI, Mishra S, Fatma Z, Boob A, Li H, Xue P, Martin TA, Zhao H. Metabolic Engineering: Methodologies and Applications. Chem Rev 2022; 123:5521-5570. [PMID: 36584306 DOI: 10.1021/acs.chemrev.2c00403] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Indexed: 12/31/2022]
Abstract
Metabolic engineering aims to improve the production of economically valuable molecules through the genetic manipulation of microbial metabolism. While the discipline is a little over 30 years old, advancements in metabolic engineering have given way to industrial-level molecule production benefitting multiple industries such as chemical, agriculture, food, pharmaceutical, and energy industries. This review describes the design, build, test, and learn steps necessary for leading a successful metabolic engineering campaign. Moreover, we highlight major applications of metabolic engineering, including synthesizing chemicals and fuels, broadening substrate utilization, and improving host robustness with a focus on specific case studies. Finally, we conclude with a discussion on perspectives and future challenges related to metabolic engineering.
Affiliation(s)
- Michael J Volk
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Vinh G Tran
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Shih-I Tan
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Department of Chemical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
- Shekhar Mishra
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Zia Fatma
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Aashutosh Boob
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Hongxiang Li
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Pu Xue
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Teresa A Martin
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Huimin Zhao
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
16
Kumar Sharma A, Kumar Ghodke P, Goyal N, Nethaji S, Chen WH. Machine learning technology in biohydrogen production from agriculture waste: Recent advances and future perspectives. BIORESOURCE TECHNOLOGY 2022; 364:128076. [PMID: 36216286 DOI: 10.1016/j.biortech.2022.128076] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/31/2022] [Revised: 09/30/2022] [Accepted: 10/02/2022] [Indexed: 06/16/2023]
Abstract
Agricultural waste biomass has shown great potential to deliver green energy produced by biochemical and thermochemical conversion processes to mitigate future energy crises. Among biofuels, biohydrogen has attracted growing interest as a carbon-free, energy-dense fuel. However, it is challenging to develop models based on experience or theory for precise predictions due to the complexity of biohydrogen production systems and the limitations of human perception. Recent advancements in machine learning (ML) may open up new possibilities. For this reason, this critical study offers a thorough understanding of ML's use in biohydrogen production. The most recent developments in ML-assisted biohydrogen technologies, including biochemical and thermochemical processes, are examined in depth. This review paper also discusses the prediction of biohydrogen production from agricultural waste. Finally, the techno-economic and scientific obstacles to ML application in agriculture waste biomass-based biohydrogen production are summarized.
Affiliation(s)
- Amit Kumar Sharma
- Department of Chemistry, Applied Sciences Cluster, Centre for Alternate and Renewable Energy Research, R&D, University of Petroleum & Energy Studies (UPES), School of Engineering, Energy Acres Building, Bidholi, Dehradun 248007, Uttarakhand, India
- Praveen Kumar Ghodke
- Department of Chemical Engineering, National Institute of Technology Calicut, Kozhikode 673601, Kerala, India
- Nishu Goyal
- School of Health Sciences, University of Petroleum & Energy Studies (UPES), School of Engineering, Energy Acres Building, Bidholi, Dehradun 248007, Uttarakhand, India
- S Nethaji
- Department of Chemical Engineering, Manipal Institute of Technology, Manipal, Karnataka 576104, India
- Wei-Hsin Chen
- Department of Aeronautics and Astronautics, National Cheng Kung University, Tainan 701, Taiwan; Research Center for Smart Sustainable Circular Economy, Tunghai University, Taichung 407, Taiwan; Department of Mechanical Engineering, National Chin-Yi University of Technology, Taichung 411, Taiwan.
17
Du YH, Wang MY, Yang LH, Tong LL, Guo DS, Ji XJ. Optimization and Scale-Up of Fermentation Processes Driven by Models. Bioengineering (Basel) 2022; 9:473. [PMID: 36135019 PMCID: PMC9495923 DOI: 10.3390/bioengineering9090473] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/05/2022] [Revised: 09/05/2022] [Accepted: 09/09/2022] [Indexed: 11/16/2022] Open
Abstract
In the era of sustainable development, the use of cell factories to produce various compounds by fermentation has attracted extensive attention; however, industrial fermentation requires not only efficient production strains, but also suitable extracellular conditions and medium components, as well as scaling-up. In this regard, the use of biological models has received much attention, and this review will provide guidance for the rapid selection of biological models. This paper first introduces two mechanistic modeling methods, kinetic modeling and constraint-based modeling (CBM), and generalizes their applications in practice. Next, we review data-driven modeling based on machine learning (ML), and highlight the application scope of different learning algorithms. The combined use of ML and CBM for constructing hybrid models is further discussed. At the end, we also discuss the recent strategies for predicting bioreactor scale-up and culture behavior through a combination of biological models and computational fluid dynamics (CFD) models.
Affiliation(s)
- Yuan-Hang Du
- School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing 210023, China
- Min-Yu Wang
- State Key Laboratory of Materials-Oriented Chemical Engineering, College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing 211816, China
- Lin-Hui Yang
- School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing 210023, China
- Ling-Ling Tong
- School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing 210023, China
- Dong-Sheng Guo
- School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing 210023, China
- Correspondence: (D.-S.G.); (X.-J.J.)
- Xiao-Jun Ji
- State Key Laboratory of Materials-Oriented Chemical Engineering, College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing 211816, China
- Correspondence: (D.-S.G.); (X.-J.J.)
18
Lo-Thong-Viramoutou O, Charton P, Cadet XF, Grondin-Perez B, Saavedra E, Damour C, Cadet F. Non-linearity of Metabolic Pathways Critically Influences the Choice of Machine Learning Model. Front Artif Intell 2022; 5:744755. [PMID: 35757298 PMCID: PMC9226554 DOI: 10.3389/frai.2022.744755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Accepted: 04/29/2022] [Indexed: 11/13/2022] Open
Abstract
The use of machine learning (ML) in life sciences has gained wide interest over the past years, as it speeds up the development of high performing models. Important modeling tools in biology have proven their worth for pathway design, such as mechanistic models and metabolic networks, as they allow better understanding of mechanisms involved in the functioning of organisms. However, little has been done on the use of ML to model metabolic pathways, and the degree of non-linearity associated with them is not clear. Here, we report the construction of different metabolic pathways with several linear and non-linear ML models. Different types of data are used; they lead to the prediction of important biological data, such as pathway flux and final product concentration. A comparison reveals that the data features impact model performance and highlight the effectiveness of non-linear models (e.g., QRF: RMSE = 0.021 nmol·min⁻¹ and R² = 1 vs. Bayesian GLM: RMSE = 1.379 nmol·min⁻¹ and R² = 0.823). It turns out that the greater the degree of non-linearity of the pathway, the better suited a non-linear model will be. Therefore, a decision-making support for pathway modeling is established. These findings generally support the hypothesis that non-linear aspects predominate within the metabolic pathways. This must be taken into account when devising possible applications of these pathways for the identification of biomarkers of diseases (e.g., infections, cancer, neurodegenerative diseases) or the optimization of industrial production processes.
Affiliation(s)
- Ophélie Lo-Thong-Viramoutou
- University of Paris, BIGR—Biologie Intégrée du Globule Rouge, Inserm, UMR_S1134, Paris, France
- Laboratory of Excellence GR-Ex, Paris, France
- Laboratory DSIMB, UMR_S1134, BIGR, Inserm, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis, France
- Philippe Charton
- University of Paris, BIGR—Biologie Intégrée du Globule Rouge, Inserm, UMR_S1134, Paris, France
- Laboratory of Excellence GR-Ex, Paris, France
- Laboratory DSIMB, UMR_S1134, BIGR, Inserm, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis, France
- Brigitte Grondin-Perez
- EnergyLab, EA 4079, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis, France
- Emma Saavedra
- Departamento de Bioquímica, Instituto Nacional de Cardiología Ignacio Chávez, Mexico City, Mexico
- Cédric Damour
- EnergyLab, EA 4079, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis, France
- Frédéric Cadet
- University of Paris, BIGR—Biologie Intégrée du Globule Rouge, Inserm, UMR_S1134, Paris, France
- Laboratory of Excellence GR-Ex, Paris, France
- Laboratory DSIMB, UMR_S1134, BIGR, Inserm, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis, France
19
Rickert CA, Lieleg O. Machine learning approaches for biomolecular, biophysical, and biomaterials research. BIOPHYSICS REVIEWS 2022; 3:021306. [PMID: 38505413 PMCID: PMC10914139 DOI: 10.1063/5.0082179] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/13/2021] [Accepted: 05/12/2022] [Indexed: 03/21/2024]
Abstract
A fluent conversation with a virtual assistant, person-tailored news feeds, and deep-fake images created within seconds: all those things that were unthinkable for a long time are now a part of our everyday lives. What these examples have in common is that they are realized by different means of machine learning (ML), a technology that has fundamentally changed many aspects of the modern world. The possibility to process enormous amounts of data in multi-hierarchical, digital constructs has paved the way not only for creating intelligent systems but also for obtaining surprising new insight into many scientific problems. However, in the different areas of biosciences, which typically rely heavily on the collection of time-consuming experimental data, applying ML methods is a bit more challenging: Here, difficulties can arise from small datasets and the inherent, broad variability and complexity associated with studying biological objects and phenomena. In this Review, we give an overview of commonly used ML algorithms (which are often referred to as "machines") and learning strategies as well as their applications in different bio-disciplines such as molecular biology, drug development, biophysics, and biomaterials science. We highlight how selected research questions from those fields were successfully translated into machine readable formats, discuss typical problems that can arise in this context, and provide an overview of how to resolve those encountered difficulties.
20
Liao X, Ma H, Tang YJ. Artificial intelligence: a solution to involution of design–build–test–learn cycle. Curr Opin Biotechnol 2022; 75:102712. [DOI: 10.1016/j.copbio.2022.102712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/22/2021] [Revised: 02/05/2022] [Accepted: 03/01/2022] [Indexed: 01/08/2023]
21
McElhinney JMWR, Catacutan MK, Mawart A, Hasan A, Dias J. Interfacing Machine Learning and Microbial Omics: A Promising Means to Address Environmental Challenges. Front Microbiol 2022; 13:851450. [PMID: 35547145 PMCID: PMC9083327 DOI: 10.3389/fmicb.2022.851450] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/09/2022] [Accepted: 03/14/2022] [Indexed: 11/13/2022] Open
Abstract
Microbial communities are ubiquitous and carry an exceptionally broad metabolic capability. Upon environmental perturbation, microbes are also amongst the first natural responsive elements with perturbation-specific cues and markers. These communities are thereby uniquely positioned to inform on the status of environmental conditions. The advent of microbial omics has led to an unprecedented volume of complex microbiological data sets. Importantly, these data sets are rich in biological information with potential for predictive environmental classification and forecasting. However, the patterns in this information are often hidden amongst the inherent complexity of the data. There has been a continued rise in the development and adoption of machine learning (ML) and deep learning architectures for solving research challenges of this sort. Indeed, the interface between molecular microbial ecology and artificial intelligence (AI) appears to show considerable potential for significantly advancing environmental monitoring and management practices through their application. Here, we provide a primer for ML, highlight the notion of retaining biological sample information for supervised ML, discuss workflow considerations, and review the state of the art of the exciting, yet nascent, interdisciplinary field of ML-driven microbial ecology. Current limitations in this sphere of research are also addressed to frame a forward-looking perspective toward the realization of what we anticipate will become a pivotal toolkit for addressing environmental monitoring and management challenges in the years ahead.
Affiliation(s)
- James M. W. R. McElhinney
- Applied Genomics Laboratory, Center for Membranes and Advanced Water Technology, Khalifa University, Abu Dhabi, United Arab Emirates
- Aurelie Mawart
- Applied Genomics Laboratory, Center for Membranes and Advanced Water Technology, Khalifa University, Abu Dhabi, United Arab Emirates
- Ayesha Hasan
- Applied Genomics Laboratory, Center for Membranes and Advanced Water Technology, Khalifa University, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University, Abu Dhabi, United Arab Emirates
- Jorge Dias
- EECS, Center for Autonomous Robotic Systems, Khalifa University, Abu Dhabi, United Arab Emirates
22
Sampaio M, Rocha M, Dias O. Exploring synergies between plant metabolic modelling and machine learning. Comput Struct Biotechnol J 2022; 20:1885-1900. [PMID: 35521559 PMCID: PMC9052043 DOI: 10.1016/j.csbj.2022.04.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Revised: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 11/03/2022] Open
23
Robustness: linking strain design to viable bioprocesses. Trends Biotechnol 2022; 40:918-931. [PMID: 35120750 DOI: 10.1016/j.tibtech.2022.01.004] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 09/15/2021] [Revised: 01/05/2022] [Accepted: 01/05/2022] [Indexed: 12/18/2022]
Abstract
Microbial cell factories are becoming increasingly popular for the sustainable production of various chemicals. Metabolic engineering has led to the design of advanced cell factories; however, their long-term yield, titer, and productivity falter when scaled up and subjected to industrial conditions. This limitation arises from a lack of robustness - the ability to maintain a constant phenotype despite the perturbations of such processes. This review describes predictable and stochastic industrial perturbations as well as state-of-the-art technologies to counter process variability. Moreover, we distinguish robustness from tolerance and discuss the potential of single-cell studies for improving system robustness. Finally, we highlight ways of achieving consistent and comparable quantification of robustness that can guide the selection of strains for industrial bioprocesses.
24
Mey F, Clauwaert J, Van Huffel K, Waegeman W, De Mey M. Improving the performance of machine learning models for biotechnology: The quest for deus ex machina. Biotechnol Adv 2021; 53:107858. [PMID: 34695560 DOI: 10.1016/j.biotechadv.2021.107858] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/28/2021] [Revised: 10/13/2021] [Accepted: 10/14/2021] [Indexed: 11/24/2022]
Abstract
Machine learning is becoming an integral part of the Design-Build-Test-Learn cycle in biotechnology. Machine learning models learn from collected datasets such as omics data and predict a defined outcome, which has led to both production improvements and predictive tools in the field. Robust prediction of the behavior of microbial cell factories and production processes not only greatly increases our understanding of the function of such systems, but also provides significant savings of development time. However, many pitfalls when modeling biological data - bad fit, noisy data, model instability, low data quantity and imbalances in the data - cause models to suffer in their performance. Here we provide an accessible, in-depth analysis on the problems created by these pitfalls, as well as means of their detection and mediation, with a focus on supervised learning. Assessing the state of the art, we show that, currently, in-depth analyses of model performance are often absent and must be improved. This review provides a toolbox for the analysis of model robustness and performance, and simultaneously proposes a standard for the community to facilitate future work. It is further accompanied by an interactive online tutorial on the discussed issues.
Affiliation(s)
- Friederike Mey
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Jim Clauwaert
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| | - Kirsten Van Huffel
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Willem Waegeman
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| | - Marjan De Mey
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000 Ghent, Belgium.
|
25
|
Nakazawa S, Imaichi O, Kogure T, Kubota T, Toyoda K, Suda M, Inui M, Ito K, Shirai T, Araki M. History-Driven Genetic Modification Design Technique Using a Domain-Specific Lexical Model for the Acceleration of DBTL Cycles for Microbial Cell Factories. ACS Synth Biol 2021; 10:2308-2317. [PMID: 34351735 DOI: 10.1021/acssynbio.1c00234] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
The development of microbes for conducting bioprocessing via synthetic biology involves design-build-test-learn (DBTL) cycles. To aid the designing step, we developed a computational technique that suggests next genetic modifications on the basis of relatedness to the user's design history of genetic modifications accumulated through former DBTL cycles conducted by the user. This technique, which comprehensively retrieves well-known designs related to the history, involves searching text for previous literature and then mining genes that frequently co-occur in the literature with those modified genes. We further developed a domain-specific lexical model that weights literature that is more related to the domain of metabolic engineering to emphasize genes modified for bioprocessing. Our technique made a suggestion by using a history of creating a Corynebacterium glutamicum strain producing shikimic acid that had 18 genetic modifications. Inspired by the suggestion, eight genes were considered by biologists for further modification, and modifying four of these genes proved experimentally efficient in increasing the production of shikimic acid. These results indicated that our proposed technique successfully utilized the former cycles to suggest relevant designs that biologists considered worth testing. Comprehensive retrieval of well-tested designs will help less-experienced researchers overcome the entry barrier as well as inspire experienced researchers to formulate design concepts that have been overlooked or suspended. This technique will aid DBTL cycles by feeding histories back to the next genetic design, thereby complementing the designing step.
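The retrieval idea described in this abstract (find literature that mentions already-modified genes, then rank other genes that co-occur with them, giving more weight to domain-relevant literature) can be sketched as follows. This is a toy illustration, not the authors' implementation: the function `suggest_genes`, the placeholder gene names, the invented documents, and the weighting rule (one plus the number of domain terms in a document) are all assumptions standing in for the paper's domain-specific lexical model.

```python
from collections import Counter


def suggest_genes(history, documents, gene_vocab, domain_terms, top_n=3):
    """Rank genes that co-occur in documents with already-modified genes.

    Each qualifying document contributes a weight of
    1 + (number of domain terms it contains), a crude stand-in for a
    domain-specific lexical model.
    """
    history = set(history)
    scores = Counter()
    for text in documents:
        tokens = set(text.lower().split())
        genes_here = {g for g in gene_vocab if g.lower() in tokens}
        if not genes_here & history:
            continue  # document mentions no gene from the design history
        weight = 1 + sum(term in tokens for term in domain_terms)
        for gene in genes_here - history:
            scores[gene] += weight
    return scores.most_common(top_n)


# Invented toy corpus with placeholder gene names (not real literature).
documents = [
    "overexpression of geneA and geneB increased production titer",
    "geneA deletion with geneC improved yield in fermentation",
    "geneD is associated with cell division",
    "geneB and geneC overexpression raised titer and yield",
    "geneE and geneD interact during sporulation",
    "geneB and geneD were both mentioned in a genome annotation",
]
gene_vocab = ["geneA", "geneB", "geneC", "geneD", "geneE"]
domain_terms = {"overexpression", "deletion", "titer", "yield",
                "fermentation", "production"}

suggestions = suggest_genes(["geneA", "geneB"], documents,
                            gene_vocab, domain_terms)
print(suggestions)
```

In this toy corpus `geneC` ranks first because it co-occurs with history genes in two documents rich in metabolic-engineering vocabulary, while `geneD` appears only once alongside a history gene in a document with no domain terms.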
Affiliation(s)
- Shiori Nakazawa
- Center for Exploratory Research, Research and Development Group, Hitachi, Ltd., 1-280, Higashi-Koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan
| | - Osamu Imaichi
- Center for Exploratory Research, Research and Development Group, Hitachi, Ltd., 1-280, Higashi-Koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan
| | - Takahisa Kogure
- Research Institute of Innovative Technology for Earth, 9-2, Kizugawadai, Kizugawa-shi, Kyoto 619-0292, Japan
| | - Takeshi Kubota
- Research Institute of Innovative Technology for Earth, 9-2, Kizugawadai, Kizugawa-shi, Kyoto 619-0292, Japan
| | - Koichi Toyoda
- Research Institute of Innovative Technology for Earth, 9-2, Kizugawadai, Kizugawa-shi, Kyoto 619-0292, Japan
| | - Masako Suda
- Research Institute of Innovative Technology for Earth, 9-2, Kizugawadai, Kizugawa-shi, Kyoto 619-0292, Japan
| | - Masayuki Inui
- Research Institute of Innovative Technology for Earth, 9-2, Kizugawadai, Kizugawa-shi, Kyoto 619-0292, Japan
| | - Kiyoto Ito
- Center for Exploratory Research, Research and Development Group, Hitachi, Ltd., 1-280, Higashi-Koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan
| | - Tomokazu Shirai
- Riken, 1-6 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 240-0035, Japan
| | - Michihiro Araki
- Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
- Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan
- National Institutes of Biomedical Innovation, Health and Nutrition, 1-23-1 Toyama, Shinjuku-ku, Tokyo 162-8638, Japan
|
26
|
Khaleghi MK, Savizi ISP, Lewis NE, Shojaosadati SA. Synergisms of machine learning and constraint-based modeling of metabolism for analysis and optimization of fermentation parameters. Biotechnol J 2021; 16:e2100212. [PMID: 34390201 DOI: 10.1002/biot.202100212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/23/2021] [Revised: 08/10/2021] [Accepted: 08/11/2021] [Indexed: 11/06/2022]
Abstract
Recent noteworthy advances in the development of high-performing microbial and mammalian strains have enabled the sustainable production of bio-economically valuable substances such as bio-compounds, biofuels, and biopharmaceuticals. However, to obtain an industrially viable mass-production scheme, much time and effort are required. The robust and rational design of fermentation processes requires analysis and optimization of different extracellular conditions and medium components, which have a massive effect on growth and productivity. In this regard, knowledge- and data-driven modeling methods have received much attention. Constraint-based modeling (CBM) is a knowledge-driven mathematical approach that has been widely used in fermentation analysis and optimization due to its capabilities of predicting the cellular phenotype from genotype through high-throughput means. On the other hand, machine learning (ML) is a data-driven statistical method that identifies the data patterns within sophisticated biological systems and processes, where there is inadequate knowledge to represent underlying mechanisms. Furthermore, ML models are becoming a viable complement to constraint-based models in a reciprocal manner when one is used as a pre-step of another. As a result, a more predictive model is produced. This review highlights the applications of CBM and ML independently and the combination of these two approaches for analyzing and optimizing fermentation parameters.
Affiliation(s)
- Mohammad Karim Khaleghi
- Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
| | - Iman Shahidi Pour Savizi
- Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
| | - Nathan E Lewis
- Department of Bioengineering, University of California, San Diego, USA
- Department of Pediatrics, University of California, San Diego, USA
| | - Seyed Abbas Shojaosadati
- Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
|
27
|
Integrated knowledge mining, genome-scale modeling, and machine learning for predicting Yarrowia lipolytica bioproduction. Metab Eng 2021; 67:227-236. [PMID: 34242777 DOI: 10.1016/j.ymben.2021.07.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 02/14/2021] [Revised: 06/17/2021] [Accepted: 07/05/2021] [Indexed: 01/14/2023]
Abstract
Predicting bioproduction titers from microbial hosts has been challenging due to complex interactions between microbial regulatory networks, stress responses, and suboptimal cultivation conditions. This study integrated knowledge mining, feature extraction, genome-scale modeling (GSM), and machine learning (ML) to develop a model for predicting Yarrowia lipolytica chemical titers (i.e., organic acids, terpenoids, etc.). First, Y. lipolytica production data, including cultivation conditions, genetic engineering strategies, and product information, was manually collected from literature (~100 papers) and stored as either numerical (e.g., substrate concentrations) or categorical (e.g., bioreactor modes) variables. For each case recorded, central pathway fluxes were estimated using GSMs and flux balance analysis (FBA) to provide metabolic features. Second, an ML ensemble learner was trained to predict strain production titers. Accurate predictions on the test data were obtained for instances with production titers >1 g/L (R² = 0.87). However, the model had reduced predictability for low-performance strains (0.01-1 g/L, R² = 0.29), potentially due to biosynthesis bottlenecks not captured in the features. Feature ranking indicated that the FBA fluxes, the number of enzyme steps, the substrate inputs, and thermodynamic barriers (i.e., Gibbs free energy of reaction) were the most influential factors. Third, the model was evaluated on other oleaginous yeasts and indicated there were conserved features for some hosts that can be potentially exploited by transfer learning. The platform was also designed to assist computational strain design tools (such as OptKnock) to screen genetic targets for improved microbial production in light of experimental conditions.
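The abstract reports the coefficient of determination separately for high-titer and low-titer test instances. A minimal sketch of that kind of stratified evaluation is shown below, using R² = 1 - SS_res / SS_tot computed within each stratum. The function names, the 1 g/L threshold handling, and the numbers are illustrative assumptions; the toy values are invented and are not data from the study, which may also have scored titers on a transformed scale.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def r_squared_by_stratum(y_true, y_pred, threshold=1.0):
    """Score predictions separately above and at-or-below a titer threshold."""
    pairs = list(zip(y_true, y_pred))
    high = [(t, p) for t, p in pairs if t > threshold]
    low = [(t, p) for t, p in pairs if t <= threshold]
    return {
        "high": r_squared([t for t, _ in high], [p for _, p in high]),
        "low": r_squared([t for t, _ in low], [p for _, p in low]),
    }


# Invented toy titers in g/L (measured, predicted); not data from the paper.
measured = [2.0, 4.0, 6.0, 0.1, 0.5, 0.9]
predicted = [2.5, 4.0, 5.5, 0.5, 0.5, 0.5]

result = r_squared_by_stratum(measured, predicted, threshold=1.0)
print({name: round(value, 4) for name, value in result.items()})
```

Here the toy model tracks the high-titer cases closely but predicts the same value for every low-titer case, so its low-stratum R² is zero even though a pooled score would look respectable. Splitting the metric this way is what makes a gap like the one reported in the abstract visible.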
|
28
|
Xu Y, Wu Y, Lv X, Sun G, Zhang H, Chen T, Du G, Li J, Liu L. Design and construction of novel biocatalyst for bioprocessing: Recent advances and future outlook. BIORESOURCE TECHNOLOGY 2021; 332:125071. [PMID: 33826982 DOI: 10.1016/j.biortech.2021.125071] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 02/17/2021] [Revised: 03/19/2021] [Accepted: 03/23/2021] [Indexed: 06/12/2023]
Abstract
Bioprocessing, a biocatalysis-based technology, is becoming popular in many research fields and is widely applied in industrial manufacturing. However, low bioconversion, low productivity, and high costs usually limit bioprocesses at industrial scale. Therefore, many biocatalyst strategies have been developed in recent years to meet these challenges. In this review, we first discuss protein engineering strategies, which have emerged to improve the catalytic activity of biocatalysts. Then, we summarize metabolic engineering strategies that are promoting the development of microbial cell factories. Next, we illustrate the necessity of combining protein engineering and metabolic engineering to obtain efficient biocatalysts. Lastly, future perspectives on the development and application of novel biocatalyst strategies are discussed. This review provides theoretical guidance for the development of efficient, sustainable, and economical bioprocesses mediated by novel biocatalysts.
Affiliation(s)
- Yameng Xu
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, PR China; Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China
| | - Yaokang Wu
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, PR China; Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China
| | - Xueqin Lv
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, PR China; Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China
| | - Guoyun Sun
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, PR China; Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China
| | - Hongzhi Zhang
- Shandong Runde Biotechnology Co., Ltd., Tai'an 271000, PR China
| | - Taichi Chen
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, PR China; Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China
| | - Guocheng Du
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, PR China; Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China
| | - Jianghua Li
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, PR China; Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China
| | - Long Liu
- Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, PR China; Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China.
|
29
|
Wong M, Badri A, Gasparis C, Belfort G, Koffas M. Modular optimization in metabolic engineering. Crit Rev Biochem Mol Biol 2021; 56:587-602. [PMID: 34180323 DOI: 10.1080/10409238.2021.1937928] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
Abstract
There is an increasing demand for bioproducts produced by metabolically engineered microbes, such as pharmaceuticals, biofuels, biochemicals and other high value compounds. In order to meet this demand, modular optimization, the optimizing of subsections instead of the whole system, has been adopted to engineer cells to overproduce products. Research into modularity has focused on traditional approaches such as DNA, RNA, and protein-level modularity of intercellular machinery, by optimizing metabolic pathways for enhanced production. While research into these traditional approaches continues, limitations such as scale-up and time cost hold them back from wider use, while at the same time there is a shift to more novel methods, such as moving from episomal expression to chromosomal integration. Recently, nontraditional approaches such as co-culture systems and cell-free metabolic engineering (CFME) are being investigated for modular optimization. Co-culture modularity looks to optimally divide the metabolic burden between different hosts. CFME seeks to modularly optimize metabolic pathways in vitro, both speeding up the design of such systems and eliminating the issues associated with live hosts. In this review we will examine both traditional and nontraditional approaches for modular optimization, examining recent developments and discussing issues and emerging solutions for future research in metabolic engineering.
Affiliation(s)
- Matthew Wong
- Howard P. Isermann Department of Chemical and Biological Engineering and the Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY, USA
| | - Abinaya Badri
- Howard P. Isermann Department of Chemical and Biological Engineering and the Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY, USA
| | - Christopher Gasparis
- Howard P. Isermann Department of Chemical and Biological Engineering and the Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY, USA
| | - Georges Belfort
- Howard P. Isermann Department of Chemical and Biological Engineering and the Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY, USA
| | - Mattheos Koffas
- Howard P. Isermann Department of Chemical and Biological Engineering and the Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY, USA
|
30
|
Helmy M, Smith D, Selvarajoo K. Systems biology approaches integrated with artificial intelligence for optimized metabolic engineering. Metab Eng Commun 2020; 11:e00149. [PMID: 33072513 PMCID: PMC7546651 DOI: 10.1016/j.mec.2020.e00149] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 09/01/2020] [Revised: 10/01/2020] [Accepted: 10/07/2020] [Indexed: 12/05/2022] Open
Abstract
Metabolic engineering aims to maximize the production of bio-economically important substances (compounds, enzymes, or other proteins) through the optimization of the genetics, cellular processes and growth conditions of microorganisms. This requires detailed understanding of underlying metabolic pathways involved in the production of the targeted substances, and how the cellular processes or growth conditions are regulated by the engineering. To achieve this goal, a large system of experimental techniques, compound libraries, computational methods and data resources, including multi-omics data, is used. The recent advent of multi-omics systems biology approaches significantly impacted the field by opening new avenues to perform dynamic and large-scale analyses that deepen our knowledge on the manipulations. However, with the enormous volume of transcriptomics, proteomics and metabolomics data available, it is a daunting task to integrate the data for a more holistic understanding. Novel data mining and analytics approaches, including Artificial Intelligence (AI), can provide breakthroughs that traditional low-throughput, experiment-only methods cannot easily achieve. Here, we review the latest attempts of combining systems biology and AI in metabolic engineering research, and highlight how this alliance can help overcome the current challenges facing industrial biotechnology, especially for food-related substances and compounds using microorganisms.
Affiliation(s)
- Mohamed Helmy
- Singapore Institute of Food and Biotechnology Innovation (SIFBI), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Derek Smith
- Singapore Institute of Food and Biotechnology Innovation (SIFBI), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Kumar Selvarajoo
- Singapore Institute of Food and Biotechnology Innovation (SIFBI), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore (NUS), Singapore, Singapore
|
31
|
Banerjee D, Eng T, Lau AK, Sasaki Y, Wang B, Chen Y, Prahl JP, Singan VR, Herbert RA, Liu Y, Tanjore D, Petzold CJ, Keasling JD, Mukhopadhyay A. Genome-scale metabolic rewiring improves titers rates and yields of the non-native product indigoidine at scale. Nat Commun 2020; 11:5385. [PMID: 33097726 PMCID: PMC7584609 DOI: 10.1038/s41467-020-19171-4] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 06/26/2020] [Accepted: 09/30/2020] [Indexed: 01/06/2023] Open
Abstract
High titer, rate, yield (TRY), and scalability are challenging metrics to achieve due to trade-offs between carbon use for growth and production. To achieve these metrics, we take the minimal cut set (MCS) approach that predicts metabolic reactions for elimination to couple metabolite production strongly with growth. We compute MCS solution-sets for a non-native product indigoidine, a sustainable pigment, in Pseudomonas putida KT2440, an emerging industrial microbe. From the 63 solution-sets, our omics-guided process identifies one experimentally feasible solution requiring 14 simultaneous reaction interventions. We implement a total of 14 gene knockdowns using multiplex-CRISPRi. The MCS-based solution shifts production from stationary to exponential phase. We achieve 25.6 g/L, 0.22 g/L/h, and ~50% maximum theoretical yield (0.33 g indigoidine/g glucose). These phenotypes are maintained from batch to fed-batch mode, and across scales (100-mL shake flasks, 250-mL ambr®, and 2-L bioreactors).
Affiliation(s)
- Deepanwita Banerjee
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Thomas Eng
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Andrew K Lau
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Yusuke Sasaki
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Brenda Wang
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Yan Chen
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Jan-Philip Prahl
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Advanced Biofuel and Bioproduct Process Development Unit, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
| | - Vasanth R Singan
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Robin A Herbert
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Yuzhong Liu
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Deepti Tanjore
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Advanced Biofuel and Bioproduct Process Development Unit, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
| | - Christopher J Petzold
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Jay D Keasling
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- QB3 Institute, University of California-Berkeley, 5885 Hollis Street, 4th Floor, Emeryville, CA, 94608, USA
- Department of Chemical & Biomolecular Engineering, University of California, Berkeley, CA, 94720, USA
- Department of Bioengineering, University of California, Berkeley, CA, 94720, USA
- Novo Nordisk Foundation Center for Biosustainability, Technical University Denmark, 2970, Horsholm, Denmark
- Synthetic Biochemistry Center, Institute for Synthetic Biology, Shenzhen Institutes for Advanced Technologies, Shenzhen, China
| | - Aindrila Mukhopadhyay
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, 94608, USA.
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
|
32
|
Antonakoudis A, Barbosa R, Kotidis P, Kontoravdi C. The era of big data: Genome-scale modelling meets machine learning. Comput Struct Biotechnol J 2020; 18:3287-3300. [PMID: 33240470 PMCID: PMC7663219 DOI: 10.1016/j.csbj.2020.10.011] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 08/01/2020] [Revised: 10/07/2020] [Accepted: 10/08/2020] [Indexed: 12/15/2022] Open
Abstract
With omics data being generated at an unprecedented rate, genome-scale modelling has become pivotal in its organisation and analysis. However, machine learning methods have been gaining ground in cases where knowledge is insufficient to represent the mechanisms underlying such data or as a means for data curation prior to attempting mechanistic modelling. We discuss the latest advances in genome-scale modelling and the development of optimisation algorithms for network and error reduction, intracellular constraining and applications to strain design. We further review applications of supervised and unsupervised machine learning methods to omics datasets from microbial and mammalian cell systems and present efforts to harness the potential of both modelling approaches through hybrid modelling.
Affiliation(s)
- Cleo Kontoravdi
- Department of Chemical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
|
33
|
Liu Y, Su A, Li J, Ledesma-Amaro R, Xu P, Du G, Liu L. Towards next-generation model microorganism chassis for biomanufacturing. Appl Microbiol Biotechnol 2020; 104:9095-9108. [DOI: 10.1007/s00253-020-10902-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/27/2020] [Revised: 09/03/2020] [Accepted: 09/10/2020] [Indexed: 11/29/2022]
|
34
|
Rana P, Berry C, Ghosh P, Fong SS. Recent advances on constraint-based models by integrating machine learning. Curr Opin Biotechnol 2020; 64:85-91. [DOI: 10.1016/j.copbio.2019.11.007] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 08/31/2019] [Revised: 11/04/2019] [Accepted: 11/06/2019] [Indexed: 01/06/2023]
|
35
|
Volk MJ, Lourentzou I, Mishra S, Vo LT, Zhai C, Zhao H. Biosystems Design by Machine Learning. ACS Synth Biol 2020; 9:1514-1533. [PMID: 32485108 DOI: 10.1021/acssynbio.0c00129] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Indexed: 12/16/2022]
Abstract
Biosystems such as enzymes, pathways, and whole cells have been increasingly explored for biotechnological applications. However, the intricate connectivity and resulting complexity of biosystems pose a major hurdle in designing biosystems with desirable features. As -omics and other high-throughput technologies have been rapidly developed, the promise of applying machine learning (ML) techniques in biosystems design has started to become a reality. ML models enable the identification of patterns within complicated biological data across multiple scales of analysis and can augment biosystems design applications by predicting new candidates for optimized performance. ML is being used at every stage of biosystems design to help find nonobvious engineering solutions with fewer design iterations. In this review, we first describe commonly used models and modeling paradigms within ML. We then discuss some applications of these models that have already shown success in biotechnological applications. Moreover, we discuss successful applications at all scales of biosystems design, including nucleic acids, genetic circuits, proteins, pathways, genomes, and bioprocesses. Finally, we discuss some limitations of these methods and potential solutions as well as prospects of the combination of ML and biosystems design.
|
36
|
Chen Y, Banerjee D, Mukhopadhyay A, Petzold CJ. Systems and synthetic biology tools for advanced bioproduction hosts. Curr Opin Biotechnol 2020; 64:101-109. [PMID: 31927061 DOI: 10.1016/j.copbio.2019.12.007] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/13/2019] [Revised: 11/27/2019] [Accepted: 12/08/2019] [Indexed: 02/07/2023]
Abstract
The genomic revolution ushered in an era of discovery and characterization of enzymes from novel organisms that fueled engineering of microbes to produce commodity and high-value compounds. Over the past decade, advances in synthetic biology tools have contributed to significant progress in metabolic engineering efforts to produce both biofuels and bioproducts, resulting in several such products being brought to market. These successes represent a burgeoning bio-economy; however, significant resources and time are still necessary to progress a system from proof-of-concept to market. In order to fully realize this potential, methods that examine biological systems in a comprehensive, systematic and high-throughput manner are essential. Recent success in synthetic biology has coincided with the development of systems biology and analytical approaches that kept pace and scaled with technology development. Here, we review a selection of systems biology methods and their use in synthetic biology approaches for microbial biotechnology platforms.
Affiliation(s)
- Yan Chen
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, USA; Agile BioFoundry, Lawrence Berkeley National Laboratory, Emeryville, CA, USA; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Deepanwita Banerjee
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, USA; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Aindrila Mukhopadhyay
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, USA; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Christopher J Petzold
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, USA; Agile BioFoundry, Lawrence Berkeley National Laboratory, Emeryville, CA, USA; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
|
37
|
Raghavendran V, Asare E, Roy I. Bacterial cellulose: Biosynthesis, production, and applications. Adv Microb Physiol 2020; 77:89-138. [PMID: 34756212 DOI: 10.1016/bs.ampbs.2020.07.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 10/23/2022]
Abstract
Bacterial cellulose (BC) is a natural polymer produced by acetic acid-producing bacteria and has gathered much interest over the last decade for its biomedical and biotechnological applications. Unlike plant-derived cellulose nanofibres, which require pretreatment to deconstruct the recalcitrant lignocellulosic network, BC is 100% pure and is extruded by cells as nanofibrils. Moreover, these nanofibrils can be converted to macrofibers that possess excellent material properties, surpassing even the strength of steel, and can be used as substitutes for fossil-fuel-derived synthetic fibers. The focus of the review is to present the fundamental long-term research on the influence of environmental factors on the organism's BC production capabilities, the production methods that are available for scaling up/scaled-up processes, and its use as a bulk commodity or for biomedical applications.
Affiliation(s)
- Vijayendran Raghavendran
- Department of Materials Science and Engineering, Kroto Research Institute, University of Sheffield, Sheffield, United Kingdom
| | - Emmanuel Asare
- Department of Materials Science and Engineering, Kroto Research Institute, University of Sheffield, Sheffield, United Kingdom
| | - Ipsita Roy
- Department of Materials Science and Engineering, Kroto Research Institute, University of Sheffield, Sheffield, United Kingdom.
|
38
|
Agarwal A, Liu YA, McDowell C. 110th Anniversary: Ensemble-Based Machine Learning for Industrial Fermenter Classification and Foaming Control. Ind Eng Chem Res 2019. [DOI: 10.1021/acs.iecr.9b02424] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022]
Affiliation(s)
- Aman Agarwal
- AspenTech-PetroChina Center of Excellence in Process System Engineering, Department of Chemical Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, United States
| | - Y. A. Liu
- AspenTech-PetroChina Center of Excellence in Process System Engineering, Department of Chemical Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, United States
| | - Christopher McDowell
- Novozymes Biologicals, Inc., 5400 Corporate Circle, Salem, Virginia 24153, United States
|
39
|
Zampieri G, Vijayakumar S, Yaneske E, Angione C. Machine and deep learning meet genome-scale metabolic modeling. PLoS Comput Biol 2019; 15:e1007084. [PMID: 31295267 PMCID: PMC6622478 DOI: 10.1371/journal.pcbi.1007084] [Citation(s) in RCA: 150] [Impact Index Per Article: 30.0] [Indexed: 12/25/2022] Open
Abstract
Omic data analysis is steadily growing as a driver of basic and applied molecular biology research. Core to the interpretation of complex and heterogeneous biological phenotypes are computational approaches in the fields of statistics and machine learning. In parallel, constraint-based metabolic modeling has established itself as the main tool to investigate large-scale relationships between genotype, phenotype, and environment. The development and application of these methodological frameworks have occurred independently for the most part, whereas the potential of their integration for biological, biomedical, and biotechnological research is less known. Here, we describe how machine learning and constraint-based modeling can be combined, reviewing recent works at the intersection of both domains and discussing the mathematical and practical aspects involved. We overlap systematic classifications from both frameworks, making them accessible to nonexperts. Finally, we delineate potential future scenarios, propose new joint theoretical frameworks, and suggest concrete points of investigation for this joint subfield. A multiview approach merging experimental and knowledge-driven omic data through machine learning methods can incorporate key mechanistic information in an otherwise biologically-agnostic learning process.
Affiliation(s)
- Guido Zampieri
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
| | - Supreeta Vijayakumar
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
| | - Elisabeth Yaneske
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
| | - Claudio Angione
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
- Healthcare Innovation Centre, Teesside University, Middlesbrough, United Kingdom
|
40
|
Presnell KV, Alper HS. Systems Metabolic Engineering Meets Machine Learning: A New Era for Data-Driven Metabolic Engineering. Biotechnol J 2019; 14:e1800416. [PMID: 30927499 DOI: 10.1002/biot.201800416] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 01/17/2019] [Revised: 02/20/2019] [Indexed: 12/30/2022]
Abstract
The recent increase in high-throughput capacity of 'omics datasets combined with advances and interest in machine learning (ML) have created great opportunities for systems metabolic engineering. In this regard, data-driven modeling methods have become increasingly valuable to metabolic strain design. In this review, the nature of 'omics is discussed and a broad introduction to the ML algorithms combining these datasets into predictive models of metabolism and metabolic rewiring is provided. Next, this review highlights recent work in the literature that utilizes such data-driven methods to inform various metabolic engineering efforts for different classes of application including product maximization, understanding and profiling phenotypes, de novo metabolic pathway design, and creation of robust system-scale models for biotechnology. Overall, this review aims to highlight the potential and promise of using ML algorithms with metabolic engineering and systems biology related datasets.
Affiliation(s)
- Kristin V Presnell
- McKetta Department of Chemical Engineering, The University of Texas at Austin, 200 E Dean Keeton St. Stop C0400, Austin, TX, 78712, USA
| | - Hal S Alper
- McKetta Department of Chemical Engineering, The University of Texas at Austin, 200 E Dean Keeton St. Stop C0400, Austin, TX, 78712, USA; Institute for Cellular and Molecular Biology, The University of Texas at Austin, 100 E 24 St., Austin, TX, 78712, USA
|