1
Han YH, Kim HJ, Kim K, Yang J, Seo SW. Synthetic translational coupling system for accurate and predictable polycistronic gene expression control in bacteria. Metab Eng 2024; 88:148-159. [PMID: 39742955 DOI: 10.1016/j.ymben.2024.12.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Revised: 12/02/2024] [Accepted: 12/30/2024] [Indexed: 01/04/2025]
Abstract
Precise and predictable genetic elements are required to address issues such as suboptimal metabolic flux or imbalanced protein assembly caused by inadequate control of polycistronic gene expression in bacteria. Here, we devised a synthetic biopart based on translational coupling to control polycistronic gene expression. This module links the translation of genes within a polycistronic mRNA, maintaining their expression ratios regardless of coding sequences, transcription rate, and upstream gene translation rate. By engineering the Shine-Dalgarno sequences within these synthetic bioparts, we adjusted the expression ratios of polycistronic genes. We created 41 bioparts with varied relative expression ratios, ranging from 0.03 to 0.92, enabling precise control of pathway enzyme gene expression in a polycistronic manner. This led to up to a 7.6-fold increase in the production of valuable biochemicals such as 3-hydroxypropionic acid, poly(3-hydroxybutyrate), and lycopene. Our work provides genetic regulatory modules for precise and predictable polycistronic gene expression, facilitating efficient protein assembly, biosynthetic gene cluster expression, and pathway optimization.
Affiliation(s)
- Yong Hee Han
- Interdisciplinary Program in Bioengineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; School of Biological Sciences and Technology, Chonnam National University, 77 Yongbong-ro, Gwangju, 61186, South Korea; Institute of Systems Biology & Life Science Informatics, Chonnam National University, 77 Yongbong-ro, Gwangju, 61186, South Korea
- Hyeon Jin Kim
- Interdisciplinary Program in Bioengineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea
- Keonwoo Kim
- School of Chemical and Biological Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea
- Jina Yang
- Department of Chemical Engineering, Jeju National University, 102, Jejudaehak-ro, Jeju-si, Jeju-do, 63243, South Korea
- Sang Woo Seo
- Interdisciplinary Program in Bioengineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; School of Chemical and Biological Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; Bio-MAX Institute, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; Institute of Bio Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea.
2
Viswan NA, Tribut A, Gasparyan M, Radulescu O, Bhalla US. Mathematical basis and toolchain for hierarchical optimization of biochemical networks. PLoS Comput Biol 2024; 20:e1012624. [PMID: 39621764 PMCID: PMC11637339 DOI: 10.1371/journal.pcbi.1012624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Revised: 12/12/2024] [Accepted: 11/08/2024] [Indexed: 12/13/2024] Open
Abstract
Biological signalling systems are complex, and efforts to build mechanistic models must confront a huge parameter space, indirect and sparse data, and frequently encounter multiscale and multiphysics phenomena. We present HOSS, a framework for Hierarchical Optimization of Systems Simulations, to address such problems. HOSS operates by breaking down extensive systems models into individual pathway blocks organized in a nested hierarchy. At the first level, dependencies are solely on signalling inputs, and subsequent levels rely only on the preceding ones. We demonstrate that each independent pathway in every level can be efficiently optimized. Once optimized, its parameters are held constant while the pathway serves as input for succeeding levels. We develop an algorithmic approach to identify the necessary nested hierarchies for the application of HOSS in any given biochemical network. Furthermore, we devise two parallelizable variants that generate numerous model instances using stochastic scrambling of parameters during initial and intermediate stages of optimization. Our results indicate that these variants produce superior models and offer an estimate of solution degeneracy. Additionally, we showcase the effectiveness of the optimization methods for both abstracted, event-based simulations and ODE-based models.
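The level-by-level idea in this abstract (optimize one level, hold its parameters constant, then feed its output to the next level) can be illustrated with a toy sketch. This is not the HOSS toolchain: the one-parameter linear blocks, the closed-form least-squares fit, and the names `fit_gain` and `hierarchical_fit` are all assumptions made for illustration.

```python
# Toy hierarchical fit. Each level is a hypothetical one-parameter block
# y = k * u, where u is the output of the level before it.
# This is NOT the HOSS software; every name here is illustrative.

def fit_gain(inputs, targets):
    """Closed-form least-squares gain k minimizing sum((k*u - t)**2)."""
    num = sum(u * t for u, t in zip(inputs, targets))
    den = sum(u * u for u in inputs)
    return num / den

def hierarchical_fit(signal, level_data):
    """Fit levels in order; a fitted level is frozen and drives the next one."""
    params = []
    upstream = list(signal)
    for targets in level_data:
        k = fit_gain(upstream, targets)       # optimize this level only
        params.append(k)                      # freeze its parameter
        upstream = [k * u for u in upstream]  # its output is the next input
    return params

# Synthetic data generated from known gains (2.0, then 0.5).
signal = [1.0, 2.0, 3.0, 4.0]
level1 = [2.0 * s for s in signal]
level2 = [0.5 * y for y in level1]
fitted = hierarchical_fit(signal, [level1, level2])
```

Because each level is fitted against only its own data and its already-frozen upstream input, the search at every step is one-dimensional here, rather than a joint search over all parameters at once.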
Affiliation(s)
- Nisha Ann Viswan
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India
- The University of Trans-Disciplinary Health Sciences and Technology, Bangalore, India
- Alexandre Tribut
- Laboratory of Pathogens and Host Immunity, University of Montpellier, CNRS and INSERM, Montpellier, France
- Ecole Centrale de Nantes, Nantes, France
- Manvel Gasparyan
- Laboratory of Pathogens and Host Immunity, University of Montpellier, CNRS and INSERM, Montpellier, France
- Ovidiu Radulescu
- Laboratory of Pathogens and Host Immunity, University of Montpellier, CNRS and INSERM, Montpellier, France
- Upinder S. Bhalla
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India
3
Wen X, Lin J, Yang C, Li Y, Cheng H, Liu Y, Zhang Y, Ma H, Mao Y, Liao X, Wang M. Automated characterization and analysis of expression compatibility between regulatory sequences and metabolic genes in Escherichia coli. Synth Syst Biotechnol 2024; 9:647-657. [PMID: 38817827 PMCID: PMC11137365 DOI: 10.1016/j.synbio.2024.05.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Revised: 05/11/2024] [Accepted: 05/16/2024] [Indexed: 06/01/2024] Open
Abstract
Using standardized artificial regulatory sequences to fine-tune the expression of multiple metabolic pathways/genes is a key strategy in the creation of efficient microbial cell factories. However, when regulatory sequence expression strengths are characterized using only a few reporter genes, they may not be applicable across diverse genes. This introduces great uncertainty into the precise regulation of multiple genes at multiple expression levels. To address this, our study adopted a fluorescent protein fusion strategy for a more accurate assessment of target protein expression levels. We combined 41 commonly used metabolic genes with 15 regulatory sequences, yielding an expression dataset encompassing 520 unique combinations. This dataset highlighted substantial variation in protein expression level under identical regulatory sequences, with relative expression levels ranging from 2.8 to 176-fold. It also demonstrated that increasing the strength of regulatory sequences does not necessarily lead to significant improvements in the expression levels of target proteins. Using this dataset, we developed various machine learning models and discovered that the integration of promoter regions, ribosome binding sites, and coding sequences significantly improves the accuracy of predicting protein expression levels, with a Spearman correlation coefficient of 0.72, where the promoter sequence exerts a predominant influence. Our study aims not only to provide a detailed guide for fine-tuning gene expression in the metabolic engineering of Escherichia coli but also to deepen our understanding of the compatibility issues between regulatory sequences and target genes.
Affiliation(s)
- Xiao Wen
- School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin, 300308, China
- Jiawei Lin
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- School of Biological Engineering, Tianjin University of Science and Technology, Tianjin, 300457, China
- Chunhe Yang
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- School of Biological Engineering, Tianjin University of Science and Technology, Tianjin, 300457, China
- Ying Li
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- School of Biological Engineering, Tianjin University of Science and Technology, Tianjin, 300457, China
- Haijiao Cheng
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin, 300308, China
- Ye Liu
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin, 300308, China
- Yue Zhang
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin, 300308, China
- Hongwu Ma
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin, 300308, China
- Yufeng Mao
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin, 300308, China
- Xiaoping Liao
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin, 300308, China
- Meng Wang
- School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin, 300308, China
4
Corcoran WK, Cosio A, Edelstein HI, Leonard JN. Exploring structure-function relationships in engineered receptor performance using computational structure prediction. bioRxiv 2024:2024.11.07.622438. [PMID: 39574600 PMCID: PMC11581020 DOI: 10.1101/2024.11.07.622438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2024]
Abstract
Engineered receptors play increasingly important roles in transformative cell-based therapies. However, the structural mechanisms that drive differences in performance across receptor designs are often poorly understood. Recent advances in protein structural prediction tools have enabled the modeling of virtually any user-defined protein, but how these tools might build understanding of engineered receptors has yet to be fully explored. In this study, we employed structural modeling tools to perform post hoc analyses to investigate whether predicted structural features might explain observed functional variation. We selected a recently reported library of receptors derived from natural cytokine receptors as a case study, generated structural models, and from these predictions quantified a set of structural features that plausibly impact receptor performance. Encouragingly, for a subset of receptors, structural features explained considerable variation in performance, and trends were largely conserved across structurally diverse receptor sets. This work indicates potential for structure prediction-guided synthetic receptor engineering.
Affiliation(s)
- William K. Corcoran
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208, United States
- Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, Illinois 60208, United States
- Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, United States
- Amparo Cosio
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208, United States
- Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, United States
- Hailey I. Edelstein
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208, United States
- Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, United States
- Joshua N. Leonard
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208, United States
- Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, Illinois 60208, United States
- Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, United States
- Chemistry of Life Processes Institute, Northwestern University, Evanston, Illinois 60208, United States
- Member, Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Evanston, Illinois 60208, United States
5
Qin Z, Ren H, Zhao P, Wang K, Liu H, Miao C, Du Y, Li J, Wu L, Chen Z. Current computational tools for protein lysine acylation site prediction. Brief Bioinform 2024; 25:bbae469. [PMID: 39316944 PMCID: PMC11421846 DOI: 10.1093/bib/bbae469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 08/20/2024] [Accepted: 09/07/2024] [Indexed: 09/26/2024] Open
Abstract
As a main subtype of post-translational modification (PTM), protein lysine acylations (PLAs) play crucial roles in regulating diverse functions of proteins. With recent advancements in proteomics technology, the identification of PTMs is becoming a data-rich field. A large amount of experimentally verified data urgently needs to be translated into valuable biological insights. With computational approaches, PLAs can be accurately detected across the whole proteome, even for organisms with small-scale datasets. Herein, a comprehensive summary of 166 in silico PLA prediction methods is presented, covering predictors for a single type of PLA site as well as for multiple types of PLA sites. This recapitulation covers important aspects that are critical for the development of a robust predictor, including data collection and preparation, sample selection, feature representation, classification algorithm design, model evaluation, and method availability. Notably, we discuss the application of protein language models and transfer learning to solve the small-sample learning issue. We also highlight the prediction methods developed for functionally relevant PLA sites and species/substrate/cell-type-specific PLA sites. In conclusion, this systematic review could potentially facilitate the development of novel PLA predictors and offer useful insights to researchers from various disciplines.
Affiliation(s)
- Zhaohui Qin
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Haoran Ren
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
- Kaiyuan Wang
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Huixia Liu
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Chunbo Miao
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Yanxiu Du
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Junzhou Li
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Liuji Wu
- National Key Laboratory of Wheat and Maize Crop Science, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
6
Chen V, Yang M, Cui W, Kim JS, Talwalkar A, Ma J. Applying interpretable machine learning in computational biology-pitfalls, recommendations and opportunities for new developments. Nat Methods 2024; 21:1454-1461. [PMID: 39122941 PMCID: PMC11348280 DOI: 10.1038/s41592-024-02359-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 06/24/2024] [Indexed: 08/12/2024]
Abstract
Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology are generally underdeveloped. We provide an overview of IML methods and evaluation techniques and discuss common pitfalls encountered when applying IML methods to computational biology problems. We also highlight open questions, especially in the era of large language models, and call for collaboration between IML and computational biology researchers.
Affiliation(s)
- Valerie Chen
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Muyu Yang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Wenbo Cui
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Joon Sik Kim
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Ameet Talwalkar
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
- Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
7
Wan F, Wong F, Collins JJ, de la Fuente-Nunez C. Machine learning for antimicrobial peptide identification and design. Nat Rev Bioeng 2024; 2:392-407. [PMID: 39850516 PMCID: PMC11756916 DOI: 10.1038/s44222-024-00152-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Indexed: 01/25/2025]
Abstract
Artificial intelligence (AI) and machine learning (ML) models are being deployed in many domains of society and have recently reached the field of drug discovery. Given the increasing prevalence of antimicrobial resistance, as well as the challenges intrinsic to antibiotic development, there is an urgent need to accelerate the design of new antimicrobial therapies. Antimicrobial peptides (AMPs) are therapeutic agents for treating bacterial infections, but their translation into the clinic has been slow owing to toxicity, poor stability, limited cellular penetration and high cost, among other issues. Recent advances in AI and ML have led to breakthroughs in our abilities to predict biomolecular properties and structures and to generate new molecules. The ML-based modelling of peptides may overcome some of the disadvantages associated with traditional drug discovery and aid the rapid development and translation of AMPs. Here, we provide an introduction to this emerging field and survey ML approaches that can be used to address issues currently hindering AMP development. We also outline important limitations that can be addressed for the broader adoption of AMPs in clinical practice, as well as new opportunities in data-driven peptide design.
Affiliation(s)
- Fangping Wan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors contributed equally: Fangping Wan, Felix Wong
- Felix Wong
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- These authors contributed equally: Fangping Wan, Felix Wong
- James J. Collins
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
- These authors jointly supervised this work: James J. Collins, Cesar de la Fuente-Nunez
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors jointly supervised this work: James J. Collins, Cesar de la Fuente-Nunez
8
Mekki YM. Physicians should build their own machine-learning models. Patterns (N Y) 2024; 5:100948. [PMID: 38487798 PMCID: PMC10935494 DOI: 10.1016/j.patter.2024.100948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract
Yosra Mekki suggests that doctors should have the ability to develop their own machine-learning models. She proposes an approach with the "spotlight" on physicians, to create user-friendly frameworks that allow doctors to develop customized models without requiring extensive previous knowledge of machine learning.
Affiliation(s)
- Yosra Magdi Mekki
- College of Medicine, Qatar University, QU Health, Qatar University, PO Box 2713, Doha, Qatar
- Machine Learning Lab Group, College of Engineering, Qatar University, PO Box 2713, Doha, Qatar
- The Clinician Engineer Hub, University of Cambridge Hospitals, Cambridge, UK
9
Wang H, Du Q, Wang Y, Xu H, Wei Z, Wang X. GPro: generative AI-empowered toolkit for promoter design. Bioinformatics 2024; 40:btae123. [PMID: 38429953 PMCID: PMC10937896 DOI: 10.1093/bioinformatics/btae123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 02/19/2024] [Accepted: 02/28/2024] [Indexed: 03/03/2024] Open
Abstract
Motivation: Promoters with desirable properties are crucial in biotechnological applications. Generative AI (GenAI) has demonstrated potential in creating novel synthetic promoters with significantly enhanced functionality. However, these methods' reliance on various programming frameworks and specific task-oriented contexts limits their flexibility. Overcoming these limitations is essential for researchers to fully leverage the power of GenAI to design promoters for their tasks.
Results: Here, we introduce GPro (Generative AI-empowered toolkit for promoter design), a user-friendly toolkit that integrates a collection of cutting-edge GenAI-empowered approaches for promoter design. This toolkit provides a standardized pipeline covering essential promoter design processes, including training, optimization, and evaluation. Several detailed demos are provided to reproduce state-of-the-art promoter design pipelines. GPro's user-friendly interface makes it accessible to a wide range of users, including non-AI experts. It also offers a variety of optional algorithms for each design process, and gives users the flexibility to compare methods and create customized pipelines.
Availability and implementation: GPro is released as open-source software under the MIT license. The source code for GPro is available on GitHub for Linux, macOS, and Windows: https://github.com/WangLabTHU/GPro, and is available for download via the Zenodo repository at https://zenodo.org/doi/10.5281/zenodo.10681733.
Affiliation(s)
- Haochen Wang
- Ministry of Education Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China
- Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Department of Automation, Tsinghua University, Beijing 100084, China
- Qixiu Du
- Ministry of Education Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China
- Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Department of Automation, Tsinghua University, Beijing 100084, China
- Ye Wang
- Ministry of Education Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China
- Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Department of Automation, Tsinghua University, Beijing 100084, China
- Hanwen Xu
- Ministry of Education Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China
- Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Department of Automation, Tsinghua University, Beijing 100084, China
- Zheng Wei
- Ministry of Education Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China
- Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China
- Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Department of Automation, Tsinghua University, Beijing 100084, China