1
Zhou J, Huang M. Navigating the landscape of enzyme design: from molecular simulations to machine learning. Chem Soc Rev 2024; 53:8202-8239. [PMID: 38990263 DOI: 10.1039/d4cs00196f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Global environmental issues and sustainable development call for new technologies for fine chemical synthesis and waste valorization. Biocatalysis has attracted great attention as the alternative to the traditional organic synthesis. However, it is challenging to navigate the vast sequence space to identify those proteins with admirable biocatalytic functions. The recent development of deep-learning based structure prediction methods such as AlphaFold2 reinforced by different computational simulations or multiscale calculations has largely expanded the 3D structure databases and enabled structure-based design. While structure-based approaches shed light on site-specific enzyme engineering, they are not suitable for large-scale screening of potential biocatalysts. Effective utilization of big data using machine learning techniques opens up a new era for accelerated predictions. Here, we review the approaches and applications of structure-based and machine-learning guided enzyme design. We also provide our view on the challenges and perspectives on effectively employing enzyme design approaches integrating traditional molecular simulations and machine learning, and the importance of database construction and algorithm development in attaining predictive ML models to explore the sequence fitness landscape for the design of admirable biocatalysts.
Affiliation(s)
- Jiahui Zhou
  - School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.
- Meilan Huang
  - School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.
2
He H, Chen J, Xie J, Ding J, Pan H, Li Y, Jia H. Engineering UDP-Glycosyltransferase UGTPg29 for the Efficient Synthesis of Ginsenoside Rg3 from Protopanaxadiol. Appl Biochem Biotechnol 2024:10.1007/s12010-024-05009-y. [PMID: 39120838 DOI: 10.1007/s12010-024-05009-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/23/2024] [Indexed: 08/10/2024]
Abstract
Rare ginsenosides Rg3 and Rh2, which exhibit diverse pharmacological effects, are derivatives of protopanaxadiol (PPD). UDP-glycosyltransferases, such as the M315F variant of Bs-YjiC (Bs-YjiCm) from Bacillus subtilis and UGTPg29 from Panax ginseng, can efficiently convert PPD into Rh2 and Rh2 into Rg3, respectively. In the present study, the N178I mutation of Bs-YjiCm was introduced, resulting in an increase in Rh2 production. UDP-glycosyltransferase UGTPg29 was then engineered to improve its robustness through semi-rational design. The variant R91M/D184M/A287V/A342L, which indicated desirable stability and activity, was utilized in coupling with the N178I variant of Bs-YjiCm and sucrose synthase AtSuSy from Arabidopsis thaliana to set up a "one-pot" three-enzyme reaction for the biosynthesis of Rg3. The influential factors, including the ratio and concentration of UDP-glycosyltransferases, pH, and the concentrations of UDP, sucrose, and DMSO, were optimized. On this basis, a fed-batch strategy was adopted to achieve a Rg3 yield as high as 12.38 mM (9.72 g/L) with a final yield of 68.78% within 24 h. This work may provide promising UDP-glycosyltransferase candidates for ginsenoside biosynthesis.
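The titre and yield figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes a molar mass of about 785.0 g/mol for ginsenoside Rg3 (C42H72O13; this value is not stated in the abstract) and assumes that the 68.78% figure is a molar yield relative to the PPD supplied. The implied PPD loading is therefore an inference, not a number reported by the authors.

```python
# Cross-check of the Rg3 titre and yield figures quoted in the abstract.
# Assumption: molar mass of ginsenoside Rg3 (C42H72O13), not given in the abstract.
RG3_MOLAR_MASS_G_PER_MOL = 785.0


def mM_to_g_per_L(conc_mM: float, molar_mass_g_per_mol: float) -> float:
    """Convert a millimolar concentration to grams per litre."""
    return conc_mM * molar_mass_g_per_mol / 1000.0


titre_mM = 12.38          # Rg3 titre from the abstract
yield_fraction = 0.6878   # final yield from the abstract (assumed molar, vs. PPD)

titre_g_per_L = mM_to_g_per_L(titre_mM, RG3_MOLAR_MASS_G_PER_MOL)
implied_ppd_mM = titre_mM / yield_fraction  # inferred substrate loading

print(f"Rg3 titre: {titre_g_per_L:.2f} g/L")
print(f"Implied PPD loading: {implied_ppd_mM:.1f} mM")
```

Under these assumptions the mass concentration agrees with the 9.72 g/L given in the abstract.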
Affiliation(s)
- Huichang He
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China
- Jiajie Chen
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China
- Jiangtao Xie
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China
- Jiajie Ding
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China
- Huayi Pan
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China
- Yan Li
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China.
- Honghua Jia
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China
3
Xu B, Liu LH, Lai S, Chen J, Wu S, Lei W, Lin H, Zhang Y, Hu Y, He J, Chen X, He Q, Yang M, Wang H, Zhao X, Wang M, Luo H, Ge Q, Gao H, Xia J, Cao Z, Zhang B, Jiang A, Wu YR. Directed Evolution of Escherichia coli Nissle 1917 to Utilize Allulose as Sole Carbon Source. SMALL METHODS 2024; 8:e2301385. [PMID: 38415955 DOI: 10.1002/smtd.202301385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/19/2024] [Indexed: 02/29/2024]
Abstract
Sugar substitutes are popular due to their akin taste and low calories. However, excessive use of aspartame and erythritol can have varying effects. While D-allulose is presently deemed a secure alternative to sugar, its excessive consumption is not devoid of cellular stress implications. In this study, the evolution of Escherichia coli Nissle 1917 (EcN) is directed to utilize allulose as sole carbon source through a combination of adaptive laboratory evolution (ALE) and fluorescence-activated droplet sorting (FADS) techniques. Employing whole genome sequencing (WGS) and clustered regularly interspaced short palindromic repeats interference (CRISPRi) in conjunction with compensatory expression displayed those genetic mutations in sugar and amino acid metabolic pathways, including glnP, glpF, gmpA, nagE, pgmB, ybaN, etc., increased allulose assimilation. Enzyme-substrate dynamics simulations and deep learning predict enhanced substrate specificity and catalytic efficiency in nagE A247E and pgmB G12R mutants. The findings evince that these mutations hold considerable promise in enhancing allulose uptake and facilitating its conversion into glycolysis, thus signifying the emergence of a novel metabolic pathway for allulose utilization. These revelations bear immense potential for the sustainable utilization of D-allulose in promoting health and well-being.
Affiliation(s)
- Bo Xu
  - School of Basic Medical Sciences, Hubei University of Science and Technology, Xianning, 437100, P. R. China
- Li-Hua Liu
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
  - Biology Department and Institute of Marine Sciences, College of Science, Shantou University, Shantou, 515063, P. R. China
- Shijing Lai
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Jingjing Chen
  - Yeasen Biotechnology (Shanghai) Co., Ltd, Shanghai, 200000, P. R. China
- Song Wu
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Wei Lei
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Houliang Lin
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Yu Zhang
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Yucheng Hu
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
  - College of Chemistry and Chemical Engineering, Southwest University, Chongqing, 400715, P. R. China
- Jingtao He
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Xipeng Chen
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Qian He
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Min Yang
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Haimei Wang
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Xuemei Zhao
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Man Wang
  - Yeasen Biotechnology (Shanghai) Co., Ltd, Shanghai, 200000, P. R. China
- Haodong Luo
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
  - Biology Department and Institute of Marine Sciences, College of Science, Shantou University, Shantou, 515063, P. R. China
- Qijun Ge
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Huamei Gao
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Jiaqi Xia
  - School of Basic Medicine, Jiamusi University, Jiamusi, 154000, P. R. China
- Zhen Cao
  - Yeasen Biotechnology (Shanghai) Co., Ltd, Shanghai, 200000, P. R. China
- Baoxun Zhang
  - College of Chemistry and Chemical Engineering, Southwest University, Chongqing, 400715, P. R. China
- Ao Jiang
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
- Yi-Rui Wu
  - Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd, Guangzhou, Guangdong, 510000, P. R. China
4
Wang J, Yang Z, Chen C, Yao G, Wan X, Bao S, Ding J, Wang L, Jiang H. MPEK: a multitask deep learning framework based on pretrained language models for enzymatic reaction kinetic parameters prediction. Brief Bioinform 2024; 25:bbae387. [PMID: 39129365 PMCID: PMC11317537 DOI: 10.1093/bib/bbae387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Revised: 06/24/2024] [Accepted: 07/23/2024] [Indexed: 08/13/2024] Open
Abstract
Enzymatic reaction kinetics are central in analyzing enzymatic reaction mechanisms and target-enzyme optimization, and thus in biomanufacturing and other industries. The enzyme turnover number (kcat) and Michaelis constant (Km), key kinetic parameters for measuring enzyme catalytic efficiency, are crucial for analyzing enzymatic reaction mechanisms and the directed evolution of target enzymes. Experimental determination of kcat and Km is costly in terms of time, labor, and cost. To consider the intrinsic connection between kcat and Km and further improve the prediction performance, we propose a universal pretrained multitask deep learning model, MPEK, to predict these parameters simultaneously while considering pH, temperature, and organismal information. Through testing on the same kcat and Km test datasets, MPEK demonstrated superior prediction performance over the previous models. Specifically, MPEK achieved the Pearson coefficient of 0.808 for predicting kcat, improving ca. 14.6% and 7.6% compared to the DLKcat and UniKP models, and it achieved the Pearson coefficient of 0.777 for predicting Km, improving ca. 34.9% and 53.3% compared to the Kroll_model and UniKP models. More importantly, MPEK was able to reveal enzyme promiscuity and was sensitive to slight changes in the mutant enzyme sequence. In addition, in three case studies, it was shown that MPEK has the potential for assisted enzyme mining and directed evolution. To facilitate in silico evaluation of enzyme catalytic efficiency, we have established a web server implementing this model, which can be accessed at http://mathtc.nscc-tj.cn/mpek.
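For orientation, kcat and Km enter the standard Michaelis-Menten rate law, v = kcat·[E]·[S] / (Km + [S]), and their ratio kcat/Km is the usual measure of catalytic efficiency. The sketch below illustrates only these textbook relations. It is not part of MPEK, and the numeric values are invented for illustration.

```python
# Textbook Michaelis-Menten relations; not the MPEK model itself.
# All numeric values below are hypothetical.

def michaelis_menten_rate(kcat: float, km: float,
                          enzyme_conc: float, substrate_conc: float) -> float:
    """Initial rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat: float, km: float) -> float:
    """Catalytic efficiency kcat / Km."""
    return kcat / km


kcat = 100.0    # turnover number, 1/s (hypothetical)
km = 0.5        # Michaelis constant, mM (hypothetical)
enzyme = 0.001  # enzyme concentration, mM (hypothetical)

v_max = kcat * enzyme
# At [S] = Km the rate is half of v_max by definition of Km.
v_half = michaelis_menten_rate(kcat, km, enzyme, substrate_conc=km)

print(f"v_max = {v_max} mM/s, v at [S]=Km = {v_half} mM/s")
print(f"kcat/Km = {catalytic_efficiency(kcat, km)} per mM per s")
```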
Affiliation(s)
- Jingjing Wang
  - State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Zhijiang Yang
  - State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Chang Chen
  - State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Ge Yao
  - State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Xiukun Wan
  - State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Shaoheng Bao
  - State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Junjie Ding
  - State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Liangliang Wang
  - State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Hui Jiang
  - State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
5
Lu H, Xiao L, Liao W, Yan X, Nielsen J. Cell factory design with advanced metabolic modelling empowered by artificial intelligence. Metab Eng 2024; 85:61-72. [PMID: 39038602 DOI: 10.1016/j.ymben.2024.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/06/2024] [Accepted: 07/06/2024] [Indexed: 07/24/2024]
Abstract
Advances in synthetic biology and artificial intelligence (AI) have provided new opportunities for modern biotechnology. High-performance cell factories, the backbone of industrial biotechnology, are ultimately responsible for determining whether a bio-based product succeeds or fails in the fierce competition with petroleum-based products. To date, one of the greatest challenges in synthetic biology is the creation of high-performance cell factories in a consistent and efficient manner. As so-called white-box models, numerous metabolic network models have been developed and used in computational strain design. Moreover, great progress has been made in AI-powered strain engineering in recent years. Both approaches have advantages and disadvantages. Therefore, the deep integration of AI with metabolic models is crucial for the construction of superior cell factories with higher titres, yields and production rates. The detailed applications of the latest advanced metabolic models and AI in computational strain design are summarized in this review. Additionally, approaches for the deep integration of AI and metabolic models are discussed. It is anticipated that advanced mechanistic metabolic models powered by AI will pave the way for the efficient construction of powerful industrial chassis strains in the coming years.
Affiliation(s)
- Hongzhong Lu
  - State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China.
- Luchi Xiao
  - State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China
- Wenbin Liao
  - State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China; Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, PR China
- Xuefeng Yan
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, PR China
- Jens Nielsen
  - BioInnovation Institute, Ole Måløes Vej, DK2200, Copenhagen N, Denmark; Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden.
6
Shi Z, Wang D, Li Y, Deng R, Lin J, Liu C, Li H, Wang R, Zhao M, Mao Z, Yuan Q, Liao X, Ma H. REME: an integrated platform for reaction enzyme mining and evaluation. Nucleic Acids Res 2024; 52:W299-W305. [PMID: 38769057 PMCID: PMC11223788 DOI: 10.1093/nar/gkae405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 04/16/2024] [Accepted: 05/01/2024] [Indexed: 05/22/2024] Open
Abstract
A key challenge in pathway design is finding proper enzymes that can be engineered to catalyze a non-natural reaction. Although existing tools can identify potential enzymes based on similar reactions, these tools encounter several issues. Firstly, the calculated similar reactions may not even have the same reaction type. Secondly, the associated enzymes are often numerous and identifying the most promising candidate enzymes is difficult due to the lack of data for evaluation. Thirdly, existing web tools do not provide interactive functions that enable users to fine-tune results based on their expertise. Here, we present REME (https://reme.biodesign.ac.cn/), the first integrated web platform for reaction enzyme mining and evaluation. Combining atom-to-atom mapping, atom type change identification, and reaction similarity calculation enables quick ranking and visualization of reactions similar to an objective non-natural reaction. Additional functionality enables users to filter similar reactions by their specified functional groups and candidate enzymes can be further filtered (e.g. by organisms) or expanded by Enzyme Commission number (EC) or sequence homology. Afterward, enzyme attributes (such as kcat, Km, optimal temperature and pH) can be assessed with deep learning-based methods, facilitating the swift identification of potential enzymes that can catalyze the non-natural reaction.
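The abstract does not say which similarity measure REME uses. As a generic illustration of ranking candidate reactions by similarity to a query, the sketch below applies a Tanimoto (Jaccard) coefficient to sets of reaction features. The feature strings, reaction names, and the choice of Tanimoto are all assumptions made for illustration, not details of REME.

```python
# Generic similarity ranking; NOT the REME algorithm.
# Feature strings and reaction identifiers are hypothetical.

def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient: |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical atom-type changes describing the query reaction.
query = {"C=O->C-OH", "NADH->NAD+"}

# Hypothetical known reactions with their feature sets.
candidates = {
    "rxn_A": {"C=O->C-OH", "NADH->NAD+", "H+"},
    "rxn_B": {"C-N->C=O"},
    "rxn_C": {"C=O->C-OH"},
}

# Rank known reactions from most to least similar to the query.
ranked = sorted(candidates,
                key=lambda name: tanimoto(query, candidates[name]),
                reverse=True)

for name in ranked:
    print(name, round(tanimoto(query, candidates[name]), 3))
```

Set-based scores of this kind are cheap to compute, which suits interactive ranking, but they ignore atom mapping, so a real tool would need richer reaction descriptors.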
Affiliation(s)
- Zhenkun Shi
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Dehang Wang
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
  - College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
- Yang Li
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
  - University of Chinese Academy of Sciences, Beijing 101408, PR China
- Rui Deng
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
  - College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
- Jiawei Lin
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
  - College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
- Cui Liu
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Haoran Li
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Ruoyu Wang
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Muqiang Zhao
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Zhitao Mao
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Qianqian Yuan
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Xiaoping Liao
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
  - Haihe Laboratory of Synthetic Biology, Tianjin 300308, PR China
- Hongwu Ma
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
7
Puniya BL, Verma M, Damiani C, Bakr S, Dräger A. Perspectives on computational modeling of biological systems and the significance of the SysMod community. BIOINFORMATICS ADVANCES 2024; 4:vbae090. [PMID: 38948011 PMCID: PMC11213628 DOI: 10.1093/bioadv/vbae090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 05/12/2024] [Accepted: 06/14/2024] [Indexed: 07/02/2024]
Abstract
Motivation: In recent years, applying computational modeling to systems biology has caused a substantial surge in both discovery and practical applications and a significant shift in our understanding of the complexity inherent in biological systems. Results: In this perspective article, we briefly overview computational modeling in biology, highlighting recent advancements such as multi-scale modeling due to the omics revolution, single-cell technology, and integration of artificial intelligence and machine learning approaches. We also discuss the primary challenges faced: integration, standardization, model complexity, scalability, and interdisciplinary collaboration. Lastly, we highlight the contribution made by the Computational Modeling of Biological Systems (SysMod) Community of Special Interest (COSI) associated with the International Society of Computational Biology (ISCB) in driving progress within this rapidly evolving field through community engagement (via both in person and virtual meetings, social media interactions), webinars, and conferences. Availability and implementation: Additional information about SysMod is available at https://sysmod.info.
Affiliation(s)
- Bhanwar Lal Puniya
  - Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE 68588, United States
- Meghna Verma
  - Systems Medicine, Clinical Pharmacology and Quantitative Pharmacology, R&D BioPharmaceuticals, AstraZeneca, Gaithersburg, MD 20878, United States
- Chiara Damiani
  - Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan 20126, Italy
- Shaimaa Bakr
  - Department of Medicine, Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, Stanford, CA 94305-5479, United States
- Andreas Dräger
  - Computational Systems Biology of Infections and Antimicrobial-Resistant Pathogens, Cluster of Excellence ‘Controlling Microbes to Fight Infections’, Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard Karl University of Tübingen, Tübingen 72076, Germany
  - German Center for Infection Research (DZIF), partner site Tübingen, Tübingen 72076, Germany
  - Quantitative Biology Center (QBiC), Eberhard Karl University of Tübingen, Tübingen 72076, Germany
  - Data Analytics and Bioinformatics, Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale) 06120, Germany
8
Kroll A, Ranjan S, Lercher MJ. A multimodal Transformer Network for protein-small molecule interactions enhances predictions of kinase inhibition and enzyme-substrate relationships. PLoS Comput Biol 2024; 20:e1012100. [PMID: 38768223 PMCID: PMC11142704 DOI: 10.1371/journal.pcbi.1012100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/31/2024] [Accepted: 04/24/2024] [Indexed: 05/22/2024] Open
Abstract
The activities of most enzymes and drugs depend on interactions between proteins and small molecules. Accurate prediction of these interactions could greatly accelerate pharmaceutical and biotechnological research. Current machine learning models designed for this task have a limited ability to generalize beyond the proteins used for training. This limitation is likely due to a lack of information exchange between the protein and the small molecule during the generation of the required numerical representations. Here, we introduce ProSmith, a machine learning framework that employs a multimodal Transformer Network to simultaneously process protein amino acid sequences and small molecule strings in the same input. This approach facilitates the exchange of all relevant information between the two molecule types during the computation of their numerical representations, allowing the model to account for their structural and functional interactions. Our final model combines gradient boosting predictions based on the resulting multimodal Transformer Network with independent predictions based on separate deep learning representations of the proteins and small molecules. The resulting predictions outperform recently published state-of-the-art models for predicting protein-small molecule interactions across three diverse tasks: predicting kinase inhibitions; inferring potential substrates for enzymes; and predicting Michaelis constants KM. The Python code provided can be used to easily implement and improve machine learning predictions involving arbitrary protein-small molecule interactions.
Affiliation(s)
- Alexander Kroll
  - Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
- Sahasra Ranjan
  - Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
- Martin J. Lercher
  - Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
9
Tripp A, Braun M, Wieser F, Oberdorfer G, Lechner H. Click, Compute, Create: A Review of Web-based Tools for Enzyme Engineering. Chembiochem 2024:e202400092. [PMID: 38634409 DOI: 10.1002/cbic.202400092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 04/19/2024]
Abstract
Enzyme engineering, though pivotal across various biotechnological domains, is often plagued by its time-consuming and labor-intensive nature. This review aims to offer an overview of supportive in silico methodologies for this demanding endeavor. Starting from methods to predict protein structures, to classification of their activity and even the discovery of new enzymes we continue with describing tools used to increase thermostability and production yields of selected targets. Subsequently, we discuss computational methods to modulate both, the activity as well as selectivity of enzymes. Last, we present recent approaches based on cutting-edge machine learning methods to redesign enzymes. With exception of the last chapter, there is a strong focus on methods easily accessible via web-interfaces or simple Python-scripts, therefore readily useable for a diverse and broad community.
Affiliation(s)
- Adrian Tripp
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Markus Braun
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Florian Wieser
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Gustav Oberdorfer
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
  - BioTechMed, Graz, Austria
- Horst Lechner
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
  - BioTechMed, Graz, Austria
10
He X, Yan M. GraphKM: machine and deep learning for KM prediction of wildtype and mutant enzymes. BMC Bioinformatics 2024; 25:135. [PMID: 38549073 PMCID: PMC10979596 DOI: 10.1186/s12859-024-05746-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Accepted: 03/14/2024] [Indexed: 04/01/2024] Open
Abstract
The Michaelis constant (KM) is one of the essential parameters of enzyme kinetics in the fields of protein engineering, enzyme engineering, and synthetic biology. As extensive experimental measurements of KM are difficult and time-consuming, prediction of KM values from machine and deep learning models would increase the pace of enzyme kinetics studies. Existing machine and deep learning models are limited to specific enzymes, i.e., a minority of enzymes or wildtype enzymes. Here, we used the deep learning framework PaddlePaddle to implement a machine and deep learning approach (GraphKM) for KM prediction of wildtype and mutant enzymes. GraphKM is composed of graph neural networks (GNN), fully connected layers and a gradient boosting framework. We represented the substrates through molecular graphs and the enzymes through a pretrained transformer-based language model to construct the model inputs. We compared the results of models built with different GNNs (GIN, GAT, GCN, and GAT-GCN). The GAT-GCN-based model generally outperformed the others. To evaluate the prediction performance of GraphKM and other reported KM prediction models, we collected an independent KM dataset (HXKm) from the literature.
Affiliation(s)
- Xiao He
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, China
- Ming Yan
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, China.
11
Kugler A, Stensjö K. Machine learning predicts system-wide metabolic flux control in cyanobacteria. Metab Eng 2024; 82:171-182. [PMID: 38395194 DOI: 10.1016/j.ymben.2024.02.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 02/14/2024] [Accepted: 02/20/2024] [Indexed: 02/25/2024]
Abstract
Metabolic fluxes and their control mechanisms are fundamental in cellular metabolism, offering insights for the study of biological systems and biotechnological applications. However, quantitative and predictive understanding of controlling biochemical reactions in microbial cell factories, especially at the system level, is limited. In this work, we present ARCTICA, a computational framework that integrates constraint-based modelling with machine learning tools to address this challenge. Using the model cyanobacterium Synechocystis sp. PCC 6803 as chassis, we demonstrate that ARCTICA effectively simulates global-scale metabolic flux control. Key findings are that (i) the photosynthetic bioproduction is mainly governed by enzymes within the Calvin-Benson-Bassham (CBB) cycle, rather than by those involved in the biosynthesis of the end-product, (ii) the catalytic capacity of the CBB cycle limits the photosynthetic activity and downstream pathways and (iii) ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is a major, but not the most, limiting step within the CBB cycle. Predicted metabolic reactions qualitatively align with prior experimental observations, validating our modelling approach. ARCTICA serves as a valuable pipeline for understanding cellular physiology and predicting rate-limiting steps in genome-scale metabolic networks, and thus provides guidance for bioengineering of cyanobacteria.
Affiliation(s)
- Amit Kugler
- Microbial Chemistry, Department of Chemistry-Ångström Laboratory, Uppsala University, Box 523, SE-751 20, Uppsala, Sweden
| | - Karin Stensjö
- Microbial Chemistry, Department of Chemistry-Ångström Laboratory, Uppsala University, Box 523, SE-751 20, Uppsala, Sweden.
|
12
|
Sun BJ, Li WM, Lv P, Wen GN, Wu DY, Tao SA, Liao ML, Yu CQ, Jiang ZW, Wang Y, Xie HX, Wang XF, Chen ZQ, Liu F, Du WG. Genetically Encoded Lizard Color Divergence for Camouflage and Thermoregulation. Mol Biol Evol 2024; 41:msae009. [PMID: 38243850 PMCID: PMC10835340 DOI: 10.1093/molbev/msae009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 01/03/2024] [Accepted: 01/08/2024] [Indexed: 01/22/2024] Open
Abstract
Local adaptation is critical in speciation and evolution, yet comprehensive studies on proximate and ultimate causes of local adaptation are generally scarce. Here, we integrated field ecological experiments, genome sequencing, and genetic verification to demonstrate both driving forces and molecular mechanisms governing local adaptation of body coloration in a lizard from the Qinghai-Tibet Plateau. We found dark lizards from the cold meadow population had lower spectral reflectance but higher melanin content than light counterparts from the warm dune population. Additionally, the coloration of both dark and light lizards simultaneously facilitated camouflage and thermoregulation in their respective microhabitats. More importantly, by genome resequencing analysis, we detected a novel mutation in Tyrp1 that underpinned this color adaptation. The allele frequencies at the site of SNP 459# in the gene of Tyrp1 are 22.22% G/C and 77.78% C/C in dark lizards and 100% G/G in light lizards. Model-predicted structure and catalytic activity showed that this mutation increased structure flexibility and catalytic activity in enzyme TYRP1, and thereby facilitated the generation of eumelanin in dark lizards. The function of the mutation in Tyrp1 was further verified by the higher melanin content and darker coloration detected in zebrafish injected with the Tyrp1 genotype from dark lizards. Therefore, our study demonstrates that a novel mutation of a major melanin-generating gene underpins skin color variation co-selected by camouflage and thermoregulation in a lizard. The resulting strong selection may reinforce adaptive genetic divergence and enable the persistence of adjacent populations with distinct body coloration.
Affiliation(s)
- Bao-Jun Sun
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Wei-Ming Li
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Peng Lv
- State Key Laboratory of Membrane Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guan-Nan Wen
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Dan-Yang Wu
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Shi-Ang Tao
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Ming-Ling Liao
- The Key Laboratory of Mariculture, Ministry of Education, Fisheries College, Ocean University of China, Qingdao 266003, China
| | - Chang-Qing Yu
- Ecology Laboratory, Beijing Ecotech Science and Technology Ltd, Beijing 100190, China
| | - Zhong-Wen Jiang
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Yang Wang
- Key Laboratory of Animal Physiology, Biochemistry and Molecular Biology of Hebei Province, College of Life Sciences, Hebei Normal University, Shijiazhuang 050024, China
| | - Hong-Xin Xie
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Xi-Feng Wang
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | | | - Feng Liu
- State Key Laboratory of Membrane Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Wei-Guo Du
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
|
13
|
Yu H, Deng H, He J, Keasling JD, Luo X. UniKP: a unified framework for the prediction of enzyme kinetic parameters. Nat Commun 2023; 14:8211. [PMID: 38081905 PMCID: PMC10713628 DOI: 10.1038/s41467-023-44113-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/10/2023] [Accepted: 11/30/2023] [Indexed: 12/18/2023] Open
Abstract
Prediction of enzyme kinetic parameters is essential for designing and optimizing enzymes for various biotechnological and industrial applications, but the limited performance of current prediction tools on diverse tasks hinders their practical applications. Here, we introduce UniKP, a unified framework based on pretrained language models for the prediction of enzyme kinetic parameters, including enzyme turnover number (kcat), Michaelis constant (Km), and catalytic efficiency (kcat / Km), from protein sequences and substrate structures. A two-layer framework derived from UniKP (EF-UniKP) has also been proposed to allow robust kcat prediction that takes into account environmental factors, including pH and temperature. In addition, four representative re-weighting methods are systematically explored to successfully reduce the prediction error in high-value prediction tasks. We have demonstrated the application of UniKP and EF-UniKP in several enzyme discovery and directed evolution tasks, leading to the identification of new enzymes and enzyme mutants with higher activity. UniKP is a valuable tool for deciphering the mechanisms of enzyme kinetics and enables novel insights into enzyme engineering and their industrial applications.
Affiliation(s)
- Han Yu
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Huaxiang Deng
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Jiahui He
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Jay D Keasling
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Joint BioEnergy Institute, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Department of Chemical and Biomolecular Engineering & Department of Bioengineering, University of California, Berkeley, CA, 94720, USA
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800, Kgs, Lyngby, Denmark
| | - Xiaozhou Luo
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
|
14
|
Liu H, Guan F, Liu T, Yang L, Fan L, Liu X, Luo H, Wu N, Yao B, Tian J, Huang H. MECE: a method for enhancing the catalytic efficiency of glycoside hydrolase based on deep neural networks and molecular evolution. Sci Bull (Beijing) 2023; 68:2793-2805. [PMID: 37867059 DOI: 10.1016/j.scib.2023.09.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 07/14/2023] [Accepted: 09/25/2023] [Indexed: 10/24/2023]
Abstract
The demand for high-efficiency glycoside hydrolases (GHs) is on the rise due to their various industrial applications. However, improving the catalytic efficiency of an enzyme remains a challenge. This investigation showcases the capability of a deep neural network and method for enhancing the catalytic efficiency (MECE) platform to predict mutations that improve catalytic activity in GHs. The MECE platform includes DeepGH, a deep learning model that is able to identify GH families and functional residues. This model was developed utilizing 119 GH family protein sequences obtained from the Carbohydrate-Active enZYmes (CAZy) database. After undergoing ten-fold cross-validation, the DeepGH models exhibited a predictive accuracy of 96.73%. Gradient-weighted class activation mapping (Grad-CAM) was used to help interpret the classification features, which in turn facilitated the creation of enzyme mutants. As a result, the MECE platform was validated with the development of CHIS1754-MUT7, a mutant with seven amino acid substitutions. The kcat/Km of CHIS1754-MUT7 was found to be 23.53 times greater than that of the wild type CHIS1754. Due to its high computational efficiency and low experimental cost, this method offers significant advantages and presents a novel approach for the intelligent design of enzyme catalytic efficiency. As a result, it holds great promise for a wide range of applications.
Affiliation(s)
- Hanqing Liu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China; Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Feifei Guan
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
| | - Tuoyu Liu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Lixin Yang
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Lingxi Fan
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Xiaoqing Liu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Huiying Luo
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Ningfeng Wu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Bin Yao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Jian Tian
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China; Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
| | - Huoqing Huang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
|
15
|
Prešern U, Goličnik M. Enzyme Databases in the Era of Omics and Artificial Intelligence. Int J Mol Sci 2023; 24:16918. [PMID: 38069254 PMCID: PMC10707154 DOI: 10.3390/ijms242316918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 11/24/2023] [Accepted: 11/26/2023] [Indexed: 12/18/2023] Open
Abstract
Enzyme research is important for the development of various scientific fields such as medicine and biotechnology. Enzyme databases facilitate this research by providing a wide range of information relevant to research planning and data analysis. Over the years, various databases that cover different aspects of enzyme biology (e.g., kinetic parameters, enzyme occurrence, and reaction mechanisms) have been developed. Most of the databases are curated manually, which improves reliability of the information; however, such curation cannot keep pace with the exponential growth in published data. Lack of data standardization is another obstacle for data extraction and analysis. Improving machine readability of databases is especially important in the light of recent advances in deep learning algorithms that require big training datasets. This review provides information regarding the current state of enzyme databases, especially in relation to the ever-increasing amount of generated research data and recent advancements in artificial intelligence algorithms. Furthermore, it describes several enzyme databases, providing the reader with necessary information for their use.
Affiliation(s)
| | - Marko Goličnik
- Institute of Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Vrazov trg 2, 1000 Ljubljana, Slovenia;
|
16
|
Qiu S, Zhao S, Yang A. DLTKcat: deep learning-based prediction of temperature-dependent enzyme turnover rates. Brief Bioinform 2023; 25:bbad506. [PMID: 38189538 PMCID: PMC10772988 DOI: 10.1093/bib/bbad506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 11/29/2023] [Accepted: 12/08/2023] [Indexed: 01/09/2024] Open
Abstract
The enzyme turnover rate, kcat, quantifies enzyme kinetics by indicating the maximum efficiency of enzyme catalysis. Despite its importance, kcat values remain scarce in databases for most organisms, primarily because of the cost of experimental measurements. To predict kcat and account for its strong temperature dependence, DLTKcat was developed in this study and demonstrated better performance (log10-scale root mean squared error = 0.88, R-squared = 0.66) than previously published models. Through two case studies, DLTKcat showed its ability to predict the effects of protein sequence mutations and temperature changes on kcat values. Although its quantitative accuracy is not yet high enough to model the responses of cellular metabolism to temperature changes, DLTKcat has the potential to eventually become a computational tool to describe the temperature dependence of biological systems.
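As a reading aid for the two metrics quoted in the abstract above, the following minimal Python sketch shows one common way to compute a log10-scale root mean squared error and R-squared. It assumes the standard definitions, with residuals taken between log10-transformed measured and predicted kcat values; the abstract does not give the exact formulas the authors used, and the function name and sample values below are hypothetical.

```python
import math


def log10_rmse_and_r2(measured, predicted):
    """Return (RMSE, R^2), both computed on log10-transformed values.

    Assumes all inputs are strictly positive (as kcat values are).
    """
    y = [math.log10(v) for v in measured]
    y_hat = [math.log10(v) for v in predicted]
    n = len(y)

    # Residual sum of squares in log10 space
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))

    # Total sum of squares around the mean of the measured log10 values
    mean_y = sum(y) / n
    ss_tot = sum((a - mean_y) ** 2 for a in y)

    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2


# Hypothetical kcat values in 1/s, for illustration only (not data from the paper)
measured = [1.0, 10.0, 100.0, 1000.0]
predicted = [10.0, 10.0, 100.0, 100.0]

rmse, r2 = log10_rmse_and_r2(measured, predicted)
print(f"log10 RMSE = {rmse:.3f}, R^2 = {r2:.3f}")
```

On this scale, an RMSE of 1.0 corresponds to a typical prediction error of one order of magnitude, which is a useful way to read the reported value of 0.88.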
Affiliation(s)
- Sizhe Qiu
- Department of Engineering Science, University of Oxford, OX1 3PJ, United Kingdom
| | - Simiao Zhao
- Radcliffe Department of Medicine, University of Oxford, OX3 9DU, United Kingdom
| | - Aidong Yang
- Department of Engineering Science, University of Oxford, OX1 3PJ, United Kingdom
|
17
|
Kim GB, Kim JY, Lee JA, Norsigian CJ, Palsson BO, Lee SY. Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nat Commun 2023; 14:7370. [PMID: 37963869 PMCID: PMC10645960 DOI: 10.1038/s41467-023-43216-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/28/2023] [Accepted: 11/03/2023] [Indexed: 11/16/2023] Open
Abstract
Functional annotation of open reading frames in microbial genomes remains substantially incomplete. Enzymes constitute the most prevalent functional gene class in microbial genomes and can be described by their specific catalytic functions using the Enzyme Commission (EC) number. Consequently, the ability to predict EC numbers could substantially reduce the number of un-annotated genes. Here we present a deep learning model, DeepECtransformer, which utilizes transformer layers as a neural network architecture to predict EC numbers. Using the extensively studied Escherichia coli K-12 MG1655 genome, DeepECtransformer predicted EC numbers for 464 un-annotated genes. We experimentally validated the enzymatic activities predicted for three proteins (YgfF, YciO, and YjdM). Further examination of the neural network's reasoning process revealed that the trained neural network relies on functional motifs of enzymes to predict EC numbers. Thus, DeepECtransformer is a method that facilitates the functional annotation of uncharacterized genes.
Affiliation(s)
- Gi Bae Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
| | - Ji Yeon Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
| | - Jong An Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
| | - Charles J Norsigian
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - Bernhard O Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- Novo Nordisk Foundation Center for Biosustainability, 2800, Kongens Lyngby, Denmark
| | - Sang Yup Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea.
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea.
- BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141, Republic of Korea.
|
18
|
Wendering P, Nikoloski Z. Model-driven insights into the effects of temperature on metabolism. Biotechnol Adv 2023; 67:108203. [PMID: 37348662 DOI: 10.1016/j.biotechadv.2023.108203] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/20/2023] [Revised: 05/22/2023] [Accepted: 06/18/2023] [Indexed: 06/24/2023]
Abstract
Temperature affects cellular processes at different spatiotemporal scales, and identifying the genetic and molecular mechanisms underlying temperature responses paves the way to develop approaches for mitigating the effects of future climate scenarios. A systems view of the effects of temperature on cellular physiology can be obtained by focusing on metabolism since: (i) its functions depend on transcription and translation and (ii) its outcomes support organisms' development, growth, and reproduction. Here we provide a systematic review of modelling efforts directed at investigating temperature effects on properties of single biochemical reactions, system-level traits, metabolic subsystems, and whole-cell metabolism across different prokaryotes and eukaryotes. We compare and contrast computational approaches and theories that facilitate modelling of temperature effects on key properties of enzymes and their consideration in constraint-based as well as kinetic models of metabolism. In addition, we provide a summary of insights from computational approaches, facilitating integration of omics data from temperature-modulated experiments with models of metabolic networks, and review the resulting biotechnological applications. Lastly, we provide a perspective on how different types of metabolic modelling can profit from developments in machine learning and models of different cellular layers to improve model-driven insights into the effects of temperature relevant for biotechnological applications.
Affiliation(s)
- Philipp Wendering
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany; Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam, Germany
| | - Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany; Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam, Germany.
|
19
|
Zheng SY, Zhou WJ, Lin XN, Li FF, Xie CF, Liu DL, Yao DS. Increased yield of 2-O-α-d-glucopyranosyl-l-ascorbic acid synthesis by α-glucosidase using rational design that regulating the ground state of enzyme and substrate complex. Biotechnol J 2023; 18:e2300122. [PMID: 37288751 DOI: 10.1002/biot.202300122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 05/04/2023] [Accepted: 06/02/2023] [Indexed: 06/09/2023]
Abstract
BACKGROUND: α-Glucosidase (AG) is a bifunctional enzyme: it can synthesize 2-O-α-d-glucopyranosyl-l-ascorbic acid (AA-2G) from l-ascorbic acid (L-AA) and low-cost maltose under mild conditions, but it can also hydrolyze AA-2G, which leads to low synthesis efficiency of AA-2G. MAIN METHODS AND MAJOR RESULTS: This study introduces a rational molecular design strategy to regulate enzymatic reactions based on inhibiting the formation of the ground state of the enzyme-substrate complex. Y215 was identified as the key amino acid site affecting the affinity of AG for AA-2G and L-AA. To reduce the hydrolysis efficiency of AA-2G, the mutant Y215W was obtained by analyzing the molecular docking binding energy and hydrogen bond formation between AG and the substrates. Compared with the wild-type, isothermal titration calorimetry (ITC) results showed that the equilibrium dissociation constant (KD) of the mutant for AA-2G was doubled; the Michaelis constant (Km) for AA-2G was reduced by 1.15 times; and the yield of synthetic AA-2G was increased by 39%. CONCLUSIONS AND IMPLICATIONS: Our work also provides a new reference strategy for the molecular modification of multifunctional enzymes and other enzymes in cascade reaction systems.
Affiliation(s)
- Shao-Yan Zheng
- Institute of Biomedicine, Jinan University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou City, Guangdong Province, China
- National Engineering Research Center of Genetic Medicine, Jinan University, Guangzhou City, Guangdong Province, China
| | - Wei-Jie Zhou
- Institute of Biomedicine, Jinan University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou City, Guangdong Province, China
- National Engineering Research Center of Genetic Medicine, Jinan University, Guangzhou City, Guangdong Province, China
| | - Xiang-Na Lin
- Institute of Biomedicine, Jinan University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou City, Guangdong Province, China
- National Engineering Research Center of Genetic Medicine, Jinan University, Guangzhou City, Guangdong Province, China
| | - Fei-Fei Li
- Department of Bioengineering, College of Life Science and Technology, Jinan University, Guangzhou City, Guangdong Province, China
| | - Chun-Fang Xie
- Institute of Biomedicine, Jinan University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou City, Guangdong Province, China
- National Engineering Research Center of Genetic Medicine, Jinan University, Guangzhou City, Guangdong Province, China
- Department of Bioengineering, College of Life Science and Technology, Jinan University, Guangzhou City, Guangdong Province, China
| | - Da-Ling Liu
- Department of Bioengineering, College of Life Science and Technology, Jinan University, Guangzhou City, Guangdong Province, China
| | - Dong-Sheng Yao
- Institute of Biomedicine, Jinan University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Bioengineering Medicine, Jinan University, Guangzhou City, Guangdong Province, China
- National Engineering Research Center of Genetic Medicine, Jinan University, Guangzhou City, Guangdong Province, China
|
20
|
de Atauri P, Foguet C, Cascante M. Control analysis in the identification of key enzymes driving metabolic adaptations: Towards drug target discovery. Biosystems 2023; 231:104984. [PMID: 37506820 DOI: 10.1016/j.biosystems.2023.104984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 07/18/2023] [Accepted: 07/25/2023] [Indexed: 07/30/2023]
Abstract
Metabolic Control Analysis (MCA) marked a turning point in understanding the design principles of metabolic network control by establishing control coefficients as a means to quantify the degree of control that an enzyme exerts on flux or metabolite concentrations. MCA has demonstrated that control of metabolic pathways is distributed among many enzymes rather than depending on a single rate-limiting step. MCA also proved that this distribution depends not only on the stoichiometric structure of the network but also on other kinetic determinants, such as the degree of saturation of the enzyme active site, the distance to thermodynamic equilibrium, and metabolite feedback regulatory loops. Consequently, predicting the alterations that occur during metabolic adaptation in response to strong changes involving a redistribution in such control distribution can be challenging. Here, using the framework provided by MCA, we illustrate how control distribution in a metabolic pathway/network depends on enzyme kinetic determinants and to what extent the redistribution of control affects our predictions on candidate enzymes suitable as targets for small molecule inhibition in the drug discovery process. Our results uncover that kinetic determinants can lead to unexpected control distribution and outcomes that cannot be predicted solely from stoichiometric determinants. We also unveil that the inference of key enzyme-drivers of an observed metabolic adaptation can be dramatically improved using mean control coefficients and ruling out those enzyme activities that are associated with low control coefficients. 
As the use of constraint-based stoichiometric genome-scale metabolic models (GSMMs) becomes increasingly prevalent for identifying genes/enzymes that could be potential drug targets, we anticipate that incorporating kinetic determinants and ruling out enzymes with low control coefficients into GSMM workflows will facilitate more accurate predictions and reveal novel therapeutic targets.
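For reference, the control coefficients mentioned in the abstract above are conventionally defined in MCA as scaled sensitivities of a steady-state variable to an enzyme activity. The sketch below uses standard textbook notation; the symbols J (steady-state flux), E_i (activity of enzyme i), and S (a metabolite concentration) are chosen here for illustration and do not come from the abstract.

```latex
C^{J}_{E_i} = \frac{E_i}{J}\,\frac{\partial J}{\partial E_i}
            = \frac{\partial \ln J}{\partial \ln E_i},
\qquad
\sum_{i} C^{J}_{E_i} = 1,
\qquad
\sum_{i} C^{S}_{E_i} = 0
```

The flux summation theorem (the coefficients sum to one) is the formal counterpart of the abstract's statement that control is distributed among many enzymes rather than resting on a single rate-limiting step: any control gained by one enzyme must be lost by the others.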
Affiliation(s)
- Pedro de Atauri
- Department of Biochemistry and Molecular Biomedicine & Institute of Biomedicine of Universitat de Barcelona, Faculty of Biology, Universitat de Barcelona, Barcelona, 08028, Spain; Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III (ISCIII), Madrid, 28020, Spain.
| | - Carles Foguet
- British Heart Foundation Cardiovascular Epidemiology Unit and Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, CB2 0BD, United Kingdom
| | - Marta Cascante
- Department of Biochemistry and Molecular Biomedicine & Institute of Biomedicine of Universitat de Barcelona, Faculty of Biology, Universitat de Barcelona, Barcelona, 08028, Spain; Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III (ISCIII), Madrid, 28020, Spain.
|
21
|
Zhang Q, Zheng W, Song Z, Zhang Q, Yang L, Wu J, Lin J, Xu G, Yu H. Machine Learning Enables Prediction of Pyrrolysyl-tRNA Synthetase Substrate Specificity. ACS Synth Biol 2023; 12:2403-2417. [PMID: 37486975 DOI: 10.1021/acssynbio.3c00225] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/26/2023]
Abstract
Knowledge about the substrate scope for a given enzyme is informative for elucidating biochemical pathways and also for expanding applications of the enzyme. However, no general methods are available to accurately predict the substrate specificity of an enzyme. Pyrrolysyl-tRNA synthetase (PylRS) is a powerful tool for incorporating various noncanonical amino acids (NCAAs) into proteins, which enabled us to probe, image, rationally engineer, and evolve protein structure and function. However, the incorporation of a new NCAA typically requires the selection of large libraries of PylRS with randomized mutations at active sites, and this process requires multiple rounds of selection for each new substrate. Therefore, a single aminoacyl-tRNA synthetase with broad substrate promiscuity is ideal to facilitate widespread applications of the genetic NCAA incorporation technique. Herein, machine learning models were developed to predict the substrate specificity of PylRS to accept novel NCAAs that could be incorporated into proteins by three PylRS mutants. The models were built from a training set of 285 unique enzyme-substrate pairs of three PylRS mutants including IFRS, BtaRS, and MFRS against 95 NCAAs. The best BaggingTree (BT) model was then used for virtually screening a NCAAs library containing 1474 phenylalanine, tyrosine, tryptophan, and alanine analogues, and 156 NCAAs were predicted to be accepted by at least one of the three PylRS mutants. Then, 27 NCAAs including 24 positive and 3 negative substrates were experimentally tested for their activities, and 20 of the 24 positive substrates showed weak or strong activity and were accepted by at least one PylRS mutant, among which 11 NCAAs were never reported to be incorporated into proteins before. Three negative substrates did not show any activity. Experimental results suggested that the BT model provides a three-class classification accuracy of 0.69 and a binary classification accuracy of 0.86. 
This study expanded the substrate scope of three PylRS variants and provided a framework for developing machine learning models to predict substrate specificity of other PylRS variants.
Affiliation(s)
- Qunfeng Zhang
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Wenlong Zheng
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
| | - Zhongdi Song
- Key Laboratory of Pollution Exposure and Health Intervention of Zhejiang Province, Interdisciplinary Research Academy, Zhejiang Shuren University, Hangzhou 310015, China
| | - Qiang Zhang
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Lirong Yang
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
| | - Jianping Wu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
| | - Jianping Lin
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Gang Xu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Haoran Yu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
|
22
|
Kroll A, Rousset Y, Hu XP, Liebrand NA, Lercher MJ. Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning. Nat Commun 2023; 14:4139. [PMID: 37438349 DOI: 10.1038/s41467-023-39840-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 12/12/2022] [Accepted: 06/27/2023] [Indexed: 07/14/2023] Open
Abstract
The turnover number kcat, a measure of enzyme efficiency, is central to understanding cellular physiology and resource allocation. As experimental kcat estimates are unavailable for the vast majority of enzymatic reactions, the development of accurate computational prediction methods is highly desirable. However, existing machine learning models are limited to a single, well-studied organism, or they provide inaccurate predictions except for enzymes that are highly similar to proteins in the training set. Here, we present TurNuP, a general and organism-independent model that successfully predicts turnover numbers for natural reactions of wild-type enzymes. We constructed model inputs by representing complete chemical reactions through differential reaction fingerprints and by representing enzymes through a modified and re-trained Transformer Network model for protein sequences. TurNuP outperforms previous models and generalizes well even to enzymes that are not similar to proteins in the training set. Parameterizing metabolic models with TurNuP-predicted kcat values leads to improved proteome allocation predictions. To provide a powerful and convenient tool for the study of molecular biochemistry and physiology, we implemented a TurNuP web server.
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Yvan Rousset
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Xiao-Pan Hu
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Nina A Liebrand
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Martin J Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany.
|
23
|
Dourado H, Liebermeister W, Ebenhöh O, Lercher MJ. Mathematical properties of optimal fluxes in cellular reaction networks at balanced growth. PLoS Comput Biol 2023; 19:e1011156. [PMID: 37279246 DOI: 10.1371/journal.pcbi.1011156] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/11/2022] [Accepted: 05/04/2023] [Indexed: 06/08/2023] Open
Abstract
The physiology of biological cells evolved under physical and chemical constraints, such as mass conservation across the network of biochemical reactions, nonlinear reaction kinetics, and limits on cell density. For unicellular organisms, the fitness that governs this evolution is mainly determined by the balanced cellular growth rate. We previously introduced growth balance analysis (GBA) as a general framework to model and analyze such nonlinear systems, revealing important analytical properties of optimal balanced growth states. It has been shown that at optimality, only a minimal subset of reactions can have nonzero flux. However, no general principles have been established to determine if a specific reaction is active at optimality. Here, we extend the GBA framework to study the optimality of each biochemical reaction, and we identify the mathematical conditions determining whether a reaction is active or not at optimal growth in a given environment. We reformulate the mathematical problem in terms of a minimal number of dimensionless variables and use the Karush-Kuhn-Tucker (KKT) conditions to identify fundamental principles of optimal resource allocation in GBA models of any size and complexity. Our approach helps to identify from first principles the economic values of biochemical reactions, expressed as marginal changes in cellular growth rate; these economic values can be related to the costs and benefits of proteome allocation into the reactions' catalysts. Our formulation also generalizes the concepts of Metabolic Control Analysis to models of growing cells. We show how the extended GBA framework unifies and extends previous approaches of cellular modeling and analysis, putting forward a program to analyze cellular growth through the stationarity conditions of a Lagrangian function. GBA thereby provides a general theoretical toolbox for the study of fundamental mathematical properties of balanced cellular growth.
Affiliation(s)
- Hugo Dourado
- Institute for Computer Science and Department of Biology, Heinrich-Heine Universität, Düsseldorf, Germany
| | | | - Oliver Ebenhöh
- Quantitative and Theoretical Biology, Heinrich-Heine Universität, Düsseldorf, Germany
| | - Martin J Lercher
- Institute for Computer Science and Department of Biology, Heinrich-Heine Universität, Düsseldorf, Germany
|
24
|
Kroll A, Ranjan S, Engqvist MKM, Lercher MJ. A general model to predict small molecule substrates of enzymes based on machine and deep learning. Nat Commun 2023; 14:2787. [PMID: 37188731 DOI: 10.1038/s41467-023-38347-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 06/24/2022] [Accepted: 04/21/2023] [Indexed: 05/17/2023] Open
Abstract
For most proteins annotated as enzymes, it is unknown which primary and/or secondary reactions they catalyze. Experimental characterizations of potential substrates are time-consuming and costly. Machine learning predictions could provide an efficient alternative, but are hampered by a lack of information regarding enzyme non-substrates, as available training data comprises mainly positive examples. Here, we present ESP, a general machine-learning model for the prediction of enzyme-substrate pairs with an accuracy of over 91% on independent and diverse test data. ESP can be applied successfully across widely different enzymes and a broad range of metabolites included in the training data, outperforming models designed for individual, well-studied enzyme families. ESP represents enzymes through a modified transformer model, and is trained on data augmented with randomly sampled small molecules assigned as non-substrates. By facilitating easy in silico testing of potential substrates, the ESP web server may support both basic and applied science.
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Sahasra Ranjan
- Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
| | - Martin K M Engqvist
- Department of Biology and Bioengineering, Chalmers University of Technology, SE-412 96, Gothenburg, Sweden
- EnginZyme AB, Tomtebodevägen 6, 17165, Stockholm, Sweden
| | - Martin J Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany.
|
25
|
Vasina M, Kovar D, Damborsky J, Ding Y, Yang T, deMello A, Mazurenko S, Stavrakis S, Prokop Z. In-depth analysis of biocatalysts by microfluidics: An emerging source of data for machine learning. Biotechnol Adv 2023; 66:108171. [PMID: 37150331 DOI: 10.1016/j.biotechadv.2023.108171] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/24/2023] [Revised: 05/04/2023] [Accepted: 05/04/2023] [Indexed: 05/09/2023]
Abstract
Nowadays, the vastly increasing demand for novel biotechnological products is supported by the continuous development of biocatalytic applications which provide sustainable green alternatives to chemical processes. The success of a biocatalytic application is critically dependent on how quickly we can identify and characterize enzyme variants fitting the conditions of industrial processes. While miniaturization and parallelization have dramatically increased the throughput of next-generation sequencing systems, the subsequent characterization of the obtained candidates is still a limiting process in identifying the desired biocatalysts. Only a few commercial microfluidic systems for enzyme analysis are currently available, and the transformation of numerous published prototypes into commercial platforms is still to be streamlined. This review presents the state-of-the-art, recent trends, and perspectives in applying microfluidic tools in the functional and structural analysis of biocatalysts. We discuss the advantages and disadvantages of available technologies, their reproducibility and robustness, and readiness for routine laboratory use. We also highlight the unexplored potential of microfluidics to leverage the power of machine learning for biocatalyst development.
Affiliation(s)
- Michal Vasina
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
| | - David Kovar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
| | - Yun Ding
- Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
| | - Tianjin Yang
- Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland; Department of Biochemistry, University of Zurich, 8057 Zurich, Switzerland
| | - Andrew deMello
- Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
| | - Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic.
| | - Stavros Stavrakis
- Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland.
| | - Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic.
|
26
|
Yu T, Boob AG, Volk MJ, Liu X, Cui H, Zhao H. Machine learning-enabled retrobiosynthesis of molecules. Nat Catal 2023. [DOI: 10.1038/s41929-022-00909-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
|
27
|
Jiang Y, Ran X, Yang ZJ. Data-driven enzyme engineering to identify function-enhancing enzymes. Protein Eng Des Sel 2023; 36:gzac009. [PMID: 36214500 PMCID: PMC10365845 DOI: 10.1093/protein/gzac009] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/31/2022] [Revised: 08/08/2022] [Accepted: 09/28/2022] [Indexed: 01/22/2023] Open
Abstract
Identifying function-enhancing enzyme variants is a 'holy grail' challenge in protein science because it will allow researchers to expand the biocatalytic toolbox for late-stage functionalization of drug-like molecules, environmental degradation of plastics and other pollutants, and medical treatment of food allergies. Data-driven strategies, including statistical modeling, machine learning, and deep learning, have largely advanced the understanding of the sequence-structure-function relationships for enzymes. They have also enhanced the capability of predicting and designing new enzymes and enzyme variants for catalyzing the transformation of new-to-nature reactions. Here, we reviewed the recent progresses of data-driven models that were applied in identifying efficiency-enhancing mutants for catalytic reactions. We also discussed existing challenges and obstacles faced by the community. Although the review is by no means comprehensive, we hope that the discussion can inform the readers about the state-of-the-art in data-driven enzyme engineering, inspiring more joint experimental-computational efforts to develop and apply data-driven modeling to innovate biocatalysts for synthetic and pharmaceutical applications.
Affiliation(s)
- Yaoyukun Jiang
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
| | - Xinchun Ran
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
| | - Zhongyue J Yang
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, TN 37235, USA
- Data Science Institute, Vanderbilt University, Nashville, TN 37235, USA
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, TN 37235, USA
|
28
|
Maeda K, Hatae A, Sakai Y, Boogerd FC, Kurata H. MLAGO: machine learning-aided global optimization for Michaelis constant estimation of kinetic modeling. BMC Bioinformatics 2022; 23:455. [PMID: 36319952 PMCID: PMC9624028 DOI: 10.1186/s12859-022-05009-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 10/26/2022] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND: Kinetic modeling is a powerful tool for understanding the dynamic behavior of biochemical systems. For kinetic modeling, determination of a number of kinetic parameters, such as the Michaelis constant (Km), is necessary, and global optimization algorithms have long been used for parameter estimation. However, the conventional global optimization approach has three problems: (i) It is computationally demanding. (ii) It often yields unrealistic parameter values because it simply seeks a better model fitting to experimentally observed behaviors. (iii) It has difficulty in identifying a unique solution because multiple parameter sets can allow a kinetic model to fit experimental data equally well (the non-identifiability problem). RESULTS: To solve these problems, we propose the Machine Learning-Aided Global Optimization (MLAGO) method for Km estimation of kinetic modeling. First, we use a machine learning-based Km predictor based only on three factors: EC number, KEGG Compound ID, and Organism ID, then conduct a constrained global optimization-based parameter estimation by using the machine learning-predicted Km values as the reference values. The machine learning model achieved relatively good prediction scores: RMSE = 0.795 and R² = 0.536, making the subsequent global optimization easy and practical. The MLAGO approach reduced the error between simulation and experimental data while keeping Km values close to the machine learning-predicted values. As a result, the MLAGO approach successfully estimated Km values with less computational cost than the conventional method. Moreover, the MLAGO approach uniquely estimated Km values, which were close to the measured values. CONCLUSIONS: MLAGO overcomes the major problems in parameter estimation, accelerates kinetic modeling, and thus ultimately leads to better understanding of complex cellular systems.
The web application for our machine learning-based Km predictor is accessible at https://sites.google.com/view/kazuhiro-maeda/software-tools-web-apps, which helps modelers perform MLAGO on their own parameter estimation tasks.
Affiliation(s)
- Kazuhiro Maeda
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Aoi Hatae
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Yukie Sakai
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Fred C. Boogerd
- Department of Molecular Cell Biology, Faculty of Science, VU University Amsterdam, O|2 Building, Amsterdam, The Netherlands
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
|
29
|
Li F, Chen Y, Anton M, Nielsen J. GotEnzymes: an extensive database of enzyme parameter predictions. Nucleic Acids Res 2022; 51:D583-D586. [PMID: 36169223 PMCID: PMC9825421 DOI: 10.1093/nar/gkac831] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/12/2022] [Revised: 09/08/2022] [Accepted: 09/26/2022] [Indexed: 01/29/2023] Open
Abstract
Enzyme parameters are essential for quantitatively understanding, modelling, and engineering cells. However, experimental measurements cover only a small fraction of known enzyme-compound pairs in model organisms, much less in other organisms. Artificial intelligence (AI) techniques have accelerated the pace of exploring enzyme properties by predicting these in a high-throughput manner. Here, we present GotEnzymes, an extensive database with enzyme parameter predictions by AI approaches, which is publicly available at https://metabolicatlas.org/gotenzymes for interactive web exploration and programmatic access. The first release of this data resource contains predicted turnover numbers of over 25.7 million enzyme-compound pairs across 8099 organisms. We believe that GotEnzymes, with the readily-predicted enzyme parameters, would bring a speed boost to biological research covering both experimental and computational fields that involve working with candidate enzymes.
Affiliation(s)
- Feiran Li
- Correspondence may also be addressed to Feiran Li.
| | | | | | - Jens Nielsen
- To whom correspondence should be addressed. Tel: +46 31 772 3804;
|
30
|
Wilken SE, Besançon M, Kratochvíl M, Foko Kuate CA, Trefois C, Gu W, Ebenhöh O. Interrogating the effect of enzyme kinetics on metabolism using differentiable constraint-based models. Metab Eng 2022; 74:72-82. [PMID: 36152931 DOI: 10.1016/j.ymben.2022.09.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/11/2022] [Revised: 09/08/2022] [Accepted: 09/10/2022] [Indexed: 10/31/2022]
Abstract
Metabolic models are typically characterized by a large number of parameters. Traditionally, metabolic control analysis is applied to differential equation-based models to investigate the sensitivity of predictions to parameters. A corresponding theory for constraint-based models is lacking, due to their formulation as optimization problems. Here, we show that optimal solutions of optimization problems can be efficiently differentiated using constrained optimization duality and implicit differentiation. We use this to calculate the sensitivities of predicted reaction fluxes and enzyme concentrations to turnover numbers in an enzyme-constrained metabolic model of Escherichia coli. The sensitivities quantitatively identify rate limiting enzymes and are mathematically precise, unlike current finite difference based approaches used for sensitivity analysis. Further, efficient differentiation of constraint-based models unlocks the ability to use gradient information for parameter estimation. We demonstrate this by improving, genome-wide, the state-of-the-art turnover number estimates for E. coli. Finally, we show that this technique can be generalized to arbitrarily complex models. By differentiating the optimal solution of a model incorporating both thermodynamic and kinetic rate equations, the effect of metabolite concentrations on biomass growth can be elucidated. We benchmark these metabolite sensitivities against a large experimental gene knockdown study, and find good alignment between the predicted sensitivities and in vivo metabolome changes. In sum, we demonstrate several applications of differentiating optimal solutions of constraint-based metabolic models, and show how it connects to classic metabolic control analysis.
Affiliation(s)
- St Elmo Wilken
- Institute of Quantitative and Theoretical Biology, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany.
| | - Mathieu Besançon
- Department for AI in Society, Science, and Technology, Zuse Institute Berlin, Takustraße 7, 14195, Berlin, Germany
| | - Miroslav Kratochvíl
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, L-4367, Belvaux, Luxembourg
| | - Chilperic Armel Foko Kuate
- Institute of Quantitative and Theoretical Biology, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany
| | - Christophe Trefois
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, L-4367, Belvaux, Luxembourg
| | - Wei Gu
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, L-4367, Belvaux, Luxembourg
| | - Oliver Ebenhöh
- Institute of Quantitative and Theoretical Biology, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany
|
31
|
Yanai I, Lercher MJ. What puzzle are you in? Genome Biol 2022; 23:179. [PMID: 36008862 PMCID: PMC9404603 DOI: 10.1186/s13059-022-02748-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Affiliation(s)
- Itai Yanai
- Institute for Computational Medicine, NYU Langone Health, New York, NY, 10016, USA.
| | - Martin J Lercher
- Institute for Computer Science & Department of Biology, Heinrich Heine University, 40225, Düsseldorf, Germany.
|
32
|
Li F, Yuan L, Lu H, Li G, Chen Y, Engqvist MKM, Kerkhoven EJ, Nielsen J. Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction. Nat Catal 2022. [DOI: 10.1038/s41929-022-00798-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/30/2022]
Abstract
Enzyme turnover numbers (kcat) are key to understanding cellular metabolism, proteome allocation and physiological diversity, but experimentally measured kcat data are sparse and noisy. Here we provide a deep learning approach (DLKcat) for high-throughput kcat prediction for metabolic enzymes from any organism merely from substrate structures and protein sequences. DLKcat can capture kcat changes for mutated enzymes and identify amino acid residues with a strong impact on kcat values. We applied this approach to predict genome-scale kcat values for more than 300 yeast species. Additionally, we designed a Bayesian pipeline to parameterize enzyme-constrained genome-scale metabolic models from predicted kcat values. The resulting models outperformed the corresponding original enzyme-constrained genome-scale metabolic models from previous pipelines in predicting phenotypes and proteomes, and enabled us to explain phenotypic differences. DLKcat and the enzyme-constrained genome-scale metabolic model construction pipeline are valuable tools to uncover global trends of enzyme kinetics and physiological diversity, and to further elucidate cellular metabolism on a large scale.
|
33
|
Aledo JC. renz: An R package for the analysis of enzyme kinetic data. BMC Bioinformatics 2022; 23:182. [PMID: 35578161 PMCID: PMC9112463 DOI: 10.1186/s12859-022-04729-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/20/2021] [Accepted: 05/11/2022] [Indexed: 12/04/2022] Open
Abstract
Background Complex enzymatic models are required for analyzing kinetic data derived under conditions that may not satisfy the assumptions associated with Michaelis–Menten kinetics. To analyze these data, several software packages have been developed. However, the complexity introduced by these programs is often dispensable when analyzing data conforming to the canonical Michaelis–Menten model. In these cases, the sophisticated routines of these packages become inefficient and unnecessarily intricated for the intended purpose, reason for which most users resort to general-purpose graphing programs. However, this approach, in addition of being time-consuming, is prone to human error, and can lead to misleading estimates of kinetic parameters, particularly when unweighted regression analyses of transformed kinetic data are performed. Results To fill the existing gap between highly specialized and general-purpose software, we have developed an easy-to-use R package, renz, designed for accurate and efficient estimation of enzyme kinetic parameters. The package provides different methods that can be clustered into four categories, depending on whether they are based on data fitting to a single progress curve (evolution of substrate concentration over time) or, alternatively, based on the dependency of initial rates on substrate concentration (differential rate equation). A second criterion to be considered is whether the experimental data need to be manipulated to obtain linear functions or, alternatively, data are directly fitted using non-linear regression analysis. The current program is a cross-platform, free and open-source software that can be obtained from the CRAN repository. The package is accompanied by five vignettes, which are intended to guide users to choose the appropriate method in each case, as well as providing the basic theoretical foundations of each method. These vignettes use real experimental data to illustrate the use of the package utilities. 
Conclusions renz is rigorous yet easy-to-use software devoted to the analysis of kinetic data. It has been designed to meet the needs of users who are not practicing enzymologists but who need to accurately estimate the kinetic parameters of enzymes. The software saves time and minimizes the risk of making mistakes or introducing biases due to uncorrected error propagation effects.
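The contrast the abstract draws between unweighted regression on transformed data and direct non-linear fitting of initial rates can be illustrated with a minimal sketch. This is not the renz API (renz is an R package); the Python function names below are hypothetical, the data are synthetic, and the direct fit assumes the sum of squared errors has a single minimum in Km within the searched range.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_lineweaver_burk(s, v):
    """Unweighted linear regression of 1/v on 1/s (double-reciprocal plot)."""
    x = [1.0 / si for si in s]
    y = [1.0 / vi for vi in v]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                    # slope = Km / Vmax
    intercept = mean_y - slope * mean_x  # intercept = 1 / Vmax
    vmax = 1.0 / intercept
    return vmax, slope * vmax


def fit_nonlinear(s, v, km_hi=None, tol=1e-9):
    """Direct least-squares fit of v = Vmax * s / (Km + s).

    For a fixed Km the best Vmax has a closed form, so only Km is searched,
    here with a golden-section search (assumes one minimum in the range).
    """
    if km_hi is None:
        km_hi = 10.0 * max(s)  # assumed upper bound for Km, not a general rule

    def best_vmax(km):
        f = [si / (km + si) for si in s]
        return sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)

    def sse(km):
        vmax = best_vmax(km)
        return sum((vi - michaelis_menten(si, vmax, km)) ** 2
                   for si, vi in zip(s, v))

    ratio = (5 ** 0.5 - 1) / 2
    lo, hi = 1e-9, km_hi
    while hi - lo > tol:
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if sse(a) < sse(b):
            hi = b
        else:
            lo = a
    km = (lo + hi) / 2
    return best_vmax(km), km


if __name__ == "__main__":
    # Synthetic data generated with Vmax = 10 and Km = 2 (arbitrary units).
    substrate = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
    rates = [michaelis_menten(s, 10.0, 2.0) for s in substrate]
    rates[0] += 0.2  # small error on the lowest-substrate measurement
    print("Lineweaver-Burk (Vmax, Km):", fit_lineweaver_burk(substrate, rates))
    print("Direct fit      (Vmax, Km):", fit_nonlinear(substrate, rates))
```

In this synthetic case a small error on the lowest-substrate point shifts the double-reciprocal estimate of Km further from the true value than the direct fit, because taking reciprocals amplifies errors in small rates.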
Affiliation(s)
- Juan Carlos Aledo
- Departamento de Biología Molecular Y Bioquímica, Universidad de Málaga, 29071, Málaga, Spain.
| |
|
34
|
Oliver SG. From Petri Plates to Petri Nets, a revolution in yeast biology. FEMS Yeast Res 2022; 22:6526310. [PMID: 35142857 PMCID: PMC8862034 DOI: 10.1093/femsyr/foac008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Revised: 01/26/2022] [Accepted: 02/07/2022] [Indexed: 11/22/2022] Open
Affiliation(s)
- Stephen G Oliver
- Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK
| |
|
35
|
Abstract
Michaelis constants (Km) are essential to predict the catalytic rate of enzymes, but are not widely available. A new study in PLOS Biology uses artificial intelligence (AI) to accurately predict Km on a proteome-wide scale, paving the way for dynamic, genome-wide modeling of metabolism.
Affiliation(s)
- Albert A. Antolin
- Department of Data Science, The Institute of Cancer Research, London, United Kingdom
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, United Kingdom
| | - Marta Cascante
- Department of Biochemistry and Molecular Biomedicine & Institute of Biomedicine of Universitat de Barcelona, Faculty of Biology, Universitat de Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD) and Metabolomics node at Spanish National Bioinformatics Institute (INB-ISCIII-ES-ELIXIR), Instituto de Salud Carlos III (ISCIII), Madrid, Spain
| |
|