1. Kim J, Woo J, Park JY, Kim KJ, Kim D. Deep learning for NAD/NADP cofactor prediction and engineering using transformer attention analysis in enzymes. Metab Eng 2025; 87:86-94. [PMID: 39571721 DOI: 10.1016/j.ymben.2024.11.007] [Received: 03/28/2024] [Revised: 09/25/2024] [Accepted: 11/17/2024] [Indexed: 12/13/2024]
Abstract
Understanding and manipulating the cofactor preferences of NAD(P)-dependent oxidoreductases, the most widely distributed enzyme group in nature, is increasingly crucial in bioengineering. However, large-scale identification of cofactor preferences and the design of mutants to switch cofactor specificity remain complex tasks. Here, we introduce DISCODE (Deep learning-based Iterative pipeline to analyze Specificity of COfactors and to Design Enzyme), a novel transformer-based deep learning model to predict NAD(P) cofactor preferences. For model training, a total of 7,132 NAD(P)-dependent enzyme sequences were collected. Leveraging whole-length sequence information, DISCODE classifies the cofactor preferences of NAD(P)-dependent oxidoreductase protein sequences without structural or taxonomic limitation. The model achieved an accuracy of 97.4% and an F1 score of 97.3%. A notable feature of DISCODE is the interpretability of its transformer layers. Analysis of the model's attention layers identifies several residues with significantly higher attention weights. These residues align well with structurally important residues that closely interact with NAD(P), facilitating the identification of key residues that determine cofactor specificity. These key residues were highly consistent with verified cofactor-switching mutants. Integrated into an enzyme design pipeline, DISCODE coupled with attention analysis enables a fully automated approach to redesigning cofactor specificity.
Affiliation(s)
- Jaehyung Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
- Jihoon Woo
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
- Joon Young Park
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
- Kyung-Jin Kim
  - School of Life Sciences, BK21 FOUR KNU Creative BioResearch Group, KNU Institute of Microbiology, Kyungpook National University, Daegu, 41566, Republic of Korea
- Donghyuk Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
2. Lin YJ, Hsieh PH, Mao CC, Shih YH, Chen SH, Lin CY. Interpretation of machine learning-based prediction models and functional metagenomic approach to identify critical genes in HBCD degradation. J Hazard Mater 2024; 486:136976. [PMID: 39740553 DOI: 10.1016/j.jhazmat.2024.136976] [Received: 08/16/2024] [Revised: 11/01/2024] [Accepted: 12/22/2024] [Indexed: 01/02/2025]
Abstract
Hexabromocyclododecane (HBCD) poses significant environmental risks, and identifying HBCD-degrading microbes and their enzymatic mechanisms is challenging due to the complexity of microbial interactions and metabolic pathways. This study aimed to identify critical genes involved in HBCD biodegradation through two approaches: functional annotation of metagenomes and the interpretation of machine learning-based prediction models. Our functional analysis revealed a rich metabolic potential in Chiang Chun soil (CCS) metagenomes, particularly in carbohydrate metabolism. Among the machine learning algorithms tested, random forest models outperformed others, especially when trained on datasets reflecting the degradation patterns of species like Dehalococcoides mccartyi and Pseudomonas aeruginosa. These models highlighted enzymes such as EC 1.8.3.2 (thiol oxidase) and EC 4.1.1.43 (phenylpyruvate decarboxylase) as inhibitors of degradation, while EC 2.7.1.83 (pseudouridine kinase) was linked to enhanced degradation. This dual-methodology approach not only deepens our understanding of microbial functions in HBCD degradation but also provides an unbiased view of the microbial and enzymatic interactions involved, offering a more targeted and effective bioremediation strategy.
Affiliation(s)
- Yu-Jie Lin
  - Institute of Information Science, Academia Sinica, No. 128, Section 2, Academia Road, Nankang, Taipei 11529, Taiwan
- Ping-Heng Hsieh
  - Institute of Information Science, Academia Sinica, No. 128, Section 2, Academia Road, Nankang, Taipei 11529, Taiwan
- Chun-Chia Mao
  - Institute of Information Science, Academia Sinica, No. 128, Section 2, Academia Road, Nankang, Taipei 11529, Taiwan
- Yang-Hsin Shih
  - Department of Agricultural Chemistry, National Taiwan University, No. 1, Section 4, Roosevelt Rd., Taipei 10617, Taiwan
- Shu-Hwa Chen
  - TMU Research Center of Cancer Translational Medicine, Taipei Medical University, No. 250, Wuxing St., Taipei 11031, Taiwan
- Chung-Yen Lin
  - Institute of Information Science, Academia Sinica, No. 128, Section 2, Academia Road, Nankang, Taipei 11529, Taiwan
  - Institute of Fisheries Science, National Taiwan University, No. 1, Section 4, Roosevelt Rd., Taipei 10617, Taiwan
3. Yang Y, Jerger A, Feng S, Wang Z, Brasfield C, Cheung MS, Zucker J, Guan Q. Improved enzyme functional annotation prediction using contrastive learning with structural inference. Commun Biol 2024; 7:1690. [PMID: 39715863 DOI: 10.1038/s42003-024-07359-z] [Received: 04/25/2024] [Accepted: 12/03/2024] [Indexed: 12/25/2024] [Open Access]
Abstract
Recent years have witnessed the remarkable progress of deep learning within the realm of scientific disciplines, yielding a wealth of promising outcomes. A prominent challenge within this domain has been the task of predicting enzyme function, a complex problem that has seen the development of numerous computational methods, particularly those rooted in deep learning techniques. However, the majority of these methods have primarily focused on either amino acid sequence data or protein structure data, neglecting the potential synergy of combining both modalities. To address this gap, we propose a Contrastive Learning framework for Enzyme functional ANnotation prediction combined with protein amino acid sequences and Contact maps (CLEAN-Contact). We rigorously evaluate the performance of our CLEAN-Contact framework against the state-of-the-art enzyme function prediction models using multiple benchmark datasets. Using CLEAN-Contact, we predict previously unknown enzyme functions within the proteome of Prochlorococcus marinus MED4. Our findings convincingly demonstrate the substantial superiority of our CLEAN-Contact framework, marking a significant step forward in enzyme function prediction accuracy.
Affiliation(s)
- Yuxin Yang
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Ave, Cleveland, OH, 44195, USA
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Ave, Cleveland, OH, 44195, USA
  - Department of Computer Science, Kent State University, 800 E Summit St, Kent, OH, 44242, USA
- Abby Jerger
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, 1100 Dexter Ave N, Seattle, WA, 98109, USA
- Song Feng
  - Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99354, USA
- Zixu Wang
  - Department of Computer Science, Kent State University, 800 E Summit St, Kent, OH, 44242, USA
- Christina Brasfield
  - Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99354, USA
- Margaret S Cheung
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, 1100 Dexter Ave N, Seattle, WA, 98109, USA
- Jeremy Zucker
  - Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99354, USA
- Qiang Guan
  - Department of Computer Science, Kent State University, 800 E Summit St, Kent, OH, 44242, USA
4. Sunil RS, Lim SC, Itharajula M, Mutwil M. The gene function prediction challenge: Large language models and knowledge graphs to the rescue. Curr Opin Plant Biol 2024; 82:102665. [PMID: 39579414 DOI: 10.1016/j.pbi.2024.102665] [Received: 08/13/2024] [Revised: 10/23/2024] [Accepted: 10/24/2024] [Indexed: 11/25/2024]
Abstract
Elucidating gene function is one of the ultimate goals of plant science. Despite this, only ~15% of all genes in the model plant Arabidopsis thaliana have comprehensively experimentally verified functions. While bioinformatic gene function prediction approaches can guide biologists in their experimental efforts, neither the performance of gene function prediction methods nor the number of experimentally characterized genes has increased dramatically in recent years. In this review, we discuss the status quo and the trajectory of gene function elucidation and outline recent advances in gene function prediction approaches. We then discuss how recent artificial intelligence advances in large language models and knowledge graphs can be leveraged to accelerate gene function prediction and keep us up to date with the scientific literature.
Affiliation(s)
- Rohan Shawn Sunil
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Shan Chun Lim
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Manoj Itharajula
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
5. Kumar K, Pazare M, Ratnaparkhi GS, Kamat SS. CG17192 is a Phospholipase That Regulates Signaling Lipids in the Drosophila Gut upon Infection. Biochemistry 2024; 63:3000-3010. [PMID: 39442931 DOI: 10.1021/acs.biochem.4c00579] [Indexed: 10/25/2024]
Abstract
The chemoproteomics technique activity-based protein profiling (ABPP) has proven to be an invaluable tool in assigning functions to enzymes. The serine hydrolase (SH) enzyme superfamily, in particular, has served as an excellent example in displaying the versatility of various ABPP platforms and has resulted in a comprehensive cataloging of the biochemical activities associated with this superfamily. Besides SHs, several other enzyme classes in mammals have been thoroughly investigated using ABPP platforms. However, the utility of ABPP platforms in fly models remains underexplored. Recognizing this knowledge gap and leveraging complementary ABPP platforms, we reported the full array of SH activities during various developmental stages and in adult tissues of the fruit fly (Drosophila melanogaster). Following up on this study, using ABPP, we mapped SH activities in adult fruit flies in an infection model and found that a gut-resident lipase, CG17192, showed increased activity during infection. To assign a biological function to this uncharacterized lipase, we performed an untargeted lipidomics analysis and found that phosphatidylinositols were significantly elevated when CG17192 was depleted in the adult fruit fly gut. Next, we overexpressed this lipase in insect cells, and using biochemical assays, we show that CG17192 is a secreted enzyme that has phospholipase C (PLC) type activity, with phosphatidylinositol being a preferred substrate. Finally, we show that during infection, heightened CG17192 activity regulates phosphatidylinositol levels and, by doing so, likely modulates signaling pathways in the adult fruit fly gut that might be involved in the resolution of this pathophysiological condition.
Affiliation(s)
- Kundan Kumar
  - Department of Biology, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pashan, Pune 411008, Maharashtra, India
- Mrunal Pazare
  - Department of Biology, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pashan, Pune 411008, Maharashtra, India
- Girish S Ratnaparkhi
  - Department of Biology, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pashan, Pune 411008, Maharashtra, India
- Siddhesh S Kamat
  - Department of Biology, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pashan, Pune 411008, Maharashtra, India
6. Kumar R, Tambrini SJ, Jiang G. NAD(P)-Dependent Glucose Dehydrogenases: Underestimated Multifunctional Biocatalysts. Chembiochem 2024:e202400716. [PMID: 39531513 DOI: 10.1002/cbic.202400716] [Received: 08/31/2024] [Revised: 11/11/2024] [Accepted: 11/12/2024] [Indexed: 11/16/2024]
Abstract
The last decade has witnessed tremendous progress in the field of biocatalysis. Among the most frequently utilized enzymes in diverse biocatalytic applications are NAD(P)-dependent glucose dehydrogenases (GDHs). Traditionally, these enzymes are employed to regenerate NAD(P)H from glucose in various enzymatic reactions. However, recent studies have expanded the scope of GDHs beyond cofactor regeneration, highlighting their potential as biocatalysts in diverse chemical transformations. GDHs have demonstrated versatility in catalyzing key reactions in the synthesis of various drug molecules and intermediates, including ketone reduction to produce alcohols, imine reduction of C=N bonds to yield amines, reduction of aldehydes to alcohols, and dehydrogenation of cyclohexanol derivatives. This review highlights recent advancements in elucidating the multifunctional roles of GDHs in biocatalysis, with an emphasis on their growing applications and significant potential in small molecule synthesis.
Affiliation(s)
- Rohit Kumar
  - Department of Pharmaceutical Sciences, Eugene Applebaum College of Pharmacy and Health Sciences, Wayne State University, Detroit, MI, 48201, USA
- Samantha J Tambrini
  - Department of Pharmaceutical Sciences, Eugene Applebaum College of Pharmacy and Health Sciences, Wayne State University, Detroit, MI, 48201, USA
- Guangde Jiang
  - Department of Pharmaceutical Sciences, Eugene Applebaum College of Pharmacy and Health Sciences, Wayne State University, Detroit, MI, 48201, USA
7. de Crécy-Lagard V, Dias R, Friedberg I, Yuan Y, Swairjo MA. Limitations of Current Machine-Learning Models in Predicting Enzymatic Functions for Uncharacterized Proteins. bioRxiv 2024:2024.07.01.601547. [PMID: 39005379 PMCID: PMC11244979 DOI: 10.1101/2024.07.01.601547] [Indexed: 07/16/2024]
Abstract
Thirty to seventy percent of proteins in any given genome have no assigned function and have been labeled as the protein "unknome". This large knowledge gap prevents the biological community from fully leveraging the plethora of genomic data that is now available. Machine-learning approaches are showing some promise in propagating functional knowledge from experimentally characterized proteins to the correct set of isofunctional orthologs. However, they largely fail to predict enzymatic functions unseen in the training set, as shown by dissecting the predictions made for over 450 enzymes of unknown function from the model bacterium Escherichia coli using the DeepECTransformer platform. Lessons from these failures can help the community develop machine-learning methods that assist domain experts in making testable functional predictions for more members of the uncharacterized proteome.
Article Summary
Many proteins in any genome, ranging from 30 to 70%, lack an assigned function. This knowledge gap limits the full use of the vast available genomic data. Machine learning has shown promise in transferring functional knowledge from proteins of known function to similar ones, but largely fails to predict novel functions not seen in its training data. Understanding these failures can guide the development of better machine-learning methods to help experts make accurate functional predictions for uncharacterized proteins.
8. Gill D, Dib MJ, Cronjé HT, Karhunen V, Woolf B, Gagnon E, Daghlas I, Nyberg M, Drakeman D, Burgess S. Common pitfalls in drug target Mendelian randomization and how to avoid them. BMC Med 2024; 22:473. [PMID: 39407214 PMCID: PMC11481744 DOI: 10.1186/s12916-024-03700-9] [Received: 08/01/2024] [Accepted: 10/10/2024] [Indexed: 10/19/2024] [Open Access]
Abstract
BACKGROUND: Drug target Mendelian randomization describes the use of genetic variants as instrumental variables for studying the effects of pharmacological agents. The paradigm can be used to inform on all aspects of drug development and has become increasingly popular over the last decade, particularly given the time- and cost-efficiency with which it can be performed even before commencing clinical studies.
MAIN BODY: In this review, we describe the recent emergence of drug target Mendelian randomization, its common pitfalls, how best to address them, as well as potential future directions. Throughout, we offer advice based on our experiences on how to approach these types of studies, which we hope will be useful for both practitioners and those translating the findings from such work.
CONCLUSIONS: Drug target Mendelian randomization is nuanced and requires a combination of biological, statistical, genetic, epidemiological, clinical, and pharmaceutical expertise to be utilized to its full potential. Unfortunately, these skillsets are relatively infrequently combined in any given study.
Affiliation(s)
- Dipender Gill
  - Sequoia Genetics, London, UK
  - Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, 90 Wood Lane, London, W12 0BZ, UK
- Marie-Joe Dib
  - Cardiovascular Division, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Héléne T Cronjé
  - Sequoia Genetics, London, UK
  - Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, UK
- Ville Karhunen
  - Sequoia Genetics, London, UK
  - Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, UK
- Benjamin Woolf
  - Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, UK
  - School of Psychological Science, University of Bristol, Bristol, UK
  - Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Eloi Gagnon
  - Centre de recherche de l'Institut universitaire de cardiologie et de pneumologie de Québec, Laval University, Québec, Canada
- Iyas Daghlas
  - Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Michael Nyberg
  - Cardiovascular Biology, Global Drug Discovery, Novo Nordisk A/S, Maaloev, Denmark
- Donald Drakeman
  - University of Cambridge Centre for Health Leadership & Enterprise, Judge Business School, Trumpington Street, Cambridge, UK
  - Advent Venture Partners, London, UK
- Stephen Burgess
  - Sequoia Genetics, London, UK
  - Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, UK
  - Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
9. Zou X, Mo Z, Wang L, Chen S, Lee SY. Overcoming Bacteriophage Contamination in Bioprocessing: Strategies and Applications. Small Methods 2024:e2400932. [PMID: 39359025 DOI: 10.1002/smtd.202400932] [Received: 06/22/2024] [Revised: 09/14/2024] [Indexed: 10/04/2024]
Abstract
Bacteriophage contamination has a devastating impact on the viability of bacterial hosts and can significantly reduce the productivity of bioprocesses in biotechnological industries. The consequences range from widespread fermentation failure to substantial economic losses, highlighting the urgent need for effective countermeasures. Conventional prevention methods, which focus primarily on the physical removal of bacteriophages from equipment, bioprocess units, and the environment, have proven ineffective in preventing phage entry and contamination. The coevolutionary dynamics between phages and their bacterial hosts have spurred the development of a diverse repertoire of antiviral defense mechanisms within microbial communities. These naturally occurring defense strategies can be harnessed through genetic engineering to convert phage-sensitive hosts into robust, phage-resistant cell factories, providing a strategic approach to mitigate the threats posed by bacteriophages to industrial bacterial processes. This review provides an overview of the various defense strategies and immune systems that curb the propagation of bacteriophages and highlights their applications in fermentation bioprocesses to combat phage contamination. The tactics employed by phages to circumvent these defense strategies are also discussed, as preventing the emergence of phage escape mutants is a key component of effective contamination management.
Affiliation(s)
- Xuan Zou
  - Intensive Care Unit, Shenzhen Key Laboratory of Microbiology in Genomic Modification & Editing and Application, Shenzhen Institute of Translational Medicine, Medical Innovation Technology Transformation Center of Shenzhen Second People's Hospital, Shenzhen University Medical School, The First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong, 518035, China
  - Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
  - Synthetic Biology Research Center, Shenzhen University, Shenzhen, Guangdong, 518035, China
- Ziran Mo
  - Department of Respiratory Diseases, Institute of Pediatrics, Shenzhen Children's Hospital, Shenzhen, Guangdong, 518026, China
  - Department of Gastroenterology, Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Disease, Ministry of Education Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Taikang Center for Life and Medical Sciences, Zhongnan Hospital of Wuhan University, School of Pharmaceutical Sciences, Wuhan University, Wuhan, 430071, China
- Lianrong Wang
  - Department of Respiratory Diseases, Institute of Pediatrics, Shenzhen Children's Hospital, Shenzhen, Guangdong, 518026, China
  - Department of Gastroenterology, Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Disease, Ministry of Education Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Taikang Center for Life and Medical Sciences, Zhongnan Hospital of Wuhan University, School of Pharmaceutical Sciences, Wuhan University, Wuhan, 430071, China
- Shi Chen
  - Intensive Care Unit, Shenzhen Key Laboratory of Microbiology in Genomic Modification & Editing and Application, Shenzhen Institute of Translational Medicine, Medical Innovation Technology Transformation Center of Shenzhen Second People's Hospital, Shenzhen University Medical School, The First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong, 518035, China
  - Synthetic Biology Research Center, Shenzhen University, Shenzhen, Guangdong, 518035, China
  - Department of Gastroenterology, Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Disease, Ministry of Education Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Taikang Center for Life and Medical Sciences, Zhongnan Hospital of Wuhan University, School of Pharmaceutical Sciences, Wuhan University, Wuhan, 430071, China
- Sang Yup Lee
  - Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
  - Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
  - BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141, Republic of Korea
  - Graduate School of Engineering Biology, KAIST, Daejeon, 34141, Republic of Korea
10. Mi J, Wang H, Li J, Sun J, Li C, Wan J, Zeng Y, Gao J. GGN-GO: geometric graph networks for predicting protein function by multi-scale structure features. Brief Bioinform 2024; 25:bbae559. [PMID: 39487084 PMCID: PMC11530295 DOI: 10.1093/bib/bbae559] [Received: 07/17/2024] [Revised: 10/03/2024] [Accepted: 10/17/2024] [Indexed: 11/04/2024] [Open Access]
Abstract
Recent advances in high-throughput sequencing have led to an explosion of genomic and transcriptomic data, offering a wealth of protein sequence information. However, the functions of most proteins remain unannotated. Traditional experimental methods for annotating protein functions are costly and time-consuming. Current deep learning methods typically rely on Graph Convolutional Networks to propagate features between protein residues. However, these methods fail to capture fine atomic-level geometric structural features and cannot directly compute or propagate structural features (such as distances, directions, and angles) when transmitting features, often simplifying them to scalars. Additionally, difficulties in capturing long-range dependencies limit the model's ability to identify key nodes (residues). To address these challenges, we propose a geometric graph network (GGN-GO) for predicting protein function that enriches feature extraction by capturing multi-scale geometric structural features at the atomic and residue levels. We use a geometric vector perceptron to convert these features into vector representations and aggregate them with node features for better understanding and propagation in the network. Moreover, we introduce a graph attention pooling layer that captures key node information by adaptively aggregating local functional motifs, while contrastive learning enhances graph representation discriminability through random noise and different views. The experimental results show that GGN-GO outperforms six comparative methods in tasks with the most labels for both experimentally validated and predicted protein structures. Furthermore, GGN-GO identifies functional residues corresponding to those experimentally confirmed, showcasing its interpretability and its ability to pinpoint key protein regions. The code and data are available at: https://github.com/MiJia-ID/GGN-GO.
Affiliation(s)
- Jia Mi
  - The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing
- Han Wang
  - The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing
- Jing Li
  - The College of Life Science and Technology, Beijing University of Chemical Technology, Beijing
- Jinghong Sun
  - The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing
- Chang Li
  - The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing
- Jing Wan
  - The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing
- Yuan Zeng
  - Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences
  - Chinese National Microbiology Data Center (NMDC)
- Jingyang Gao
  - The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing
11. Lu H, Xiao L, Liao W, Yan X, Nielsen J. Cell factory design with advanced metabolic modelling empowered by artificial intelligence. Metab Eng 2024; 85:61-72. [PMID: 39038602 DOI: 10.1016/j.ymben.2024.07.003] [Received: 05/14/2024] [Revised: 07/06/2024] [Accepted: 07/06/2024] [Indexed: 07/24/2024]
Abstract
Advances in synthetic biology and artificial intelligence (AI) have provided new opportunities for modern biotechnology. High-performance cell factories, the backbone of industrial biotechnology, are ultimately responsible for determining whether a bio-based product succeeds or fails in the fierce competition with petroleum-based products. To date, one of the greatest challenges in synthetic biology is the creation of high-performance cell factories in a consistent and efficient manner. As so-called white-box models, numerous metabolic network models have been developed and used in computational strain design. Moreover, great progress has been made in AI-powered strain engineering in recent years. Both approaches have advantages and disadvantages. Therefore, the deep integration of AI with metabolic models is crucial for the construction of superior cell factories with higher titres, yields and production rates. The detailed applications of the latest advanced metabolic models and AI in computational strain design are summarized in this review. Additionally, approaches for the deep integration of AI and metabolic models are discussed. It is anticipated that advanced mechanistic metabolic models powered by AI will pave the way for the efficient construction of powerful industrial chassis strains in the coming years.
Collapse
Affiliation(s)
- Hongzhong Lu
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China.
| | - Luchi Xiao
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China
| | - Wenbin Liao
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China; Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, PR China
| | - Xuefeng Yan
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, PR China
| | - Jens Nielsen
- BioInnovation Institute, Ole Måløes Vej, DK2200, Copenhagen N, Denmark; Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden.
| |
|
12
|
Federico CA, Trotsyuk AA. Biomedical Data Science, Artificial Intelligence, and Ethics: Navigating Challenges in the Face of Explosive Growth. Annu Rev Biomed Data Sci 2024; 7:1-14. [PMID: 38598860 DOI: 10.1146/annurev-biodatasci-102623-104553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/12/2024]
Abstract
Advances in biomedical data science and artificial intelligence (AI) are profoundly changing the landscape of healthcare. This article reviews the ethical issues that arise with the development of AI technologies, including threats to privacy, data security, consent, and justice, as they relate to donors of tissue and data. It also considers broader societal obligations, including the importance of assessing the unintended consequences of AI research in biomedicine. In addition, this article highlights the challenge of rapid AI development against the backdrop of disparate regulatory frameworks, calling for a global approach to address concerns around data misuse, unintended surveillance, and the equitable distribution of AI's benefits and burdens. Finally, a number of potential solutions to these ethical quandaries are offered. Namely, the merits of advocating for a collaborative, informed, and flexible regulatory approach that balances innovation with individual rights and public welfare, fostering a trustworthy AI-driven healthcare ecosystem, are discussed.
Affiliation(s)
- Carole A Federico
- Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, California, USA
| | - Artem A Trotsyuk
- Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, California, USA
| |
|
13
|
Swamidatta SH, Lichman BR. Beyond co-expression: pathway discovery for plant pharmaceuticals. Curr Opin Biotechnol 2024; 88:103147. [PMID: 38833915 DOI: 10.1016/j.copbio.2024.103147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2024] [Revised: 05/07/2024] [Accepted: 05/09/2024] [Indexed: 06/06/2024]
Abstract
Plant natural products have been an important source of medicinal molecules since ancient times. To gain access to the whole diversity of these molecules for pharmaceutical applications, it is important to understand their biosynthetic origins. Whilst co-expression is a reliable tool for identifying gene candidates, a variety of complementary methods can aid in screening or refining candidate selection. Here, we review recently employed plant biosynthetic pathway discovery approaches, and highlight future directions in the field.
Affiliation(s)
- Sandesh H Swamidatta
- Centre for Novel Agricultural Products, Department of Biology, University of York, York YO10 5DD, UK
| | - Benjamin R Lichman
- Centre for Novel Agricultural Products, Department of Biology, University of York, York YO10 5DD, UK.
| |
|
14
|
Song Y, Prather KLJ. Strategies in engineering sustainable biochemical synthesis through microbial systems. Curr Opin Chem Biol 2024; 81:102493. [PMID: 38971129 DOI: 10.1016/j.cbpa.2024.102493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2024] [Revised: 05/24/2024] [Accepted: 06/05/2024] [Indexed: 07/08/2024]
Abstract
Growing environmental concerns and the urgency to address climate change have increased demand for the development of sustainable alternatives to fossil-derived fuels and chemicals. Microbial systems, possessing inherent biosynthetic capabilities, present a promising approach for achieving this goal. This review discusses the coupling of systems and synthetic biology to enable the elucidation and manipulation of microbial phenotypes for the production of chemicals that can substitute for petroleum-derived counterparts and contribute to advancing green biotechnology. The integration of artificial intelligence with metabolic engineering to facilitate precise and data-driven design of biosynthetic pathways is also discussed, along with the identification of current limitations and proposition of strategies for optimizing biosystems, thereby propelling the field of chemical biology towards sustainable chemical production.
Affiliation(s)
- Yoseb Song
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Kristala L J Prather
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
| |
|
15
|
Gou Y, Li D, Zhao M, Li M, Zhang J, Zhou Y, Xiao F, Liu G, Ding H, Sun C, Ye C, Dong C, Gao J, Gao D, Bao Z, Huang L, Xu Z, Lian J. Intein-mediated temperature control for complete biosynthesis of sanguinarine and its halogenated derivatives in yeast. Nat Commun 2024; 15:5238. [PMID: 38898098 PMCID: PMC11186835 DOI: 10.1038/s41467-024-49554-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2024] [Accepted: 06/10/2024] [Indexed: 06/21/2024] Open
Abstract
While sanguinarine has gained recognition for antimicrobial and antineoplastic activities, its complex conjugated structure and low abundance in plants impede broad applications. Here, we demonstrate the complete biosynthesis of sanguinarine and halogenated derivatives using highly engineered yeast strains. To overcome sanguinarine cytotoxicity, we establish a splicing intein-mediated temperature-responsive gene expression system (SIMTeGES), a simple strategy that decouples cell growth from product synthesis without sacrificing protein activity. To debottleneck sanguinarine biosynthesis, we identify two reticuline oxidases and facilitate functional expression of flavoproteins and cytochrome P450 enzymes via protein molecular engineering. After comprehensive metabolic engineering, we report the production of sanguinarine at a titer of 448.64 mg L⁻¹. Additionally, our engineered strain enables the biosynthesis of fluorinated sanguinarine, showcasing the biotransformation of halogenated derivatives through more than 15 biocatalytic steps. This work serves as a blueprint for utilizing yeast as a scalable platform for biomanufacturing diverse benzylisoquinoline alkaloids and derivatives.
Affiliation(s)
- Yuanwei Gou
- Key Laboratory of Biomass Chemical Engineering of Ministry of Education & National Key Laboratory of Biobased Transportation Fuel Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Dongfang Li
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Minghui Zhao
- Key Laboratory of Biomass Chemical Engineering of Ministry of Education & National Key Laboratory of Biobased Transportation Fuel Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Mengxin Li
- Key Laboratory of Biomass Chemical Engineering of Ministry of Education & National Key Laboratory of Biobased Transportation Fuel Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China
| | - Jiaojiao Zhang
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Yilian Zhou
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Feng Xiao
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Gaofei Liu
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Haote Ding
- Key Laboratory of Biomass Chemical Engineering of Ministry of Education & National Key Laboratory of Biobased Transportation Fuel Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Chenfan Sun
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Cuifang Ye
- Key Laboratory of Biomass Chemical Engineering of Ministry of Education & National Key Laboratory of Biobased Transportation Fuel Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China
| | - Chang Dong
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Jucan Gao
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Di Gao
- Key Laboratory of Biomass Chemical Engineering of Ministry of Education & National Key Laboratory of Biobased Transportation Fuel Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China
| | - Zehua Bao
- Key Laboratory of Biomass Chemical Engineering of Ministry of Education & National Key Laboratory of Biobased Transportation Fuel Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Lei Huang
- Key Laboratory of Biomass Chemical Engineering of Ministry of Education & National Key Laboratory of Biobased Transportation Fuel Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
| | - Zhinan Xu
- Key Laboratory of Biomass Chemical Engineering of Ministry of Education & National Key Laboratory of Biobased Transportation Fuel Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China
| | - Jiazhang Lian
- Key Laboratory of Biomass Chemical Engineering of Ministry of Education & National Key Laboratory of Biobased Transportation Fuel Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China.
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China.
| |
|
16
|
Liu Z, Zhou Y, Wang H, Liu C, Wang L. Recent advances in understanding the fitness and survival mechanisms of Vibrio parahaemolyticus. Int J Food Microbiol 2024; 417:110691. [PMID: 38631283 DOI: 10.1016/j.ijfoodmicro.2024.110691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2023] [Revised: 03/14/2024] [Accepted: 04/02/2024] [Indexed: 04/19/2024]
Abstract
The presence of Vibrio parahaemolyticus (Vp) in different production stages of seafood has generated negative impacts on both public health and the sustainability of the industry. To better investigate the fitness of Vp at the phenotypic level, a great number of studies have been conducted in recent years using plate counting methods. In the meantime, with the increasing accessibility of next-generation sequencing and advances in analytical chemistry techniques, omics-oriented biotechnologies have further advanced our knowledge of the survival and virulence mechanisms of Vp at various molecular levels. These observations provide insights to guide the development of novel prevention and control strategies and benefit the monitoring and mitigation of food safety risks associated with Vp contamination. To capture these recent advances in a timely manner, this review first summarizes the most recent phenotypic-level studies and provides insights into the survival of Vp under important in vitro stresses and on aquatic products. Molecular survival mechanisms of Vp at the transcriptomic and proteomic levels are then summarized and discussed. Looking forward, newer omics technologies such as metabolomics and secretomics show great potential for confirming the cellular responses of Vp. Powerful data mining tools from the fields of machine learning and artificial intelligence, which can better utilize omics data and solve complex problems in its processing, analysis, and interpretation, will further improve our mechanistic understanding of Vp.
Affiliation(s)
- Zhuosheng Liu
- Department of Food Science and Technology, University of California Davis, Davis, CA 95618, USA
| | - Yi Zhou
- Department of Food Science and Technology, University of California Davis, Davis, CA 95618, USA
| | - Hongye Wang
- Department of Food Science and Technology, University of California Davis, Davis, CA 95618, USA
| | - Chengchu Liu
- University of Maryland Sea Grant Extension Program, UMES Center for Food Science and Technology, Princess Anne, MD, United States
| | - Luxin Wang
- Department of Food Science and Technology, University of California Davis, Davis, CA 95618, USA.
| |
|
17
|
de Crécy-Lagard V, Swairjo MA. On the necessity to include multiple types of evidence when predicting molecular function of proteins. bioRxiv 2023:2023.12.18.571875. [PMID: 38187591 PMCID: PMC10769224 DOI: 10.1101/2023.12.18.571875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/09/2024]
Abstract
Machine learning-based platforms are currently revolutionizing many fields of molecular biology, including structure prediction for monomers or complexes, prediction of the consequences of mutations, and prediction of protein function. However, these platforms use training sets based on currently available knowledge and, in essence, are not built to discover novelty. Hence, claims of discovering novel functions for protein families using artificial intelligence should be carefully dissected, as the dangers of overprediction are real, as we show in a detailed analysis of the prediction made by Kim et al. [1] on the function of the YciO protein in the model organism Escherichia coli.
|