1
Kim J, Woo J, Park JY, Kim KJ, Kim D. Deep learning for NAD/NADP cofactor prediction and engineering using transformer attention analysis in enzymes. Metab Eng 2025; 87:86-94. [PMID: 39571721 DOI: 10.1016/j.ymben.2024.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 09/25/2024] [Accepted: 11/17/2024] [Indexed: 12/13/2024]
Abstract
Understanding and manipulating the cofactor preferences of NAD(P)-dependent oxidoreductases, the most widely distributed enzyme group in nature, is increasingly crucial in bioengineering. However, large-scale identification of cofactor preferences and the design of mutants that switch cofactor specificity remain complex tasks. Here, we introduce DISCODE (Deep learning-based Iterative pipeline to analyze Specificity of COfactors and to Design Enzyme), a novel transformer-based deep learning model to predict NAD(P) cofactor preferences. For model training, a total of 7,132 NAD(P)-dependent enzyme sequences were collected. Leveraging whole-length sequence information, DISCODE classifies the cofactor preferences of NAD(P)-dependent oxidoreductase protein sequences without structural or taxonomic limitation. The model achieved an accuracy of 97.4% and an F1 score of 97.3%. A notable feature of DISCODE is the interpretability of its transformer layers. Analysis of the model's attention layers identifies several residues with significantly higher attention weights. These residues align well with structurally important residues that closely interact with NAD(P), facilitating the identification of key residues that determine cofactor specificity. These key residues showed high consistency with verified cofactor-switching mutants. Integrated into an enzyme design pipeline, DISCODE coupled with attention analysis enables a fully automated approach to redesigning cofactor specificity.
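The attention analysis described in this abstract can be illustrated with a minimal sketch. It is not the DISCODE implementation: the function name `high_attention_residues`, the averaging over query positions, and the z-score cutoff are all assumptions chosen for illustration, and the attention matrix is a toy input rather than model output.

```python
import numpy as np

def high_attention_residues(attn, z_cutoff=2.0):
    """Return 1-based residue positions that receive unusually high attention.

    attn: array of shape (L, L); attn[i, j] is the weight that residue i
    (the query) places on residue j (the key). This layout is an assumption.
    """
    attn = np.asarray(attn, dtype=float)
    received = attn.mean(axis=0)              # average attention each residue receives
    std = received.std()
    if std == 0:
        return []                             # no residue stands out
    z = (received - received.mean()) / std    # standardize across residues
    return [int(i) + 1 for i in np.flatnonzero(z > z_cutoff)]

# Toy example: 50 residues with uniform attention plus extra weight on position 10.
L = 50
attn = np.full((L, L), 1.0 / L)
attn[:, 9] += 0.5
attn /= attn.sum(axis=1, keepdims=True)       # rows sum to 1, as after a softmax

print(high_attention_residues(attn))          # prints [10]
```

In a real workflow the flagged positions would then be compared against residues known to contact NAD(P) in a structure, which is the consistency check the abstract reports.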
Affiliation(s)
- Jaehyung Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
- Jihoon Woo
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
- Joon Young Park
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
- Kyung-Jin Kim
  - School of Life Sciences, BK21 FOUR KNU Creative BioResearch Group, KNU Institute of Microbiology, Kyungpook National University, Daegu, 41566, Republic of Korea
- Donghyuk Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
2
Ye Y, Jiang H, Xu R, Wang S, Zheng L, Guo J. The INSIGHT platform: Enhancing NAD(P)-dependent specificity prediction for co-factor specificity engineering. Int J Biol Macromol 2024; 278:135064. [PMID: 39182884 DOI: 10.1016/j.ijbiomac.2024.135064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 08/22/2024] [Accepted: 08/23/2024] [Indexed: 08/27/2024]
Abstract
Enzyme specificity towards cofactors like NAD(P)H is crucial for applications in bioremediation and eco-friendly chemical synthesis. Despite their role in converting pollutants and creating sustainable products, predicting enzyme specificity faces challenges due to sparse data and inadequate models. To bridge this gap, we developed the cutting-edge INSIGHT platform to enhance the prediction of coenzyme specificity in NAD(P)-dependent enzymes. INSIGHT integrates extensive data from principal bioinformatics resources, concentrating on both NADH and NADPH specificities, and utilizes advanced protein language models to refine the predictions. This integration not only strengthens computational predictions but also meets the practical demands of high-throughput screening and optimization. Experimental validation confirms INSIGHT's effectiveness, boosting our ability to engineer enzymes for efficient, sustainable industrial and environmental processes. This work advances the practical use of computational tools in enzyme research, addressing industrial needs and offering scalable solutions for environmental challenges.
Affiliation(s)
- Yilin Ye
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao
- Ran Xu
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao
- Sheng Wang
  - Shanghai Zelixir Biotech Company Ltd., China
- Jingjing Guo
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao
3
Filgueiras JPC, Zámocký M, Turchetto-Zolet AC. Unraveling the evolutionary origin of the P5CS gene: a story of gene fusion and horizontal transfer. Front Mol Biosci 2024; 11:1341684. [PMID: 38693917 PMCID: PMC11061531 DOI: 10.3389/fmolb.2024.1341684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 03/25/2024] [Indexed: 05/03/2024] Open
Abstract
The accumulation of proline in response to the most diverse types of stress is a widespread defense mechanism. In prokaryotes, fungi, and certain unicellular eukaryotes (green algae), the first two reactions of proline biosynthesis occur through two distinct enzymes, γ-glutamyl kinase (GK E.C. 2.7.2.11) and γ-glutamyl phosphate reductase (GPR E.C. 1.2.1.41), encoded by two different genes, ProB and ProA, respectively. Plants, animals, and a few unicellular eukaryotes carry out these reactions through a single bifunctional enzyme, the Δ1-pyrroline-5-carboxylate synthase (P5CS), which has the GK and GPR domains fused. To better understand the origin and diversification of the P5CS gene, we use a robust phylogenetic approach with a broad sampling of the P5CS, ProB and ProA genes, including species from all three domains of life. Our results suggest that the collected P5CS genes have arisen from a single fusion event between the ProA and ProB gene paralogs. A peculiar fusion event occurred in an ancestral eukaryotic lineage and was spread to other lineages through horizontal gene transfer. As for the diversification of this gene family, the phylogeny of the P5CS gene in plants shows that there have been multiple independent processes of duplication and loss of this gene, with the duplications being related to old polyploidy events.
Affiliation(s)
- João Pedro Carmo Filgueiras
  - Graduate Program in Genetics and Molecular Biology, Department of Genetics, Institute of Biosciences, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
- Marcel Zámocký
  - Laboratory of Phylogenomic Ecology, Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovakia
- Andreia Carina Turchetto-Zolet
  - Graduate Program in Genetics and Molecular Biology, Department of Genetics, Institute of Biosciences, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
4
Vadaddi SM, Zhao Q, Savoie BM. Graph to Activation Energy Models Easily Reach Irreducible Errors but Show Limited Transferability. J Phys Chem A 2024; 128:2543-2555. [PMID: 38517281 DOI: 10.1021/acs.jpca.3c07240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Activation energy characterization of competing reactions is a costly but crucial step for understanding the kinetic relevance of distinct reaction pathways, product yields, and myriad other properties of reacting systems. The standard methodology for activation energy characterization has historically been a transition state search using the highest level of theory that can be afforded. However, recently, several groups have popularized the idea of predicting activation energies directly based on nothing more than the reactant and product graphs, a sufficiently complex neural network, and a broad enough data set. Here, we have revisited this task using the recently developed Reaction Graph Depth 1 (RGD1) transition state data set and several newly developed graph attention architectures. All of these new architectures achieve similar state-of-the-art results of ∼4 kcal/mol mean absolute error on withheld testing sets of reactions but poor performance on external testing sets composed of reactions with differing mechanisms, reaction molecularity, or reactant size distribution. Limited transferability is also shown to be shared by other contemporary graph to activation energy architectures through a series of case studies. We conclude that an array of standard graph architectures can already achieve results comparable to the irreducible error of available reaction data sets but that out-of-distribution performance remains poor.
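The mean absolute error (MAE) metric quoted in this abstract, and the contrast between a withheld test set and an external one, can be sketched as follows. The numbers below are hypothetical placeholders in kcal/mol, not values from the paper, and `mean_absolute_error` is a helper written for this illustration.

```python
def mean_absolute_error(predicted, reference):
    """Average of |prediction - reference| over paired values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

# Hypothetical activation energies (kcal/mol); not data from the paper.
withheld_pred, withheld_ref = [10.0, 22.0, 35.0], [12.0, 20.0, 31.0]
external_pred, external_ref = [15.0, 40.0], [30.0, 20.0]

withheld_mae = mean_absolute_error(withheld_pred, withheld_ref)
external_mae = mean_absolute_error(external_pred, external_ref)

print(f"withheld MAE: {withheld_mae:.2f} kcal/mol")
print(f"external MAE: {external_mae:.2f} kcal/mol")
```

Reporting the two values side by side is one simple way to expose the kind of transferability gap the abstract describes.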
Affiliation(s)
- Sai Mahit Vadaddi
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Qiyuan Zhao
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Brett M Savoie
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
5
Wang Z, Wang S, Li Y, Guo J, Wei Y, Mu Y, Zheng L, Li W. A new paradigm for applying deep learning to protein-ligand interaction prediction. Brief Bioinform 2024; 25:bbae145. [PMID: 38581420 PMCID: PMC10998640 DOI: 10.1093/bib/bbae145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 03/18/2024] [Indexed: 04/08/2024] Open
Abstract
Protein-ligand interaction prediction presents a significant challenge in drug design. Numerous machine learning and deep learning (DL) models have been developed to accurately identify docking poses of ligands and active compounds against specific targets. However, current models often suffer from inadequate accuracy or lack practical physical significance in their scoring systems. In this research paper, we introduce IGModel, a novel approach that utilizes the geometric information of protein-ligand complexes as input for predicting the root mean square deviation of docking poses and the binding strength (pKd, the negative value of the logarithm of binding affinity) within the same prediction framework. This ensures that the output scores carry intuitive meaning. We extensively evaluate the performance of IGModel on various docking power test sets, including the CASF-2016 benchmark, PDBbind-CrossDocked-Core and DISCO set, consistently achieving state-of-the-art accuracies. Furthermore, we assess IGModel's generalizability and robustness by evaluating it on unbiased test sets and sets containing target structures generated by AlphaFold2. The exceptional performance of IGModel on these sets demonstrates its efficacy. Additionally, we visualize the latent space of protein-ligand interactions encoded by IGModel and conduct interpretability analysis, providing valuable insights. This study presents a novel framework for DL-based prediction of protein-ligand interactions, contributing to the advancement of this field. The IGModel is available at GitHub repository https://github.com/zchwang/IGModel.
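The two quantities this abstract says are predicted, the root mean square deviation (RMSD) of a docking pose and pKd, have simple definitions that a short sketch can make concrete. This is not IGModel code: the helper names are hypothetical, the RMSD assumes matched atom order in a shared receptor frame with no superposition or symmetry correction, and pKd is taken as the negative base-10 logarithm of the dissociation constant in mol/L.

```python
import math

def pose_rmsd(coords_a, coords_b):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty with matching atom counts")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

def pkd_from_kd(kd_molar):
    """pKd = -log10(Kd), with Kd expressed in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)

# Toy pose shifted by 2 angstroms along z relative to the reference pose.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]

print(pose_rmsd(reference, docked))
print(pkd_from_kd(1e-9))  # a 1 nM binder
```

A lower RMSD means the pose sits closer to the reference, and a higher pKd means tighter binding, which is why scores expressed in these units are easy to interpret.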
Affiliation(s)
- Zechen Wang
  - School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
- Sheng Wang
  - Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
- Yangyang Li
  - School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
- Jingjing Guo
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, Macao, China
- Yanjie Wei
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, Singapore
- Liangzhen Zheng
  - Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
- Weifeng Li
  - School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
6
Han S, Lee JE, Kang S, So M, Jin H, Lee JH, Baek S, Jun H, Kim TY, Lee YS. Standigm ASK™: knowledge graph and artificial intelligence platform applied to target discovery in idiopathic pulmonary fibrosis. Brief Bioinform 2024; 25:bbae035. [PMID: 38349059 PMCID: PMC10862655 DOI: 10.1093/bib/bbae035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 12/28/2023] [Indexed: 02/15/2024] Open
Abstract
Standigm ASK™ revolutionizes healthcare by addressing the critical challenge of identifying pivotal target genes in disease mechanisms, a fundamental aspect of drug development success. Standigm ASK™ integrates a unique combination of a heterogeneous knowledge graph (KG) database and an attention-based neural network model, providing interpretable subgraph evidence. Empowering users through an interactive interface, Standigm ASK™ facilitates the exploration of predicted results. Applying Standigm ASK™ to idiopathic pulmonary fibrosis (IPF), a complex lung disease, we focused on genes (AMFR, MDFIC and NR5A2) identified through KG evidence. In vitro experiments demonstrated their relevance, as TGFβ treatment induced gene expression changes associated with epithelial-mesenchymal transition characteristics. Gene knockdown reversed these changes, identifying AMFR, MDFIC and NR5A2 as potential therapeutic targets for IPF. In summary, Standigm ASK™ emerges as an innovative KG and artificial intelligence platform driving insights in drug target discovery, exemplified by the identification and validation of therapeutic targets for IPF.
Affiliation(s)
- Seokjin Han
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Ji Eun Lee
  - College of Pharmacy, Ewha Womans University, Ewhayeodae-gil, 03760, Seoul, Republic of Korea
- Seolhee Kang
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Minyoung So
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Hee Jin
  - College of Pharmacy, Ewha Womans University, Ewhayeodae-gil, 03760, Seoul, Republic of Korea
- Jang Ho Lee
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Sunghyeob Baek
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Hyungjin Jun
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Tae Yong Kim
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Yun-Sil Lee
  - College of Pharmacy, Ewha Womans University, Ewhayeodae-gil, 03760, Seoul, Republic of Korea
7
Lin X, Chen J, Ma W, Tang W, Wang Y. EEG emotion recognition using improved graph neural network with channel selection. Comput Methods Programs Biomed 2023; 231:107380. [PMID: 36745954 DOI: 10.1016/j.cmpb.2023.107380] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 11/10/2022] [Revised: 01/11/2023] [Accepted: 01/26/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND AND OBJECTIVE: Emotion classification tasks based on electroencephalography (EEG) are an essential part of artificial intelligence, with promising applications in healthcare areas such as autism research and emotion detection in pregnant women. However, complex data acquisition environments yield a variable number of EEG channels, which interferes with the model's ability to simulate the process of information transfer in the human brain. Therefore, this paper proposes an improved graph convolution model with dynamic channel selection.
METHODS: The proposed model combines the advantages of 1D convolution and graph convolution to capture intra-channel and inter-channel EEG features, respectively. We add functional connectivity to the graph structure, which further helps to simulate the relationships between brain regions. In addition, channel selection at an adjustable scale can be performed based on the attention distribution in the graph structure.
RESULTS: We conducted various experiments on the DEAP-Twente, DEAP-Geneva, and SEED datasets and achieved average accuracies of 90.74%, 91%, and 90.22%, respectively, which exceeded most existing models. Meanwhile, with only 20% of the EEG channels retained, the models achieved average accuracies of 82.78%, 84%, and 83.93% on the above three datasets, respectively.
CONCLUSIONS: The experimental results show that the proposed model can achieve effective emotion classification in complex dataset environments. Also, the proposed channel selection method is informative for reducing the cost of affective computing.
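Attention-based channel selection at an adjustable scale, as described in this abstract, can be sketched by keeping the highest-scoring fraction of channels. This is an illustration only: `select_channels`, the per-channel scores, and the rounding rule are assumptions, not the authors' procedure.

```python
def select_channels(scores, keep_fraction=0.2):
    """Keep the top fraction of channels ranked by attention score.

    scores: dict mapping channel name -> attention score (higher = more important).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(keep_fraction * len(scores)))   # always keep at least one
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    return ranked[:n_keep]

# Hypothetical attention scores for ten channels (10-20 system names).
scores = {
    "Fp1": 0.05, "Fp2": 0.07, "F3": 0.20, "F4": 0.18, "C3": 0.09,
    "C4": 0.08, "P3": 0.11, "P4": 0.10, "O1": 0.06, "O2": 0.06,
}

print(select_channels(scores))                     # prints ['F3', 'F4']
print(select_channels(scores, keep_fraction=0.4))  # prints ['F3', 'F4', 'P3', 'P4']
```

Changing `keep_fraction` trades accuracy against the number of electrodes needed, which is the cost reduction the abstract points to.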
Affiliation(s)
- Xuefen Lin
  - School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, PR China
- Jielin Chen
  - School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, PR China
- Weifeng Ma
  - School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, PR China
- Wei Tang
  - School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, PR China
- Yuchen Wang
  - School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, PR China
8
Li G, Buric F, Zrimec J, Viknander S, Nielsen J, Zelezniak A, Engqvist MKM. Learning deep representations of enzyme thermal adaptation. Protein Sci 2022; 31:e4480. [PMID: 36261883 PMCID: PMC9679980 DOI: 10.1002/pro.4480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/28/2022] [Revised: 09/02/2022] [Accepted: 10/15/2022] [Indexed: 12/14/2022]
Abstract
Temperature is a fundamental environmental factor that shapes the evolution of organisms. Learning thermal determinants of protein sequences in evolution thus has profound significance for basic biology, drug discovery, and protein engineering. Here, we use a data set of over 3 million BRENDA enzymes labeled with optimal growth temperatures (OGTs) of their source organisms to train a deep neural network model (DeepET). The protein-temperature representations learned by DeepET provide a temperature-related statistical summary of protein sequences and capture structural properties that affect thermal stability. For prediction of enzyme optimal catalytic temperatures and protein melting temperatures via a transfer learning approach, our DeepET model outperforms classical regression models trained on rationally designed features and other deep-learning-based representations. DeepET thus holds promise for understanding enzyme thermal adaptation and guiding the engineering of thermostable enzymes.
Affiliation(s)
- Gang Li
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Sandra Viknander
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Jens Nielsen
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - BioInnovation Institute, Copenhagen N, Denmark
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Life Sciences Centre, Institute of Biotechnology, Vilnius University, Vilnius, Lithuania
  - Randall Centre for Cell & Molecular Biophysics, King's College London, New Hunt's House, Guy's Campus, SE1 1UL, London, UK
- Martin K. M. Engqvist
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Enginzyme AB, Stockholm, Sweden
9
Ludwiczak J, Winski A, Dunin-Horkawicz S. Localpdb: a Python package to manage protein structures and their annotations. Bioinformatics 2022; 38:2633-2635. [PMID: 35199148 PMCID: PMC9048648 DOI: 10.1093/bioinformatics/btac121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2021] [Revised: 01/07/2022] [Accepted: 02/21/2022] [Indexed: 12/02/2022] Open
Abstract
Motivation: The wealth of protein structures collected in the Protein Data Bank has enabled large-scale studies of their function and evolution. Such studies, however, require the generation of customized datasets combining the structural data with miscellaneous accessory resources providing functional, taxonomic and other annotations. Unfortunately, the functionality of currently available tools for the creation of such datasets is limited, and their usage frequently requires laborious surveying of various data sources and resolving inconsistencies between their versions.
Results: To address this problem, we developed localpdb, a versatile Python library for the management of protein structures and their annotations. The library features a flexible plugin system enabling seamless unification of the structural data with diverse auxiliary resources, full version control, and powerful functionality for creating highly customized datasets. localpdb can be used in a wide range of bioinformatic tasks, in particular those involving large-scale protein structural analyses and machine learning.
Availability and implementation: localpdb is freely available at https://github.com/labstructbioinf/localpdb. Documentation along with usage examples can be accessed at https://labstructbioinf.github.io/localpdb/.
Affiliation(s)
- Jan Ludwiczak
  - Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, 02-097, Poland
- Aleksander Winski
  - Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, 02-097, Poland
- Stanislaw Dunin-Horkawicz
  - Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, 02-097, Poland