1
Huang Y, Lin Y, Lan W, Huang C, Zhong C. GloEC: a hierarchical-aware global model for predicting enzyme function. Brief Bioinform 2024; 25:bbae365. [PMID: 39073830 DOI: 10.1093/bib/bbae365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Revised: 06/18/2024] [Accepted: 07/12/2024] [Indexed: 07/30/2024] Open
Abstract
The annotation of enzyme function is a fundamental challenge in industrial biotechnology and pathology. Numerous computational methods have been proposed to predict enzyme function by annotating enzymes with Enzyme Commission (EC) numbers. However, existing methods have difficulty modelling the hierarchical structure of enzyme labels from a global view. Moreover, they do not fully leverage the mutual interactions between different levels of the enzyme label hierarchy. In this paper, we formulate the hierarchy of enzyme labels as a directed enzyme graph and propose a hierarchy-GCN (Graph Convolutional Network) encoder to globally model enzyme label dependency on the enzyme graph. Based on the enzyme hierarchy encoder, we develop an end-to-end hierarchical-aware global model named GloEC to predict enzyme function. GloEC learns hierarchical-aware enzyme label embeddings via the hierarchy-GCN encoder and conducts deductive fusion of label-aware enzyme features to predict enzyme labels. Meanwhile, our hierarchy-GCN encoder computes bidirectionally, capturing enzyme label correlation information in both bottom-up and top-down manners, which has not been explored in enzyme function prediction. Comparative experiments on three benchmark datasets show that GloEC achieves better predictive performance than existing methods. The case studies also demonstrate that GloEC can effectively predict the function of isoenzymes. GloEC is available at: https://github.com/hyr0771/GloEC.
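As a rough illustration of what "formulating the hierarchy of enzyme labels as a directed graph" can look like, the sketch below builds parent-to-child edges from dotted EC numbers. It is not taken from the GloEC code base; the function names `build_ec_graph` and `ancestors` and the synthetic `"root"` node are assumptions made for this example only.

```python
from collections import defaultdict


def build_ec_graph(ec_numbers):
    """Return top-down edges (parent -> sorted children) of the EC label tree.

    A synthetic "root" node (an assumption of this sketch) sits above the
    first-level classes such as "1" or "3".
    """
    children = defaultdict(set)
    for ec in ec_numbers:
        parts = ec.split(".")
        parent = "root"
        for depth in range(1, len(parts) + 1):
            node = ".".join(parts[:depth])  # e.g. "3", "3.4", "3.4.21", ...
            children[parent].add(node)
            parent = node
    return {parent: sorted(kids) for parent, kids in children.items()}


def ancestors(ec):
    """List the coarser labels implied by an EC number, from level 1 upward."""
    parts = ec.split(".")
    return [".".join(parts[:depth]) for depth in range(1, len(parts))]


graph = build_ec_graph(["1.1.1.1", "1.1.1.2", "3.4.21.4"])
print(graph["1.1.1"])         # children of the third-level label 1.1.1
print(ancestors("3.4.21.4"))  # labels that must also hold for 3.4.21.4
```

Reversing each edge gives the bottom-up view; a bidirectional encoder of the kind the abstract describes would pass information along both directions.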
Affiliation(s)
- Yiran Huang
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing in Guangxi Universities and Colleges, Guangxi University, Nanning 530004, China
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China
| | - Yufu Lin
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
| | - Wei Lan
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing in Guangxi Universities and Colleges, Guangxi University, Nanning 530004, China
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China
| | - Cuiyu Huang
- College of Chemistry, Tianjin Key Laboratory of Biosensing and Molecular Recognition, Nankai University, Tianjin 300071, China
| | - Cheng Zhong
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing in Guangxi Universities and Colleges, Guangxi University, Nanning 530004, China
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China
2
Qian W, Wang X, Kang Y, Pan P, Hou T, Hsieh CY. A general model for predicting enzyme functions based on enzymatic reactions. J Cheminform 2024; 16:38. [PMID: 38556873 PMCID: PMC10983695 DOI: 10.1186/s13321-024-00827-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 03/16/2024] [Indexed: 04/02/2024] Open
Abstract
Accurate prediction of the Enzyme Commission (EC) numbers for chemical reactions is essential for the understanding and manipulation of enzyme functions, biocatalytic processes and biosynthetic planning. A number of machine learning (ML)-based models have been developed to classify enzymatic reactions, showing great advantages over costly and lengthy experimental verification. However, the prediction accuracy of most available models, which are trained on records of chemical reactions without specifying the enzymatic catalysts, is rather limited. In this study, we introduced BEC-Pred, a BERT-based multiclassification model, for predicting EC numbers associated with reactions. Leveraging transfer learning, our approach achieves precise forecasting across a wide variety of EC numbers solely through analysis of the SMILES sequences of substrates and products. The BEC-Pred model outperformed other sequence and graph-based ML methods, attaining a higher accuracy of 91.6%, surpassing them by 5.5%, and exhibiting superior F1 scores with improvements of 6.6% and 6.0%, respectively. The enhanced performance highlights the potential of BEC-Pred to serve as a reliable foundational tool to accelerate cutting-edge research in synthetic biology and drug metabolism. Moreover, we discussed a few examples of how BEC-Pred could accurately predict the enzymatic classification for the Novozym 435-induced hydrolysis and lipase efficient catalytic synthesis. We anticipate that BEC-Pred will have a positive impact on the progression of enzymatic research.
Affiliation(s)
- Wenjia Qian
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Xiaorui Wang
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Peichen Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
3
Tang W, Deng Z, Zhou H, Zhang W, Hu F, Choi KS, Wang S. MVDINET: A Novel Multi-Level Enzyme Function Predictor With Multi-View Deep Interactive Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:84-94. [PMID: 38015669 DOI: 10.1109/tcbb.2023.3337158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
As an extremely significant class of biocatalysts, enzymes play an important role in biological reproduction and metabolism. Therefore, the prediction of enzyme function is of great significance in biomedical fields. Recently, computational methods for predicting enzyme function have been proposed, and they effectively reduce the cost of enzyme function prediction. However, existing methods still fall short in mining the discriminant information needed for enzyme function recognition. In this study, we present MVDINET, a novel method for multi-level enzyme function prediction. First, the initial multi-view feature data is extracted from the enzyme sequence. Then, these initial views are fed into various deep specific network modules to learn the depth-specificity information. Further, a deep view interaction network is designed to extract the interaction information. Finally, the specificity information and interaction information are fed into a multi-view adaptively weighted classification. We comprehensively evaluate MVDINET on benchmark datasets and demonstrate that MVDINET is superior to existing methods.
4
Han SR, Park M, Kosaraju S, Lee J, Lee H, Lee JH, Oh TJ, Kang M. Evidential deep learning for trustworthy prediction of enzyme commission number. Brief Bioinform 2023; 25:bbad401. [PMID: 37991247 PMCID: PMC10664415 DOI: 10.1093/bib/bbad401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/25/2023] [Accepted: 10/19/2023] [Indexed: 11/23/2023] Open
Abstract
The rapid growth of uncharacterized enzymes and their functional diversity urge accurate and trustworthy computational functional annotation tools. However, current state-of-the-art models lack trustworthiness on the prediction of the multilabel classification problem with thousands of classes. Here, we demonstrate that a novel evidential deep learning model (named ECPICK) makes trustworthy predictions of enzyme commission (EC) numbers with data-driven domain-relevant evidence, which results in significantly enhanced predictive power and the capability to discover potential new motif sites. ECPICK learns complex sequential patterns of amino acids and their hierarchical structures from 20 million enzyme data. ECPICK identifies significant amino acids that contribute to the prediction without multiple sequence alignment. Our intensive assessment showed not only outstanding enhancement of predictive performance on the largest databases of Uniprot, Protein Data Bank (PDB) and Kyoto Encyclopedia of Genes and Genomes (KEGG), but also a capability to discover new motif sites in microorganisms. ECPICK is a reliable EC number prediction tool to identify protein functions of an increasing number of uncharacterized enzymes.
Affiliation(s)
- So-Ra Han
- Department of Life Science and Biochemical Engineering, Sun Moon University, Asan, Republic of Korea
- Bio Big Data-based Chungnam Smart Clean Research Leader Training Program, SunMoon University, Asan, Republic of Korea
| | - Mingyu Park
- Bio Big Data-based Chungnam Smart Clean Research Leader Training Program, SunMoon University, Asan, Republic of Korea
- Division of Computer Science and Engineering, Sun Moon University, Asan, Republic of Korea
| | - Sai Kosaraju
- Department of Computer Science, University of Nevada, Las Vegas, NV, USA
| | - JeungMin Lee
- Bio Big Data-based Chungnam Smart Clean Research Leader Training Program, SunMoon University, Asan, Republic of Korea
- Division of Computer Science and Engineering, Sun Moon University, Asan, Republic of Korea
| | - Hyun Lee
- Bio Big Data-based Chungnam Smart Clean Research Leader Training Program, SunMoon University, Asan, Republic of Korea
- Division of Computer Science and Engineering, Sun Moon University, Asan, Republic of Korea
- Genome-based BioIT Convergence Institute, Asan, Republic of Korea
| | - Jun Hyuck Lee
- Research Unit of Cryogenic Novel Material, Korea Polar Research Institute, Incheon, Republic of Korea
| | - Tae-Jin Oh
- Department of Life Science and Biochemical Engineering, Sun Moon University, Asan, Republic of Korea
- Bio Big Data-based Chungnam Smart Clean Research Leader Training Program, SunMoon University, Asan, Republic of Korea
- Genome-based BioIT Convergence Institute, Asan, Republic of Korea
- Department of Pharmaceutical Engineering and Biotechnology, Sun Moon University, Asan, Republic of Korea
| | - Mingon Kang
- Department of Computer Science, University of Nevada, Las Vegas, NV, USA
5
Yu Y, Rué Casamajo A, Finnigan W, Schnepel C, Barker R, Morrill C, Heath RS, De Maria L, Turner NJ, Scrutton NS. Structure-Based Design of Small Imine Reductase Panels for Target Substrates. ACS Catal 2023; 13:12310-12321. [PMID: 37736118 PMCID: PMC10510103 DOI: 10.1021/acscatal.3c02278] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/19/2023] [Revised: 08/20/2023] [Indexed: 09/23/2023]
Abstract
Biocatalysis is important in the discovery, development, and manufacture of pharmaceuticals. However, the identification of enzymes for target transformations of interest requires major screening efforts. Here, we report a structure-based computational workflow to prioritize protein sequences by a score based on predicted activities on substrates, thereby reducing a resource-intensive laboratory-based biocatalyst screening. We selected imine reductases (IREDs) as a class of biocatalysts to illustrate the application of the computational workflow termed IREDFisher. Validation by using published data showed that IREDFisher can retrieve the best enzymes and increase the hit rate by identifying the top 20 ranked sequences. The power of IREDFisher is confirmed by computationally screening 1400 sequences for chosen reductive amination reactions with different levels of complexity. Highly active IREDs were identified by only testing 20 samples in vitro. Our speed test shows that it only takes 90 min to rank 85 sequences from user input and 30 min for the established IREDFisher database containing 591 IRED sequences. IREDFisher is available as a user-friendly web interface (https://enzymeevolver.com/IREDFisher). IREDFisher enables the rapid discovery of IREDs for applications in synthesis and directed evolution studies, with minimal time and resource expenditure. Future use of the workflow with other enzyme families could be implemented following the modification of the workflow scoring function.
Affiliation(s)
- Yuqi Yu
- Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Augmented Biologics Discovery & Design, Department of Biologics Engineering, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB21 6GH, U.K.
| | - Arnau Rué Casamajo
- Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
| | - William Finnigan
- Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Christian Schnepel
- Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Rhys Barker
- Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Charlotte Morrill
- Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Rachel S. Heath
- Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Leonardo De Maria
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (RI), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 43150, Sweden
| | - Nicholas J. Turner
- Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Nigel S. Scrutton
- Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
6
Rappoport D, Jinich A. Enzyme Substrate Prediction from Three-Dimensional Feature Representations Using Space-Filling Curves. J Chem Inf Model 2023; 63:1637-1648. [PMID: 36802628 DOI: 10.1021/acs.jcim.3c00005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Abstract
Compact and interpretable structural feature representations are required for accurately predicting properties and function of proteins. In this work, we construct and evaluate three-dimensional feature representations of protein structures based on space-filling curves (SFCs). We focus on the problem of enzyme substrate prediction, using two ubiquitous enzyme families as case studies: the short-chain dehydrogenase/reductases (SDRs) and the S-adenosylmethionine-dependent methyltransferases (SAM-MTases). Space-filling curves such as the Hilbert curve and the Morton curve generate a reversible mapping from discretized three-dimensional to one-dimensional representations and thus help to encode three-dimensional molecular structures in a system-independent way and with only a few adjustable parameters. Using three-dimensional structures of SDRs and SAM-MTases generated using AlphaFold2, we assess the performance of the SFC-based feature representations in predictions on a new benchmark database of enzyme classification tasks including their cofactor and substrate selectivity. Gradient-boosted tree classifiers yield binary prediction accuracy of 0.77-0.91 and area under curve (AUC) characteristics of 0.83-0.92 for the classification tasks. We investigate the effects of amino acid encoding, spatial orientation, and (the few) parameters of SFC-based encodings on the accuracy of the predictions. Our results suggest that geometry-based approaches such as SFCs are promising for generating protein structural representations and are complementary to the existing protein feature representations such as evolutionary scale modeling (ESM) sequence embeddings.
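The reversible three-dimensional to one-dimensional mapping mentioned above can be illustrated with the Morton (Z-order) curve, which interleaves the bits of the three grid coordinates. This is a minimal sketch of the general technique, not the authors' feature pipeline; the function names and the `bits` parameter (grid resolution per axis) are assumptions for the example.

```python
def morton_encode(x, y, z, bits=10):
    """Interleave the lowest `bits` bits of x, y, z into one Morton index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)      # x takes bit positions 0, 3, 6, ...
        code |= ((y >> i) & 1) << (3 * i + 1)  # y takes bit positions 1, 4, 7, ...
        code |= ((z >> i) & 1) << (3 * i + 2)  # z takes bit positions 2, 5, 8, ...
    return code


def morton_decode(code, bits=10):
    """Invert morton_encode, recovering the (x, y, z) grid coordinates."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z


index = morton_encode(3, 5, 1)
print(index, morton_decode(index))
```

In practice the atomic coordinates would first be discretized onto an integer grid; nearby grid cells tend to receive nearby indices, which is what makes the one-dimensional sequence usable as a structural feature.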
Affiliation(s)
- Dmitrij Rappoport
- Department of Chemistry, University of California, Irvine, 1102 Natural Sciences 2, Irvine, California 92697, United States
| | - Adrian Jinich
- Weill Cornell Medicine, 1300 York Avenue, Box 65, New York, New York 10065, United States
7
Zhu YH, Zhang C, Yu DJ, Zhang Y. Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction. PLoS Comput Biol 2022; 18:e1010793. [PMID: 36548439 PMCID: PMC9822105 DOI: 10.1371/journal.pcbi.1010793] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/07/2022] [Revised: 01/06/2023] [Accepted: 12/05/2022] [Indexed: 12/24/2022] Open
Abstract
Accurate identification of protein function is critical to elucidate life mechanisms and design new drugs. We proposed a novel deep-learning method, ATGO, to predict Gene Ontology (GO) attributes of proteins through a triplet neural-network architecture embedded with pre-trained language models from protein sequences. The method was systematically tested on 1068 non-redundant benchmarking proteins and 3328 targets from the third Critical Assessment of Protein Function Annotation (CAFA) challenge. Experimental results showed that ATGO achieved a significant increase of the GO prediction accuracy compared to the state-of-the-art approaches in all aspects of molecular function, biological process, and cellular component. Detailed data analyses showed that the major advantage of ATGO lies in the utilization of pre-trained transformer language models which can extract discriminative functional pattern from the feature embeddings. Meanwhile, the proposed triplet network helps enhance the association of functional similarity with feature similarity in the sequence embedding space. In addition, it was found that the combination of the network scores with the complementary homology-based inferences could further improve the accuracy of the predicted models. These results demonstrated a new avenue for high-accuracy deep-learning function prediction that is applicable to large-scale protein function annotations from sequence alone.
Affiliation(s)
- Yi-Heng Zhu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, People’s Republic of China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, People’s Republic of China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
8
Ferdous S, Shihab IF, Reuel NF. Effects of Sequence Features on Machine-Learned Enzyme Classification Fidelity. Biochem Eng J 2022; 187:108612. [PMID: 37215687 PMCID: PMC10194028 DOI: 10.1016/j.bej.2022.108612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Assigning enzyme commission (EC) numbers using sequence information alone has been the subject of recent classification algorithms that use statistics, homology and machine-learning based methods. This work benchmarks the performance of a few of these algorithms as a function of sequence features such as chain length and amino acid composition (AAC). This enables determination of optimal classification windows for de novo sequence generation and enzyme design. In this work we developed a parallelization workflow which efficiently processes >500,000 annotated sequences through each candidate algorithm and a visualization workflow to observe the performance of the classifier over changing enzyme length, main EC class and AAC. We applied these workflows to the entire SwissProt database to date (n = 565245) using two locally installable classifiers, ECpred and DeepEC, and collecting results from two other webserver-based tools, Deepre and BENZ-ws. All the classifiers exhibit peak performance in the range of 300 to 500 amino acids in length. In terms of main EC class, classifiers were most accurate at predicting translocases (EC-6) and were least accurate in determining hydrolases (EC-3) and oxidoreductases (EC-1). We also identified AAC ranges that are most common in the annotated enzymes and found that all classifiers work best in this common range. Among the four classifiers, ECpred showed the best consistency in changing feature space. These workflows can be used to benchmark new algorithms as they are developed and find optimum design spaces for the generation of new, synthetic enzymes.
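Amino acid composition (AAC), one of the sequence features benchmarked above, is simply the fraction of each of the 20 standard residues in a chain. The sketch below shows one common way to compute it; the function name and the choice to ignore non-standard letters are assumptions of this example, not details from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence.

    Letters outside the 20 standard codes (e.g. X, B, Z) are skipped, which
    is an assumption of this sketch.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


aac = amino_acid_composition("MKKLA")
print(len("MKKLA"), aac["K"])  # chain length and lysine fraction
```

Binning sequences by chain length and by individual AAC fractions, then scoring a classifier per bin, is one straightforward way to reproduce the kind of performance-versus-feature view the abstract describes.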
Affiliation(s)
- Sakib Ferdous
- Department of Chemical and Biological Engineering, Iowa State University
| | - Nigel F. Reuel
- Department of Chemical and Biological Engineering, Iowa State University
9
Nourani E, Asgari E, McHardy AC, Mofrad MRK. TripletProt: Deep Representation Learning of Proteins Based On Siamese Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3744-3753. [PMID: 34460382 DOI: 10.1109/tcbb.2021.3108718] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
Pretrained representations have recently gained attention in various machine learning applications. Nonetheless, the high computational costs associated with training these models have motivated alternative approaches for representation learning. Herein we introduce TripletProt, a new approach for protein representation learning based on Siamese neural networks. Representation learning of biological entities that captures essential features can alleviate many of the challenges associated with supervised learning in bioinformatics. The most important distinction of our proposed method is its reliance on the protein-protein interaction (PPI) network. The computational cost of using the generated representations in any potential application is significantly lower than for comparable methods, since the representations are significantly shorter than those in other approaches. TripletProt offers great potential for protein informatics tasks and can be widely applied to similar tasks. We evaluate TripletProt comprehensively in protein functional annotation tasks including sub-cellular localization (14 categories) and gene ontology prediction (more than 2000 classes), which are both challenging multi-class, multi-label classification machine learning problems. We compare the performance of TripletProt with state-of-the-art approaches including a recurrent language model-based approach (i.e., UniRep), as well as a PPI network and sequence-based method (i.e., DeepGO). TripletProt showed an overall improvement in F1 score in the above-mentioned functional annotation tasks, relying solely on the PPI network. Availability: The source code and datasets are available at https://github.com/EsmaeilNourani/TripletProt.
10
Vasina M, Velecký J, Planas-Iglesias J, Marques SM, Skarupova J, Damborsky J, Bednar D, Mazurenko S, Prokop Z. Tools for computational design and high-throughput screening of therapeutic enzymes. Adv Drug Deliv Rev 2022; 183:114143. [PMID: 35167900 DOI: 10.1016/j.addr.2022.114143] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/26/2021] [Revised: 02/04/2022] [Accepted: 02/09/2022] [Indexed: 12/16/2022]
Abstract
Therapeutic enzymes are valuable biopharmaceuticals in various biomedical applications. They have been successfully applied for fibrinolysis, cancer treatment, enzyme replacement therapies, and the treatment of rare diseases. Still, there is a permanent demand to find new or better therapeutic enzymes that are sufficiently soluble, stable, and active to meet specific medical needs. Here, we highlight the benefits of coupling computational approaches with high-throughput experimental technologies, which significantly accelerate the identification and engineering of catalytic therapeutic agents. New enzymes can be identified in genomic and metagenomic databases, which grow exponentially thanks to next-generation sequencing technologies. Computational design and machine learning methods are being developed to improve catalytically potent enzymes and predict their properties to guide the selection of target enzymes. High-throughput experimental pipelines, increasingly relying on microfluidics, ensure functional screening and biochemical characterization of target enzymes to reach efficient therapeutic enzymes.
Affiliation(s)
- Michal Vasina
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
| | - Jan Velecký
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic
| | - Joan Planas-Iglesias
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
| | - Sergio M Marques
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
| | - Jana Skarupova
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic; Enantis, INBIT, Kamenice 34, Brno, Czech Republic
| | - David Bednar
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic.
| | - Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic.
| | - Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic.
11
Kamran AB, Naveed H. GOntoSim: a semantic similarity measure based on LCA and common descendants. Sci Rep 2022; 12:3818. [PMID: 35264663 PMCID: PMC8907294 DOI: 10.1038/s41598-022-07624-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/05/2021] [Accepted: 02/14/2022] [Indexed: 11/20/2022] Open
Abstract
The Gene Ontology (GO) is a controlled vocabulary that captures the semantics or context of an entity based on its functional role. Biomedical entities are frequently compared to each other to find similarities to help in data annotation and knowledge transfer. In this study, we propose GOntoSim, a novel method to determine the functional similarity between genes. GOntoSim quantifies the similarity between pairs of GO terms by taking the graph structure and the information content of nodes into consideration. Our measure quantifies the similarity between the ancestors of the GO terms accurately. It also takes into account the common children of the GO terms. GOntoSim is evaluated using the entire Enzyme Dataset containing 10,890 proteins and 97,544 GO annotations. The enzymes are clustered and compared with the Gold Standard EC numbers. At level 1 of the EC Numbers for Molecular Function, GOntoSim achieves a purity score of 0.75, compared with 0.47 and 0.51 for GOGO and Wang. GOntoSim can handle the noisy IEA annotations. We achieve a purity score of 0.94 in contrast to 0.48 for both GOGO and Wang at level 1 of the EC Numbers with IEA annotations. GOntoSim can be freely accessed at http://www.cbrlab.org/GOntoSim.html.
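The purity scores quoted above compare a clustering against reference labels. Under the standard definition, each cluster is credited with its most frequent reference label and the credited counts are divided by the total number of items. The sketch below implements that textbook definition; whether the paper uses exactly this variant is an assumption, and the toy cluster assignments are invented for illustration.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Clustering purity: fraction of items carrying their cluster's majority label."""
    if not cluster_ids or len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)
    # Size of the largest label group inside each cluster, summed over clusters.
    majority_total = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return majority_total / len(true_labels)


# Toy data: six enzymes in two clusters, labelled by first-level EC class.
score = purity([0, 0, 0, 1, 1, 1], ["1", "1", "3", "3", "3", "2"])
print(round(score, 3))
```

A purity of 1.0 means every cluster contains a single EC class; note that purity trivially increases with the number of clusters, so it is usually reported at a fixed cluster count.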
Affiliation(s)
- Amna Binte Kamran
- Computational Biology Research Lab, Department of Computer Science, National University of Computer & Emerging Sciences (NUCES-FAST), Islamabad, 44800, Pakistan
- Hammad Naveed
- Computational Biology Research Lab, Department of Computer Science, National University of Computer & Emerging Sciences (NUCES-FAST), Islamabad, 44800, Pakistan.
12
Khan KA, Memon SA, Naveed H. A hierarchical deep learning based approach for multi-functional enzyme classification. Protein Sci 2021; 30:1935-1945. [PMID: 34118089 DOI: 10.1002/pro.4146] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/08/2021] [Revised: 06/10/2021] [Accepted: 06/11/2021] [Indexed: 11/09/2022]
Abstract
Enzymes are critical proteins in every organism. They speed up essential chemical reactions, help fight diseases, and are widely used in the pharmaceutical and manufacturing industries. Wet lab experiments to determine an enzyme's function are time-consuming and expensive. Therefore, computational approaches to this problem are becoming necessary. Usually, an enzyme is extremely specific in performing its function. However, there exist enzymes that can perform multiple functions. A multi-functional enzyme has vast potential as it reduces the need to discover or use different enzymes for different functions. We propose an approach to predict a multi-functional enzyme's function up to the most specific fourth level of the hierarchy of the Enzyme Commission (EC) number. Previous studies can only predict the function of the enzyme up to level 1. Using a dataset of 2,583 multi-functional enzymes, we achieved a hierarchical subset accuracy of 71.4% and a Macro F1 Score of 96.1% at the fourth level. The robustness of the network was further tested on a multi-functional isoforms dataset. Our method is broadly applicable and may be used to discover better enzymes. The web-server can be freely accessed at http://hecnet.cbrlab.org/.
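To make the level-wise evaluation in this abstract concrete, the sketch below truncates EC numbers to a chosen hierarchy level and computes an exact-match (subset) accuracy over multi-label predictions. The abstract does not define "hierarchical subset accuracy", so the definition used here (an enzyme counts as correct at level k only if its full set of predicted EC prefixes equals the true set at that level) is an assumption, not the authors' metric. The helper names `truncate_ec` and `subset_accuracy_at_level` and the toy label sets are hypothetical.

```python
def truncate_ec(ec, level):
    """Keep the first `level` fields of a four-field EC number,
    e.g. '2.7.1.1' at level 2 becomes '2.7'."""
    parts = ec.split(".")
    if not 1 <= level <= 4 or len(parts) < level:
        raise ValueError(f"cannot truncate {ec!r} to level {level}")
    return ".".join(parts[:level])


def subset_accuracy_at_level(true_sets, pred_sets, level):
    """Fraction of enzymes whose predicted label set matches the true
    label set exactly after truncating every EC number to `level`.
    (Assumed definition; see the lead-in paragraph.)"""
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need two non-empty sequences of equal length")
    hits = 0
    for true_ecs, pred_ecs in zip(true_sets, pred_sets):
        true_k = {truncate_ec(e, level) for e in true_ecs}
        pred_k = {truncate_ec(e, level) for e in pred_ecs}
        hits += true_k == pred_k
    return hits / len(true_sets)


# Toy example (hypothetical labels for three multi-functional enzymes).
y_true = [{"1.1.1.1", "2.7.1.1"}, {"3.4.21.4"}, {"1.1.1.1", "4.2.1.1"}]
y_pred = [{"1.1.1.1", "2.7.1.2"}, {"3.4.21.4"}, {"1.1.1.1"}]

acc_level3 = subset_accuracy_at_level(y_true, y_pred, 3)
acc_level4 = subset_accuracy_at_level(y_true, y_pred, 4)
print(acc_level3, acc_level4)
```

In this toy data the first enzyme is correct down to level 3 but wrong at level 4, which shows why exact-match accuracy can only stay equal or drop as the level becomes more specific.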
Affiliation(s)
- Kinaan Aamir Khan
- Computational Biology Research Lab, National University of Computer and Emerging Sciences, Islamabad, Pakistan
- Safyan Aman Memon
- Computational Biology Research Lab, National University of Computer and Emerging Sciences, Islamabad, Pakistan
- Hammad Naveed
- Computational Biology Research Lab, National University of Computer and Emerging Sciences, Islamabad, Pakistan
13
Marques SM, Planas-Iglesias J, Damborsky J. Web-based tools for computational enzyme design. Curr Opin Struct Biol 2021; 69:19-34. [PMID: 33667757 DOI: 10.1016/j.sbi.2021.01.010] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 12/01/2020] [Revised: 01/14/2021] [Accepted: 01/27/2021] [Indexed: 12/30/2022]
Abstract
Enzymes are in high demand for very diverse biotechnological applications. However, natural biocatalysts often need to be engineered to fine-tune their properties, such as activity, selectivity, stability to temperature or co-solvents, and solubility, for the end applications. Computational methods are increasingly used in this task, providing predictions that narrow down the space of possible mutations significantly and can enormously reduce the experimental burden. Many computational tools are available as web-based platforms, making them accessible to non-expert users. These platforms are typically user-friendly, contain walk-throughs, and require neither deep expertise nor installation. Here we describe some of the most recent outstanding web-tools for enzyme engineering and formulate future perspectives in this field.
Affiliation(s)
- Sérgio M Marques
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Joan Planas-Iglesias
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic.