1
Qian W, Wang X, Kang Y, Pan P, Hou T, Hsieh CY. A general model for predicting enzyme functions based on enzymatic reactions. J Cheminform 2024; 16:38. [PMID: 38556873 PMCID: PMC10983695 DOI: 10.1186/s13321-024-00827-y] [Received: 11/13/2023] [Accepted: 03/16/2024] [Indexed: 04/02/2024]
Abstract
Accurate prediction of the Enzyme Commission (EC) numbers for chemical reactions is essential for the understanding and manipulation of enzyme functions, biocatalytic processes and biosynthetic planning. A number of machine learning (ML)-based models have been developed to classify enzymatic reactions, showing great advantages over costly and long-winded experimental verifications. However, the prediction accuracy for most available models trained on the records of chemical reactions without specifying the enzymatic catalysts is rather limited. In this study, we introduced BEC-Pred, a BERT-based multiclassification model, for predicting EC numbers associated with reactions. Leveraging transfer learning, our approach achieves precise forecasting across a wide variety of EC numbers solely through analysis of the SMILES sequences of substrates and products. The BEC-Pred model outperformed other sequence and graph-based ML methods, attaining a higher accuracy of 91.6%, surpassing them by 5.5%, and exhibiting superior F1 scores with improvements of 6.6% and 6.0%, respectively. The enhanced performance highlights the potential of BEC-Pred to serve as a reliable foundational tool to accelerate cutting-edge research in synthetic biology and drug metabolism. Moreover, we discussed a few examples of how BEC-Pred could accurately predict the enzymatic classification for Novozym 435-induced hydrolysis and lipase-catalyzed efficient synthesis. We anticipate that BEC-Pred will have a positive impact on the progression of enzymatic research.
Affiliation(s)
- Wenjia Qian
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Xiaorui Wang
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Peichen Pan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
2
Tavakoli M, Miller RJ, Angel MC, Pfeiffer MA, Gutman ES, Mood AD, Van Vranken D, Baldi P. PMechDB: A Public Database of Elementary Polar Reaction Steps. J Chem Inf Model 2024; 64:1975-1983. [PMID: 38483315 DOI: 10.1021/acs.jcim.3c01810] [Indexed: 03/26/2024]
Abstract
Most online chemical reaction databases are not publicly accessible or fully downloadable. These databases tend to contain reactions in noncanonicalized formats and often lack comprehensive information regarding reaction pathways, intermediates, and byproducts. Within the few publicly available databases, reactions are typically stored in the form of unbalanced, overall transformations with minimal interpretability of the underlying chemistry. These limitations present significant obstacles to data-driven applications, including the development of machine learning models. As an effort to overcome these challenges, we introduce PMechDB, a publicly accessible platform designed to curate, aggregate, and share polar chemical reaction data in the form of elementary reaction steps. Our initial version of PMechDB consists of over 100,000 such steps. In PMechDB, all reactions are stored as canonicalized and balanced elementary steps, featuring accurate atom mapping and arrow-pushing mechanisms. As an online interactive database, PMechDB provides multiple interfaces that enable users to search, download, and upload chemical reactions. We anticipate that the public availability of PMechDB and its standardized data representation will prove beneficial for chemoinformatics research and education and the development of data-driven, interpretable models for predicting reactions and pathways. The PMechDB platform is accessible online at https://deeprxn.ics.uci.edu/pmechdb.
Affiliation(s)
- Mohammadamin Tavakoli
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
- Ryan J Miller
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
- Mirana Claire Angel
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
- Michael A Pfeiffer
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Eugene S Gutman
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Aaron D Mood
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- David Van Vranken
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Pierre Baldi
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
3
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, because big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
4
He Y, Liu G, Hu S, Wang X, Jia J, Zhou H, Yan X. Implementing comprehensive machine learning models of multispecies toxicity assessment to improve regulation of organic compounds. J Hazard Mater 2023; 458:131942. [PMID: 37390684 DOI: 10.1016/j.jhazmat.2023.131942] [Received: 04/15/2023] [Revised: 06/12/2023] [Accepted: 06/24/2023] [Indexed: 07/02/2023]
Abstract
Machine learning has made significant progress in assessing the risk associated with hazardous chemicals. However, most models were constructed by randomly selecting one algorithm and one toxicity endpoint for a single species, which may cause biased regulation of chemicals. In the present study, we implemented comprehensive prediction models involving multiple advanced machine learning and end-to-end deep learning methods to assess the aquatic toxicity of chemicals. The generated optimal models accurately unravel the quantitative structure-toxicity relationships, with correlation coefficients from 0.59 to 0.81 for the training sets and from 0.56 to 0.83 for the test sets. For each chemical, its ecological risk was determined from the toxicity information for multiple species. The results also revealed that the toxicity mechanism of chemicals was species-sensitive and that higher-level organisms faced more serious side effects from hazardous substances. The proposed approach was finally applied to screen over 16,000 compounds and identify high-risk chemicals. We believe that the current approach can provide a useful tool for predicting the toxicity of diverse organic chemicals and help regulatory authorities make more reasonable decisions.
Affiliation(s)
- Ying He
  - Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Guohong Liu
  - Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China; School of Agriculture and Biological Sciences, Qiannan Normal University for Nationalities, Duyun 558000, China
- Song Hu
  - School of Environmental Science and Engineering, Shandong University, Qingdao 266237, China
- Xiaohong Wang
  - Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Jianbo Jia
  - Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Hongyu Zhou
  - Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Xiliang Yan
  - Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China; School of Agriculture and Biological Sciences, Qiannan Normal University for Nationalities, Duyun 558000, China
5
Abstract
Cyclopropanes that carry an electron-accepting group react as electrophiles in polar, ring-opening reactions. Analogous reactions at cyclopropanes with additional C2 substituents allow one to access difunctionalized products. Consequently, functionalized cyclopropanes are frequently used building blocks in organic synthesis. The polarization of the C1-C2 bond in 1-acceptor-2-donor-substituted cyclopropanes not only favorably enhances reactivity toward nucleophiles but also directs the nucleophilic attack toward the already substituted C2 position. Monitoring the kinetics of non-catalytic ring-opening reactions with a series of thiophenolates and other strong nucleophiles, such as azide ions, in DMSO provided the inherent SN2 reactivity of electrophilic cyclopropanes. The experimentally determined second-order rate constants k2 for cyclopropane ring-opening reactions were then compared to those of related Michael additions. Interestingly, cyclopropanes with aryl substituents at the C2 position reacted faster than their unsubstituted analogues. Variation of the electronic properties of the aryl groups at C2 gave rise to parabolic Hammett relationships.
Affiliation(s)
- Andreas Eitzinger
  - Department Chemie, Ludwig-Maximilians-Universität München, Butenandtstr. 5–13, 81377 München, Germany
- Armin R. Ofial
  - Department Chemie, Ludwig-Maximilians-Universität München, Butenandtstr. 5–13, 81377 München, Germany
6
Li L, Mayer RJ, Ofial AR, Mayr H. One-Bond-Nucleophilicity and -Electrophilicity Parameters: An Efficient Ordering System for 1,3-Dipolar Cycloadditions. J Am Chem Soc 2023; 145:7416-7434. [PMID: 36952671 DOI: 10.1021/jacs.2c13872] [Indexed: 03/25/2023]
Abstract
Diazoalkanes are ambiphilic 1,3-dipoles that undergo fast Huisgen cycloadditions with both electron-rich and electron-poor dipolarophiles but react slowly with alkenes of low polarity. Frontier molecular orbital (FMO) theory considering the 3-center-4-electron π-system of the propargyl fragment of diazoalkanes is commonly applied to rationalize these reactivity trends. However, we recently found that a change in the mechanism from cycloadditions to azo couplings takes place due to the existence of a previously overlooked lower-lying unoccupied molecular orbital. We now propose an alternative approach to analyze 1,3-dipolar cycloaddition reactions, which relies on the linear free energy relationship lg k2(20 °C) = sN(N + E) (eq 1) with two solvent-dependent parameters (N, sN) to characterize nucleophiles and one parameter (E) for electrophiles. Rate constants for the cycloadditions of diazoalkanes with dipolarophiles were measured and compared with those calculated for the formation of zwitterions by eq 1. The difference between experimental and predicted Gibbs energies of activation is interpreted as the energy of concert, i.e., the stabilization of the transition states by the concerted formation of two new bonds. By linking the plot of lg k2 vs N for nucleophilic dipolarophiles with that of lg k2 vs E for electrophilic dipolarophiles, one obtains V-shaped plots which provide absolute rate constants for the stepwise reactions on the borderlines. These plots furthermore predict relative reactivities of dipolarophiles in concerted, highly asynchronous cycloadditions more precisely than the classical correlations of rate constants with FMO energies or ionization potentials. DFT calculations using the SMD solvent model confirm these interpretations.
Affiliation(s)
- Le Li
  - Department Chemie, Ludwig-Maximilians-Universität München, Butenandtstr. 5-13, 81377 München, Germany
- Robert J Mayer
  - CNRS, ISIS, Université de Strasbourg, 8 Allee Gaspard Monge, 67000 Strasbourg, France
- Armin R Ofial
  - Department Chemie, Ludwig-Maximilians-Universität München, Butenandtstr. 5-13, 81377 München, Germany
- Herbert Mayr
  - Department Chemie, Ludwig-Maximilians-Universität München, Butenandtstr. 5-13, 81377 München, Germany
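As a rough illustration of eq 1 quoted in the abstract of this entry, the Python sketch below evaluates lg k2(20 °C) = sN(N + E) and the corresponding second-order rate constant. The function names and the numerical parameter values are hypothetical placeholders chosen for illustration; they are not taken from the paper or from any reactivity database.

```python
def lg_k2(s_n: float, n: float, e: float) -> float:
    """Eq 1 from the abstract: lg k2(20 °C) = sN * (N + E)."""
    return s_n * (n + e)


def k2(s_n: float, n: float, e: float) -> float:
    """Second-order rate constant (M^-1 s^-1) at 20 °C implied by eq 1."""
    return 10.0 ** lg_k2(s_n, n, e)


# Hypothetical parameters, for illustration only (not values from the paper):
# sN and N describe the nucleophile, E describes the electrophile.
S_N, N, E = 0.8, 10.0, -12.0

print(f"lg k2 = {lg_k2(S_N, N, E):.2f}")
print(f"k2    = {k2(S_N, N, E):.3e} M^-1 s^-1")
```

With these placeholder values, N + E is negative, so eq 1 predicts a slow reaction. According to the abstract, the authors compare such eq 1 estimates for zwitterion formation with measured cycloaddition rate constants and interpret the difference in Gibbs energies of activation as the energy of concert.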
7
Tavakoli M, Chiu YTT, Baldi P, Carlton AM, Van Vranken D. RMechDB: A Public Database of Elementary Radical Reaction Steps. J Chem Inf Model 2023; 63:1114-1123. [PMID: 36799778 PMCID: PMC9976277 DOI: 10.1021/acs.jcim.2c01359] [Indexed: 02/18/2023]
Abstract
We introduce RMechDB, an open-access platform for aggregating, curating, and distributing reliable data about elementary radical reaction steps for computational radical reaction modeling and prediction. RMechDB contains over 5,300 elementary radical reaction steps, each with a single transition state at or around room temperature. These elementary step reactions are manually curated plausible arrow-pushing steps for organic radical reactions. The steps were taken from a variety of sources. Over 2,000 mechanistic steps were extracted from textbooks and/or constructed from research publications. Another 3,000 were taken from gas-phase atmospheric reactions of isoprene and other organic molecules on the MCM (Master Chemical Mechanism) Web site. Reactions are encoded in the SMIRKS format with accurate atom mapping and annotations for arrow-pushing mechanisms. At its core, RMechDB consists of a database schema with an online interactive search interface and a request portal for downloading the raw form of elementary step reactions with their metadata. It also offers an interface for submitting new reactions to RMechDB and expanding the data set through community contributions. Although there are several applications for RMechDB, it is primarily designed as a central platform of radical elementary steps with a unified and structured representation. We believe that this open access to this data and platform enables the extension of data-driven models for chemical reaction predictions and other chemoinformatics predictive tasks.
Affiliation(s)
- Mohammadamin Tavakoli
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
- Yin Ting T. Chiu
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Pierre Baldi
  - Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
- Ann Marie Carlton
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- David Van Vranken
  - Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
8
Abstract
Reactivity scales are useful research tools for chemists, both experimental and computational. However, to determine the reactivity of a single molecule, multiple measurements need to be carried out, which is a time-consuming and resource-intensive task. In this Tutorial Review, we present alternative approaches for the efficient generation of quantitative structure-reactivity relationships that are based on quantum chemistry, supervised learning, and uncertainty quantification. We observe a tendency for these relationships, first published in 2002, to become not only more predictive but also more interpretable over time.
Affiliation(s)
- Maike Vahl
  - Institute of Physical and Theoretical Chemistry, Technische Universität Braunschweig, Gaußstraße 17, 38106 Braunschweig, Germany
- Jonny Proppe
  - Institute of Physical and Theoretical Chemistry, Technische Universität Braunschweig, Gaußstraße 17, 38106 Braunschweig, Germany
9
McAulay K, Bilsland A, Bon M. Reactivity of Covalent Fragments and Their Role in Fragment Based Drug Discovery. Pharmaceuticals (Basel) 2022; 15:1366. [PMID: 36355538 PMCID: PMC9694498 DOI: 10.3390/ph15111366] [Received: 09/27/2022] [Revised: 10/30/2022] [Accepted: 11/04/2022] [Indexed: 09/27/2023]
Abstract
Fragment-based drug discovery has long been used for the identification of new ligands, and interest in targeted covalent inhibitors has continued to grow in recent years, with high-profile drugs such as osimertinib and sotorasib gaining FDA approval. It is therefore unsurprising that covalent fragment-based approaches have become popular and have recently led to the identification of novel targets and binding sites, as well as ligands for targets previously thought to be 'undruggable'. Understanding the properties of such covalent fragments is important, and characterizing and/or predicting reactivity can be highly useful. This review aims to discuss the requirements for an electrophilic fragment library and the importance of differing warhead reactivity. Successful case studies from the world of drug discovery are then examined.
Affiliation(s)
- Kirsten McAulay
  - Cancer Research Horizons—Therapeutic Innovation, Cancer Research UK Beatson Institute, Garscube Estate, Switchback Road, Glasgow G61 1BD, UK
  - Centre for Targeted Protein Degradation, University of Dundee, Nethergate, Dundee DD1 4HN, UK
- Alan Bilsland
  - Cancer Research Horizons—Therapeutic Innovation, Cancer Research UK Beatson Institute, Garscube Estate, Switchback Road, Glasgow G61 1BD, UK
- Marta Bon
  - Cancer Research Horizons—Therapeutic Innovation, Cancer Research UK Beatson Institute, Garscube Estate, Switchback Road, Glasgow G61 1BD, UK
  - Exscientia, The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, UK
10
Liu T, Chu X, Fan D, Ma Z, Dai Y, Zhu Z, Wang Y, Gao J. Intelligent prediction model of ammonia solubility in designable green solvents based on microstructure group contribution. Mol Phys 2022. [DOI: 10.1080/00268976.2022.2124203] [Indexed: 10/14/2022]
Affiliation(s)
- Tianxiong Liu
  - College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao, People’s Republic of China
- Xiaojun Chu
  - College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao, People’s Republic of China
- Dingchao Fan
  - College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao, People’s Republic of China
- Zhaoyuan Ma
  - College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao, People’s Republic of China
- Yasen Dai
  - College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao, People’s Republic of China
- Zhaoyou Zhu
  - College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao, People’s Republic of China
- Yinglong Wang
  - College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao, People’s Republic of China
- Jun Gao
  - College of Chemical and Environmental Engineering, Shandong University of Science and Technology, Qingdao, People’s Republic of China
11
Rarey M, Nicklaus MC, Warr W. Special Issue on Reaction Informatics and Chemical Space. J Chem Inf Model 2022; 62:2009-2010. [DOI: 10.1021/acs.jcim.2c00390] [Indexed: 01/13/2023]
Affiliation(s)
- Matthias Rarey
  - Universität Hamburg, ZBH − Center for Bioinformatics, 20146 Hamburg, Germany
- Marc C. Nicklaus
  - NCI, NIH, CADD Group, NCI-Frederick, Frederick, Maryland 21702, United States
- Wendy Warr
  - Wendy Warr & Associates, Cheshire CW4 7HZ, U.K.