1
|
Chen LY, Li YP. Machine learning-guided strategies for reaction conditions design and optimization. Beilstein J Org Chem 2024; 20:2476-2492. [PMID: 39376489 PMCID: PMC11457048 DOI: 10.3762/bjoc.20.212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2024] [Accepted: 09/19/2024] [Indexed: 10/09/2024] Open
Abstract
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as the data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Collapse
Affiliation(s)
- Lung-Yi Chen
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
| | - Yi-Pei Li
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
- Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei 11529, Taiwan
| |
Collapse
|
2
|
Toniato A, Vaucher AC, Lehmann MM, Luksch T, Schwaller P, Stenta M, Laino T. Fast Customization of Chemical Language Models to Out-of-Distribution Data Sets. CHEMISTRY OF MATERIALS : A PUBLICATION OF THE AMERICAN CHEMICAL SOCIETY 2023; 35:8806-8815. [PMID: 38027545 PMCID: PMC10653079 DOI: 10.1021/acs.chemmater.3c01406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Revised: 10/09/2023] [Accepted: 10/09/2023] [Indexed: 12/01/2023]
Abstract
The world is on the verge of a new industrial revolution, and language models are poised to play a pivotal role in this transformative era. Their ability to offer intelligent insights and forecasts has made them a valuable asset for businesses seeking a competitive advantage. The chemical industry, in particular, can benefit significantly from harnessing their power. Since 2016 already, language models have been applied to tasks such as predicting reaction outcomes or retrosynthetic routes. While such models have demonstrated impressive abilities, the lack of publicly available data sets with universal coverage is often the limiting factor for achieving even higher accuracies. This makes it imperative for organizations to incorporate proprietary data sets into their model training processes to improve their performance. So far, however, these data sets frequently remain untapped as there are no established criteria for model customization. In this work, we report a successful methodology for retraining language models on reaction outcome prediction and single-step retrosynthesis tasks, using proprietary, nonpublic data sets. We report a considerable boost in accuracy by combining patent and proprietary data in a multidomain learning formulation. This exercise, inspired by a real-world use case, enables us to formulate guidelines that can be adopted in different corporate settings to customize chemical language models easily.
Collapse
Affiliation(s)
- Alessandra Toniato
- IBM
Research Europe, Rüschlikon 8803, Switzerland
- National
Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
| | - Alain C. Vaucher
- IBM
Research Europe, Rüschlikon 8803, Switzerland
- National
Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
| | | | | | - Philippe Schwaller
- IBM
Research Europe, Rüschlikon 8803, Switzerland
- National
Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
| | - Marco Stenta
- Syngenta
Crop Protection AG, Stein 4332, Switzerland
| | - Teodoro Laino
- IBM
Research Europe, Rüschlikon 8803, Switzerland
- National
Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
| |
Collapse
|
3
|
Huang M, Zhu K, Wang Y, Lou C, Sun H, Li W, Tang Y, Liu G. In Silico Prediction of Metabolic Reaction Catalyzed by Human Aldehyde Oxidase. Metabolites 2023; 13:metabo13030449. [PMID: 36984889 PMCID: PMC10059660 DOI: 10.3390/metabo13030449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Revised: 03/17/2023] [Accepted: 03/17/2023] [Indexed: 03/30/2023] Open
Abstract
Aldehyde oxidase (AOX) plays an important role in drug metabolism. Human AOX (hAOX) is widely distributed in the body, and there are some differences between species. Currently, animal models cannot accurately predict the metabolism of hAOX. Therefore, more and more in silico models have been constructed for the prediction of the hAOX metabolism. These models are based on molecular docking and quantum chemistry theory, which are time-consuming and difficult to automate. Therefore, in this study, we compared traditional machine learning methods, graph convolutional neural network methods, and sequence-based methods with limited data, and proposed a ligand-based model for the metabolism prediction catalyzed by hAOX. Compared with the published models, our model achieved better performance (ACC = 0.91, F1 = 0.77). What's more, we built a web server to predict the sites of metabolism (SOMs) for hAOX. In summary, this study provides a convenient and automatable model and builds a web server named Meta-hAOX for accelerating the drug design and optimization stage.
Collapse
Affiliation(s)
- Mengting Huang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Keyun Zhu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Yimeng Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Chaofeng Lou
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Huimin Sun
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| |
Collapse
|
4
|
Wang X, Yao C, Zhang Y, Yu J, Qiao H, Zhang C, Wu Y, Bai R, Duan H. From theory to experiment: transformer-based generation enables rapid discovery of novel reactions. J Cheminform 2022; 14:60. [PMID: 36056425 PMCID: PMC9438336 DOI: 10.1186/s13321-022-00638-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2021] [Accepted: 08/11/2022] [Indexed: 11/10/2022] Open
Abstract
Deep learning methods, such as reaction prediction and retrosynthesis analysis, have demonstrated their significance in the chemical field. However, the de novo generation of novel reactions using artificial intelligence technology requires further exploration. Inspired by molecular generation, we proposed a novel task of reaction generation. Herein, Heck reactions were applied to train the transformer model, a state-of-art natural language process model, to generate 4717 reactions after sampling and processing. Then, 2253 novel Heck reactions were confirmed by organizing chemists to judge the generated reactions. More importantly, further organic synthesis experiments were performed to verify the accuracy and feasibility of representative reactions. The total process, from Heck reaction generation to experimental verification, required only 15 days, demonstrating that our model has well-learned reaction rules in-depth and can contribute to novel reaction discovery and chemical space exploration.
Collapse
Affiliation(s)
- Xinqiao Wang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
| | - Chuansheng Yao
- College of Pharmacy, School of Medicine, Hangzhou Normal University, Hangzhou, People's Republic of China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou, People's Republic of China
| | - Yun Zhang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
| | - Jiahui Yu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
| | - Haoran Qiao
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 201203, People's Republic of China
| | - Chengyun Zhang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
| | - Yejian Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
| | - Renren Bai
- College of Pharmacy, School of Medicine, Hangzhou Normal University, Hangzhou, People's Republic of China.
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou, People's Republic of China.
| | - Hongliang Duan
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China.
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai, 201203, China.
| |
Collapse
|
5
|
Venkatasubramanian V, Mann V. Artificial intelligence in reaction prediction and chemical synthesis. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2021.100749] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
|
6
|
Xu J, Zhang Y, Han J, Su A, Qiao H, Zhang C, Tang J, Shen X, Sun B, Yu W, Zhai S, Wang X, Wu Y, Su W, Duan H. Providing direction for mechanistic inferences in radical cascade cyclization using Transformer model. Org Chem Front 2022. [DOI: 10.1039/d2qo00188h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
Even in modern organic chemistry, predicting or proposing a reaction mechanism and speculating on reaction intermediates remains challenging. For example, it is challenging to predict the regioselectivity of radical attraction...
Collapse
|