1
Han Y, Deng M, Liu K, Chen J, Wang Y, Xu YN, Dian L. Computer-Aided Synthesis Planning (CASP) and Machine Learning: Optimizing Chemical Reaction Conditions. Chemistry 2024; 30:e202401626. [PMID: 39083362 DOI: 10.1002/chem.202401626] [Received: 04/26/2024] [Revised: 07/27/2024] [Accepted: 07/28/2024] [Indexed: 08/02/2024]
Abstract
Computer-aided synthesis planning (CASP) has garnered increasing attention in light of recent advances in machine learning models. While much of the focus has been on retrosynthesis or forward outcome prediction, optimizing reaction conditions remains a significant challenge. For datasets with multiple variables, the choice of descriptors and models is pivotal, because this selection determines how effectively conditional features are extracted and how accurate the predictions are. This review outlines the sources of data for condition optimization, the criteria for descriptor selection, the response models, and the metrics for outcome evaluation, aiming to acquaint readers with the latest research trends and to support more informed research in this domain.
Affiliation(s)
- Yu Han
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Mingjing Deng
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Ke Liu
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Jia Chen
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Yuting Wang
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Yu-Ning Xu
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Longyang Dian
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
  - Suzhou Institute of Shandong University, No. 388 Ruoshui Road, Suzhou Industrial Park, Suzhou, 215123, P. R. China
2
Xu H, Zhao Y, Zhang Y, Han J, Zan P, He S, Bo X. Deep active learning with high structural discriminability for molecular mutagenicity prediction. Commun Biol 2024; 7:1071. [PMID: 39217273 PMCID: PMC11366013 DOI: 10.1038/s42003-024-06758-6] [Received: 10/09/2023] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
The assessment of mutagenicity is essential in drug discovery, as mutagenic compounds may cause cancer and germ-cell damage. Although in silico methods have been proposed for mutagenicity prediction, their performance is hindered by the scarcity of labeled molecules, and experimental mutagenicity testing to obtain more labels is time-consuming and costly. One solution to reduce the annotation cost is active learning, where the algorithm actively selects the most valuable molecules from a vast chemical space and presents them to the oracle (e.g., a human expert) for annotation, thereby rapidly improving the model's predictive performance at a smaller annotation cost. In this paper, we propose muTOX-AL, a deep active learning framework that actively explores the chemical space and identifies the most valuable molecules, achieving competitive performance with a small number of labeled samples. The experimental results show that, compared with the random sampling strategy, muTOX-AL can reduce the number of training molecules by about 57%. Additionally, muTOX-AL exhibits outstanding molecular structural discriminability, allowing it to pick molecules with high structural similarity but opposite properties.
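The select-annotate-retrain loop described in this abstract can be illustrated with a generic pool-based active-learning sketch. This is not muTOX-AL: it assumes a plain NumPy logistic regression as a stand-in for the deep model, least-confidence selection (predicted probability closest to 0.5) as the query rule, and a hypothetical `oracle(i)` function that returns the label of sample `i` in place of the human expert.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Gradient-descent logistic regression (stand-in for a deep model)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def run_active_learning(X, oracle, n_init=20, batch_size=10, n_rounds=5, seed=0):
    """Pool-based active learning with least-confidence (uncertainty) sampling."""
    rng = np.random.default_rng(seed)
    labeled = rng.choice(len(X), size=n_init, replace=False).tolist()
    labels = {i: oracle(i) for i in labeled}  # initial random annotations
    for _ in range(n_rounds):
        y = np.array([labels[i] for i in labeled], dtype=float)
        w = fit_logistic(X[labeled], y)
        pool = np.setdiff1d(np.arange(len(X)), labeled)
        if pool.size == 0:
            break
        # Most uncertain = predicted probability closest to 0.5
        uncertainty = np.abs(predict_proba(w, X[pool]) - 0.5)
        for i in pool[np.argsort(uncertainty)[:batch_size]]:
            labels[int(i)] = oracle(int(i))  # query the (hypothetical) oracle
            labeled.append(int(i))
    return labeled, labels
```

Each round adds `batch_size` newly annotated samples; plotting model accuracy against the number of labels, alongside a random-selection baseline, is one way to measure the kind of labeling saving the abstract reports.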
Affiliation(s)
- Huiyan Xu
  - Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronics Engineering and Automation, Shanghai University, Shanghai, China
  - Academy of Military Medical Sciences, Beijing, China
- Yanpeng Zhao
  - Academy of Military Medical Sciences, Beijing, China
- Yixin Zhang
  - Academy of Military Medical Sciences, Beijing, China
- Junshan Han
  - Academy of Military Medical Sciences, Beijing, China
- Peng Zan
  - Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronics Engineering and Automation, Shanghai University, Shanghai, China
- Song He
  - Academy of Military Medical Sciences, Beijing, China
- Xiaochen Bo
  - Academy of Military Medical Sciences, Beijing, China
3
Fralish Z, Reker D. Finding the most potent compounds using active learning on molecular pairs. Beilstein J Org Chem 2024; 20:2152-2162. [PMID: 39224230 PMCID: PMC11368049 DOI: 10.3762/bjoc.20.185] [Received: 04/08/2024] [Accepted: 08/02/2024] [Indexed: 09/04/2024]
Abstract
Active learning allows algorithms to steer iterative experimentation to accelerate and de-risk molecular optimizations, but actively trained models might still exhibit poor performance during early project stages where the training data is limited and model exploitation might lead to analog identification with limited scaffold diversity. Here, we present ActiveDelta, an adaptive approach that leverages paired molecular representations to predict improvements from the current best training compound to prioritize further data acquisition. We apply the ActiveDelta concept to both graph-based deep (Chemprop) and tree-based (XGBoost) models during exploitative active learning for 99 Ki benchmarking datasets. We show that both ActiveDelta implementations excel at identifying more potent inhibitors compared to the standard exploitative active learning implementations of Chemprop, XGBoost, and Random Forest. The ActiveDelta approach is also able to identify more chemically diverse inhibitors in terms of their Murcko scaffolds. Finally, deep models such as Chemprop trained on data selected through ActiveDelta approaches can more accurately identify inhibitors in test data created through simulated time-splits. Overall, this study highlights the large potential for molecular pairing approaches to further improve popular active learning strategies in low data regimes by enabling faster and more accurate identification of more diverse molecular hits against critical drug targets.
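The pairing idea in this abstract (predicting the improvement of a candidate over the current best training compound) can be sketched with a model trained on ordered molecule pairs. This is not the authors' Chemprop or XGBoost implementation: it assumes a closed-form ridge regression on concatenated pair features `[x_i, x_j]` with target `y_j - y_i`, and generic numeric descriptors in place of molecular graphs or fingerprints.

```python
import numpy as np
from itertools import permutations

def make_pairs(X, y):
    """All ordered pairs (i, j), i != j: features [x_i, x_j], target y_j - y_i."""
    idx = list(permutations(range(len(X)), 2))
    feats = np.array([np.concatenate([X[i], X[j]]) for i, j in idx])
    deltas = np.array([y[j] - y[i] for i, j in idx])
    return feats, deltas

def fit_ridge(F, t, alpha=1.0):
    """Closed-form ridge regression with a bias column."""
    Fb = np.hstack([F, np.ones((len(F), 1))])
    A = Fb.T @ Fb + alpha * np.eye(Fb.shape[1])
    return np.linalg.solve(A, Fb.T @ t)

def pick_next(X_train, y_train, X_pool, alpha=1.0):
    """Pick the pool compound with the largest predicted gain over the best training compound."""
    F, t = make_pairs(X_train, y_train)  # needs at least two training compounds
    w = fit_ridge(F, t, alpha)
    best = X_train[np.argmax(y_train)]
    # Pair every pool compound with the current best: [x_best, x_pool]
    F_query = np.hstack([np.tile(best, (len(X_pool), 1)), X_pool])
    pred_delta = np.hstack([F_query, np.ones((len(X_pool), 1))]) @ w
    return int(np.argmax(pred_delta)), pred_delta
```

Training on pairs multiplies the number of training examples (n compounds give n(n-1) ordered pairs), which is one intuition for why pairing may help in the low-data regime the abstract targets.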
Affiliation(s)
- Zachary Fralish
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
- Daniel Reker
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
4
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand which topics in the field are being investigated with machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on the projects deposited in GitHub repositories, the most popular Python libraries employed are identified. We hope that this survey will serve as a resource for learning about machine learning, or specific architectures thereof, by identifying accessible code with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work (open data, open source code, and open models), we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
5
Li B, Su S, Zhu C, Lin J, Hu X, Su L, Yu Z, Liao K, Chen H. A deep learning framework for accurate reaction prediction and its application on high-throughput experimentation data. J Cheminform 2023; 15:72. [PMID: 37568183 PMCID: PMC10422736 DOI: 10.1186/s13321-023-00732-w] [Received: 07/20/2022] [Accepted: 06/30/2023] [Indexed: 08/13/2023]
Abstract
In recent years, artificial intelligence (AI) has begun to bring revolutionary changes to chemical synthesis. However, the lack of suitable ways of representing chemical reactions and the scarcity of reaction data have limited the wider application of AI to reaction prediction. Here, we introduce a novel reaction representation, GraphRXN, for reaction prediction. It uses a universal graph-based neural network framework to encode chemical reactions, taking two-dimensional reaction structures directly as inputs. The GraphRXN model was evaluated on three publicly available chemical reaction datasets and gave on-par or superior results compared with other baseline models. To further evaluate the effectiveness of GraphRXN, wet-lab experiments were carried out to generate reaction data. A GraphRXN model was then built on these high-throughput experimentation data and achieved a decent accuracy (R² of 0.712) on our in-house data. This highlights that the GraphRXN model can be deployed in an integrated workflow that combines robotics and AI technologies for forward reaction prediction.
Affiliation(s)
- Baiqing Li
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Shimin Su
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Chan Zhu
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Jie Lin
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Xinyue Hu
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Lebin Su
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Zhunzhun Yu
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Kuangbiao Liao
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
- Hongming Chen
  - Guangzhou Laboratory, Guangzhou, 510005, Guangdong, China
6
Liang S, Yin L, Zhang D, Su D, Qu HY. ResNet14Attention network for identifying the titration end-point of potassium dichromate. Heliyon 2023; 9:e18992. [PMID: 37609400 PMCID: PMC10440524 DOI: 10.1016/j.heliyon.2023.e18992] [Received: 05/26/2023] [Revised: 07/21/2023] [Accepted: 08/04/2023] [Indexed: 08/24/2023]
Abstract
With the rapid development of industry, increasing sewage discharge has made water-quality detection increasingly important. Potassium dichromate titration is one of the most important testing methods in water-quality detection, and accurately identifying its titration end-point is currently a research challenge. To identify the titration end-point quickly and accurately, this study proposes a ResNet14Attention network, which combines residual modules that focus on the original image information with an attention mechanism that focuses on the classification targets. The proposed ResNet14Attention network is compared with 12 convolutional neural networks, including ResNet-series networks, VGG, and GoogLeNet. The comparison experiments reveal that the proposed ResNet14Attention network is the only one of the compared convolutional neural networks to reach 100% training and testing accuracy, and that it has the highest training speed among all networks with over 90% accuracy.
Affiliation(s)
- Siwen Liang
  - Guangxi Key Laboratory of Power System Optimization and Energy Technology, Guangxi University, Nanning, Guangxi, 530004, China
- Linfei Yin
  - Guangxi Key Laboratory of Power System Optimization and Energy Technology, Guangxi University, Nanning, Guangxi, 530004, China
- Dashui Zhang
  - School of Chemistry and Chemical Engineering, Nanning University, Nanning, Guangxi, 530004, China
- Dongwei Su
  - Guangxi Key Laboratory of Power System Optimization and Energy Technology, Guangxi University, Nanning, Guangxi, 530004, China
- Hui-Ying Qu
  - School of Chemistry and Chemical Engineering, Nanning University, Nanning, Guangxi, 530004, China
7
Li SW, Xu LC, Zhang C, Zhang SQ, Hong X. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat Commun 2023; 14:3569. [PMID: 37322041 DOI: 10.1038/s41467-023-39283-x] [Received: 09/14/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023]
Abstract
Accurate prediction of reactivity and selectivity provides valuable guidance for synthetic development. Because of the high-dimensional relationship between molecular structure and synthetic function, it is challenging to build predictive models of synthetic transformations with the required extrapolative ability and chemical interpretability. To bridge the gap between the rich domain knowledge of chemistry and advanced molecular graph models, herein we report a knowledge-based graph model that embeds digitalized steric and electronic information. In addition, a molecular interaction module is developed to enable learning of the synergistic influence of reaction components. In this study, we demonstrate that this knowledge-based graph model achieves excellent predictions of reaction yield and stereoselectivity, and its extrapolative ability is corroborated by additional scaffold-based data splits and experimental verification with new catalysts. Because the local environment is embedded, the model allows atomic-level interpretation of the steric and electronic influences on the overall synthetic performance, which serves as a useful guide for molecular engineering toward the target synthetic function. This model offers an extrapolative and interpretable approach to reaction performance prediction, pointing out the importance of chemical-knowledge-constrained reaction modelling for synthetic purposes.
Affiliation(s)
- Shu-Wen Li
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Li-Cheng Xu
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Cheng Zhang
  - Department of Chemistry, University of Science and Technology of China, Hefei, China
- Shuo-Qing Zhang
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Xin Hong
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
  - Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing, 100190, PR China
  - Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China
8
Fan X, Wang Y, Yu C, Lv Y, Zhang H, Yang Q, Wen M, Lu H, Zhang Z. A Universal and Accurate Method for Easily Identifying Components in Raman Spectroscopy Based on Deep Learning. Anal Chem 2023; 95:4863-4870. [PMID: 36908216 DOI: 10.1021/acs.analchem.2c03853] [Indexed: 03/14/2023]
Abstract
Raman spectroscopy has been widely used to provide the structural fingerprint for molecular identification. Due to interference from coexisting components, noise, baseline, and systematic differences between spectrometers, component identification with Raman spectra is challenging, especially for mixtures. In this study, a method entitled DeepRaman has been proposed to solve those problems by combining the comparison ability of a pseudo-Siamese neural network (pSNN) and the input-shape flexibility of spatial pyramid pooling (SPP). DeepRaman was trained, validated, and tested with 41,564 augmented Raman spectra from two databases (pharmaceutical material and S.T. Japan). It can achieve 96.29% accuracy, 98.40% true positive rate (TPR), and 94.36% true negative rate (TNR) on the test set. Another six data sets measured on different instruments were used to evaluate the performance of the proposed method from different aspects. DeepRaman can provide accurate identification results and significantly outperform the hit quality index (HQI) method and other deep learning models. In addition, it performs well in cases of different spectral complexity and low-content components. Once the model is established, it can be used directly on different data sets without retraining or transfer learning. Furthermore, it also obtains promising results for the analysis of surface-enhanced Raman spectroscopy (SERS) data sets and Raman imaging data sets. In summary, it is an accurate, universal, and ready-to-use method for component identification in various application scenarios.
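The "input-shape flexibility" credited to spatial pyramid pooling (SPP) comes from pooling a feature map into a fixed number of bins per pyramid level, whatever its length. The sketch below shows only that generic idea for 1-D spectral feature maps; the pyramid levels `(1, 2, 4)` and the use of max pooling are assumptions, not DeepRaman's published layer configuration, and the pseudo-Siamese comparison is not shown.

```python
import numpy as np

def spatial_pyramid_pool_1d(features, levels=(1, 2, 4)):
    """Pool a (channels, length) feature map into a vector of size channels * sum(levels).

    Each pyramid level splits the length axis into n_bins roughly equal bins and
    takes the per-channel max in each bin, so spectra of any length >= 1 map to
    the same output size.
    """
    channels, length = features.shape
    pooled = []
    for n_bins in levels:
        edges = np.linspace(0, length, n_bins + 1)
        for b in range(n_bins):
            start = int(np.floor(edges[b]))
            end = max(int(np.ceil(edges[b + 1])), start + 1)  # at least one point per bin
            pooled.append(features[:, start:end].max(axis=1))
    return np.concatenate(pooled)
```

Two spectra recorded with different numbers of points (for example, on different instruments) produce vectors of identical length, which a downstream comparison network can then consume without resampling.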
Affiliation(s)
- Xiaqiong Fan
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yue Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Chuanxiu Yu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yuanxia Lv
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hailiang Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Qiong Yang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Ming Wen
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
9
Ruan Y, Lin S, Mo Y. AROPS: A Framework of Automated Reaction Optimization with Parallelized Scheduling. J Chem Inf Model 2023; 63:770-781. [PMID: 36653913 DOI: 10.1021/acs.jcim.2c01168] [Indexed: 01/20/2023]
Abstract
With the development of automated experimental platforms and optimization algorithms, chemists can easily optimize chemical reactions in an automated, high-throughput fashion. However, the modules in existing automated experimental platforms operate in a linear fashion without being orchestrated with the optimization algorithm, leaving room for further efficiency improvement. Here, we introduce a framework of automated reaction optimization with parallelized scheduling (AROPS) that integrates the optimization algorithm with module scheduling. AROPS relies on a customized Bayesian optimizer to solve multi-reactor/analyzer reaction optimization problems, with three different scheduling modes to arrange tasks for the various experimental modules. In addition, a mechanism based on probability of improvement (PI) for discarding unpromising ongoing experiments was developed to free up valuable experimental resources in parallelized optimization. We tested the performance of AROPS using a hardware emulator on three representative benchmark reactions encountered in organic synthesis, illustrating that AROPS can trade off optimization time and cost according to the chemists' preferences.
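A probability-of-improvement (PI) screen for running experiments can be sketched as follows. The abstract does not give AROPS's exact rule, so this assumes a surrogate model (for example, a Gaussian process) that supplies a Gaussian posterior mean `mu` and standard deviation `sigma` for each ongoing experiment's outcome, a maximization objective such as yield, and a hypothetical discard threshold of 0.05. Under those assumptions PI is Φ((μ − f_best − ξ)/σ), where Φ is the standard normal CDF.

```python
import math

def probability_of_improvement(mu, sigma, best_so_far, xi=0.0):
    """PI for maximization under a Gaussian posterior N(mu, sigma**2)."""
    if sigma <= 0:
        return 1.0 if mu > best_so_far + xi else 0.0
    z = (mu - best_so_far - xi) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z

def triage_running(experiments, best_so_far, threshold=0.05, xi=0.0):
    """Split (name, mu, sigma) tuples into experiments to keep and to discard."""
    keep, discard = [], []
    for name, mu, sigma in experiments:
        pi = probability_of_improvement(mu, sigma, best_so_far, xi)
        (keep if pi >= threshold else discard).append((name, round(pi, 3)))
    return keep, discard

# Example: best observed yield so far is 0.80
keep, discard = triage_running(
    [("A", 0.85, 0.05), ("B", 0.60, 0.05), ("C", 0.78, 0.10)], best_so_far=0.80
)
```

Here experiment B sits four standard deviations below the incumbent, so its PI is tiny and it would be stopped, freeing its reactor or analyzer; A and C still have a reasonable chance of beating 0.80. Raising `xi` or the threshold makes the screen more aggressive.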
Affiliation(s)
- Yixiang Ruan
  - College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 311215, China
- Sen Lin
  - Shanghai ChemLex Technology Co., Ltd., Shanghai 201210, China
- Yiming Mo
  - College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 311215, China
10
Guan X, Li Z, Zhou Y, Shao W, Zhang D. Active learning for efficient analysis of high-throughput nanopore data. Bioinformatics 2022; 39:6851141. [PMID: 36445037 PMCID: PMC9825740 DOI: 10.1093/bioinformatics/btac764] [Received: 07/04/2022] [Revised: 10/25/2022] [Accepted: 11/28/2022] [Indexed: 11/30/2022]
Abstract
MOTIVATION: As a third-generation sequencing technology, nanopore sequencing has been used for high-throughput sequencing of DNA, RNA, and even proteins. Recently, many studies have begun to use machine learning to analyze the enormous amounts of data generated by nanopores. Unfortunately, the success of this approach depends on extensive labeled data, which are labor-intensive to produce. There is therefore an urgent need for a technology that can not only rapidly analyze high-throughput nanopore data but also significantly reduce the cost of labeling. To achieve these goals, we introduce active learning, which alleviates the labeling burden by selecting the samples that need to be labeled. This work applies several advanced active learning techniques to nanopore data, including the RNA classification dataset (RNA-CD) and the Oxford Nanopore Technologies barcode dataset (ONT-BD). Because of the complexity of nanopore data (noisy sequences), a bias constraint is introduced to improve the sample selection strategy in active learning.
RESULTS: The experimental results show that, on the same performance metric, labeling 50% of the data achieves the best baseline performance for ONT-BD, while labeling only 15% achieves the best baseline performance for RNA-CD. Crucially, the experiments show that active learning can assist experts in labeling samples and significantly reduce the labeling cost. Active learning can greatly ease the difficulty of labeling high-capacity nanopore data, and we hope it can be applied to other problems in nanopore sequence analysis.
AVAILABILITY AND IMPLEMENTATION: The main program is available at https://github.com/guanxiaoyu11/AL-for-nanopore.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaoyu Guan
  - College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China
- Zhongnian Li
  - College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China
  - School of Computer Science, China University of Mining and Technology, Xuzhou 221116, China
- Yueying Zhou
  - College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China
- Wei Shao
  - College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China
11
Angello NH, Rathore V, Beker W, Wołos A, Jira ER, Roszak R, Wu TC, Schroeder CM, Aspuru-Guzik A, Grzybowski BA, Burke MD. Closed-loop optimization of general reaction conditions for heteroaryl Suzuki-Miyaura coupling. Science 2022; 378:399-405. [DOI: 10.1126/science.adc8743] [Indexed: 11/02/2022]
Abstract
General conditions for organic reactions are important but rare, and efforts to identify them usually consider only narrow regions of chemical space. Discovering more general reaction conditions requires considering vast regions of chemical space derived from a large matrix of substrates crossed with a high-dimensional matrix of reaction conditions, rendering exhaustive experimentation impractical. Here, we report a simple closed-loop workflow that leverages data-guided matrix down-selection, uncertainty-minimizing machine learning, and robotic experimentation to discover general reaction conditions. Application to the challenging and consequential problem of heteroaryl Suzuki-Miyaura cross-coupling identified conditions that double the average yield relative to a widely used benchmark that was previously developed using traditional approaches. This study provides a practical road map for solving multidimensional chemical optimization problems with large search spaces.
Affiliation(s)
- Nicholas H. Angello
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Vandana Rathore
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Agnieszka Wołos
  - Allchemy, Inc., Highland, IN, USA
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Edward R. Jira
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Rafał Roszak
  - Allchemy, Inc., Highland, IN, USA
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Tony C. Wu
  - Department of Chemistry, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Charles M. Schroeder
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
  - Canadian Institute for Advanced Research, Toronto, ON, Canada
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Bartosz A. Grzybowski
  - Allchemy, Inc., Highland, IN, USA
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
  - Center for Soft and Living Matter, Institute for Basic Science, Ulsan, Republic of Korea
  - Department of Chemistry, Ulsan Institute of Science and Technology, Ulsan, Republic of Korea
- Martin D. Burke
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign, Urbana, IL, USA
12
Chines S, Ehrt C, Potowski M, Biesenkamp F, Grützbach L, Brunner S, van den Broek F, Bali S, Ickstadt K, Brunschweiger A. Navigating chemical reaction space - application to DNA-encoded chemistry. Chem Sci 2022; 13:11221-11231. [PMID: 36320474 PMCID: PMC9517168 DOI: 10.1039/d2sc02474h] [Received: 05/02/2022] [Accepted: 08/31/2022] [Indexed: 12/02/2022]
Abstract
Databases contain millions of reactions for compound synthesis, making the selection of reactions for the forward synthetic design of small-molecule screening libraries, such as DNA-encoded libraries (DELs), a big-data challenge. To support navigation of reaction space, we developed the computational workflow Reaction Navigator. Reaction files from a large chemistry database were processed using the open-source KNIME Analytics Platform. Initial processing steps included a customizable filtering cascade that removed reactions likely to be incompatible with DELs (e.g., because they would damage the genetic barcode), yielding a comprehensive list of transformations with potential applicability to DEL design. These reactions were displayed and clustered by user-defined molecular reaction descriptors, which are independent of reaction core substitution patterns. Thanks to clustering, the reactions can be searched manually to identify those suitable for DEL synthesis according to desired criteria, such as ring formation or sp3 content. The workflow was first applied to map chemical reaction space for aromatic aldehydes, an exemplary functional group often used in DEL synthesis. Exemplary reactions were successfully translated to DNA-tagged substrates and can be applied to library synthesis. The versatility of Reaction Navigator was then shown by mapping reaction space for different reaction conditions, for amines as a second set of starting materials, and for data from a second database.
Affiliation(s)
- Silvia Chines
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Str. 6, 44227 Dortmund, Germany
- Marco Potowski
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Str. 6, 44227 Dortmund, Germany
- Felix Biesenkamp
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Str. 6, 44227 Dortmund, Germany
- Lars Grützbach
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Str. 6, 44227 Dortmund, Germany
- Susanne Brunner
  - TU Dortmund University, Department of Statistics, Vogelpothsweg 87, 44227 Dortmund, Germany
- Shilpa Bali
  - Elsevier B.V., Radarweg 29, 1043 NX Amsterdam, The Netherlands
- Katja Ickstadt
  - TU Dortmund University, Department of Statistics, Vogelpothsweg 87, 44227 Dortmund, Germany
- Andreas Brunschweiger
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Str. 6, 44227 Dortmund, Germany
13
Abstract
The problem of human trust is one of the most fundamental problems in applied artificial intelligence for drug discovery. In silico models have been widely used to accelerate drug discovery in recent years. However, most of these models give reliable predictions only within the limited chemical space covered by the training set (the applicability domain). Predictions for samples falling outside the applicability domain are unreliable and sometimes dangerous for drug-design decision-making. Uncertainty quantification has accordingly drawn great attention as a way to enable autonomous drug design. By quantifying the confidence level of model predictions, the reliability of those predictions can be represented quantitatively to assist researchers in their molecular reasoning and experimental design. Here we summarize state-of-the-art approaches to uncertainty quantification and highlight how they can be used in drug design and discovery projects. Furthermore, we outline four representative application scenarios of uncertainty quantification in drug discovery.
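One widely used family of uncertainty-quantification methods is ensembling: train several models on resampled data and treat the spread of their predictions as a confidence signal. The sketch below assumes bootstrap-resampled linear least-squares models and synthetic one-dimensional data purely for illustration; it is not taken from the review, and real drug-discovery models would typically be neural networks or tree ensembles over molecular descriptors.

```python
import numpy as np

def fit_bootstrap_ensemble(X, y, n_models=50, seed=0):
    """Fit n_models linear least-squares models, each on a bootstrap resample."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # resample with replacement
        w, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        weights.append(w)
    return np.array(weights)  # shape (n_models, n_features + 1)

def predict_with_uncertainty(weights, X):
    """Return the ensemble mean and standard deviation for each row of X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    preds = Xb @ weights.T  # shape (n_samples, n_models)
    return preds.mean(axis=1), preds.std(axis=1)
```

A query far outside the training range (here, x = 10 when training data lie in [0, 1]) receives a much larger ensemble spread than one inside it, which is the behaviour an applicability-domain check relies on to flag predictions that should not be trusted.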
Affiliation(s)
- Jie Yu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Dingyan Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China