1
Shee Y, Li H, Zhang P, Nikolic AM, Lu W, Kelly HR, Manee V, Sreekumar S, Buono FG, Song JJ, Newhouse TR, Batista VS. Site-specific template generative approach for retrosynthetic planning. Nat Commun 2024; 15:7818. [PMID: 39251606 PMCID: PMC11385523 DOI: 10.1038/s41467-024-52048-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Accepted: 08/26/2024] [Indexed: 09/11/2024]
Abstract
Retrosynthesis, the strategy of devising laboratory pathways by working backwards from the target compound, is crucial yet challenging. Enhancing retrosynthetic efficiency requires overcoming the vast complexity of chemical space, the limited known interconversions between molecules, and the challenges posed by limited experimental datasets. This study introduces generative machine learning methods for retrosynthetic planning. The approach features three innovations: generating reaction templates instead of reactants or synthons to create novel chemical transformations, allowing user selection of specific bonds to change for human-influenced synthesis, and employing a conditional kernel-elastic autoencoder (CKAE) to measure the similarity between generated and known reactions for chemical viability insights. These features form a coherent retrosynthetic framework, validated experimentally by designing a 3-step synthetic pathway for a challenging small molecule, demonstrating a significant improvement over previous 5-9 step approaches. This work highlights the utility and robustness of generative machine learning in addressing complex challenges in chemical synthesis.
Affiliation(s)
- Yu Shee
  - Department of Chemistry, Yale University, New Haven, CT, USA
- Haote Li
  - Department of Chemistry, Yale University, New Haven, CT, USA
- Pengpeng Zhang
  - Department of Chemistry, Yale University, New Haven, CT, USA
- Wenxin Lu
  - Department of Chemistry, Yale University, New Haven, CT, USA
- H Ray Kelly
  - Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, Ridgefield, CT, USA
- Vidhyadhar Manee
  - Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, Ridgefield, CT, USA
- Sanil Sreekumar
  - Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, Ridgefield, CT, USA
- Frederic G Buono
  - Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, Ridgefield, CT, USA
- Jinhua J Song
  - Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, Ridgefield, CT, USA

2
Phan TL, Weinbauer K, Gärtner T, Merkle D, Andersen JL, Fagerberg R, Stadler PF. Reaction rebalancing: a novel approach to curating reaction databases. J Cheminform 2024; 16:82. [PMID: 39030583 PMCID: PMC11264917 DOI: 10.1186/s13321-024-00875-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 06/24/2024] [Indexed: 07/21/2024]
Abstract
PURPOSE Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.e., unbalanced reactions, however, are the rule rather than the exception. The curation and correction of such incomplete entries is thus an urgent need. METHODS The SynRBL framework addresses this issue with a dual-strategy: a rule-based method for non-carbon compounds, using atomic symbols and counts for prediction, alongside a Maximum Common Subgraph (MCS)-based technique for carbon compounds, aimed at aligning reactants and products to infer missing entities. RESULTS The rule-based method exceeded 99% accuracy, while MCS-based accuracy varied from 81.19 to 99.33%, depending on reaction properties. Furthermore, an applicability domain and a machine learning scoring function were devised to quantify prediction confidence. The overall efficacy of this framework was delineated through its success rate and accuracy metrics, which spanned from 89.83 to 99.75% and 90.85 to 99.05%, respectively. CONCLUSION The SynRBL framework offers a novel solution for recalibrating chemical reactions, significantly enhancing reaction completeness. With rigorous validation, it achieved groundbreaking accuracy in reaction rebalancing. This sets the stage for future improvement in particular of atom-atom mapping techniques as well as of downstream tasks such as automated synthesis planning. SCIENTIFIC CONTRIBUTION SynRBL features a novel computational approach to correcting unbalanced entries in chemical reaction databases. 
By combining heuristic rules for inferring non-carbon compounds and common subgraph searches to address carbon unbalance, SynRBL successfully addresses most instances of this problem, which affects the majority of data in most large-scale resources. Compared to alternative solutions, SynRBL achieves a dramatic increase in both success rate and accuracy, and provides the first freely available open source solution for this problem.
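The rule-based half of this idea (using atomic symbols and counts to infer a missing non-carbon species) can be illustrated with a minimal sketch. This is not the SynRBL implementation: the flat-formula parser, the `SMALL_MOLECULES` table, and the function names are all assumptions made for illustration, and real reaction data would be handled as mapped structures rather than formula strings.

```python
import re
from collections import Counter

def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C2H6O' (no brackets or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts

def side_counts(formulas) -> Counter:
    """Sum atom counts over all species on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += parse_formula(formula)
    return total

# Hypothetical lookup table of small non-carbon species (illustrative only).
SMALL_MOLECULES = {name: parse_formula(name) for name in ("H2O", "HCl", "HBr", "NH3", "H2")}

def missing_atoms(reactants, products):
    """Return (atoms missing on the product side, atoms missing on the reactant side)."""
    r, p = side_counts(reactants), side_counts(products)
    # Counter subtraction keeps only positive differences.
    return r - p, p - r

def suggest_byproduct(reactants, products):
    """Suggest a single small molecule whose atoms exactly fill the product-side deficit."""
    deficit, _ = missing_atoms(reactants, products)
    for name, counts in SMALL_MOLECULES.items():
        if counts == deficit:
            return name
    return None

# Esterification recorded without its water by-product:
# acetic acid (C2H4O2) + ethanol (C2H6O) -> ethyl acetate (C4H8O2)
print(suggest_byproduct(["C2H4O2", "C2H6O"], ["C4H8O2"]))
```

A deficit containing carbon would fall outside this kind of rule and, per the abstract, is where the MCS-based alignment takes over.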
Affiliation(s)
- Tieu-Long Phan
  - Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics and School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark
- Klaus Weinbauer
  - Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics and School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Machine Learning Research Unit, TU Wien Informatics, Erzherzog-Johann-Platz 1 (FB02), A-1040, Wien, Austria
- Thomas Gärtner
  - Machine Learning Research Unit, TU Wien Informatics, Erzherzog-Johann-Platz 1 (FB02), A-1040, Wien, Austria
- Daniel Merkle
  - Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark
  - Faculty of Technology, Bielefeld University, Postfach 100131, 33501, Bielefeld, Germany
- Jakob L Andersen
  - Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark
- Rolf Fagerberg
  - Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics and School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany
  - Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090, Wien, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870, Frederiksberg, Denmark
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA

3
Raghavan P, Rago AJ, Verma P, Hassan MM, Goshu GM, Dombrowski AW, Pandey A, Coley CW, Wang Y. Incorporating Synthetic Accessibility in Drug Design: Predicting Reaction Yields of Suzuki Cross-Couplings by Leveraging AbbVie's 15-Year Parallel Library Data Set. J Am Chem Soc 2024; 146:15070-15084. [PMID: 38768950 PMCID: PMC11157529 DOI: 10.1021/jacs.4c00098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 04/24/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
Despite the increased use of computational tools to supplement medicinal chemists' expertise and intuition in drug design, predicting synthetic yields in medicinal chemistry endeavors remains an unsolved challenge. Existing design workflows could profoundly benefit from reaction yield prediction, as precious material waste could be reduced, and a greater number of relevant compounds could be delivered to advance the design, make, test, analyze (DMTA) cycle. In this work, we detail the evaluation of AbbVie's medicinal chemistry library data set to build machine learning models for the prediction of Suzuki coupling reaction yields. The combination of density functional theory (DFT)-derived features and Morgan fingerprints was identified to perform better than one-hot encoded baseline modeling, furnishing encouraging results. Overall, we observe modest generalization to unseen reactant structures within the 15-year retrospective library data set. Additionally, we compare predictions made by the model to those made by expert medicinal chemists, finding that the model can often predict both reaction success and reaction yields with greater accuracy. Finally, we demonstrate the application of this approach to suggest structurally and electronically similar building blocks to replace those predicted or observed to be unsuccessful prior to or after synthesis, respectively. The yield prediction model was used to select similar monomers predicted to have higher yields, resulting in greater synthesis efficiency of relevant drug-like molecules.
Affiliation(s)
- Priyanka Raghavan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, Massachusetts 02139, United States
- Alexander J. Rago
  - Advanced Chemistry Technologies Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Pritha Verma
  - Advanced Chemistry Technologies Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Majdi M. Hassan
  - RAIDERS Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Gashaw M. Goshu
  - Advanced Chemistry Technologies Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Amanda W. Dombrowski
  - Advanced Chemistry Technologies Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Abhishek Pandey
  - RAIDERS Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Connor W. Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, Massachusetts 02139, United States
- Ying Wang
  - Advanced Chemistry Technologies Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States

4
Strieth-Kalthoff F, Szymkuć S, Molga K, Aspuru-Guzik A, Glorius F, Grzybowski BA. Artificial Intelligence for Retrosynthetic Planning Needs Both Data and Expert Knowledge. J Am Chem Soc 2024. [PMID: 38598363 DOI: 10.1021/jacs.4c00338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
Rapid advancements in artificial intelligence (AI) have enabled breakthroughs across many scientific disciplines. In organic chemistry, the challenge of planning complex multistep chemical syntheses should conceptually be well-suited for AI. Yet, the development of AI synthesis planners trained solely on reaction-example-data has stagnated and is not on par with the performance of "hybrid" algorithms combining AI with expert knowledge. This Perspective examines possible causes of these shortcomings, extending beyond the established reasoning of insufficient quantities of reaction data. Drawing attention to the intricacies and data biases that are specific to the domain of synthetic chemistry, we advocate augmenting the unique capabilities of AI with the knowledge base and the reasoning strategies of domain experts. By actively involving synthetic chemists, who are the end users of any synthesis planning software, into the development process, we envision to bridge the gap between computer algorithms and the intricate nature of chemical synthesis.
Affiliation(s)
- Felix Strieth-Kalthoff
  - University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
  - University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
- Sara Szymkuć
  - Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Karol Molga
  - Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Alán Aspuru-Guzik
  - University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
  - University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave., Toronto, Ontario M5G 1M1, Canada
  - University of Toronto, Department of Chemical Engineering and Applied Chemistry, 200 College St., Toronto, Ontario M5S 3E5, Canada
  - University of Toronto, Department of Materials Science and Engineering, 184 College St., Toronto, Ontario M5S 3E4, Canada
- Frank Glorius
  - Universität Münster, Organisch-Chemisches Institut, Corrensstr. 36, 48149 Münster, Germany
- Bartosz A Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
  - IBS Center for Algorithmic and Robotized Synthesis, CARS, UNIST 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea
  - Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea

5
Saebi M, Nan B, Herr JE, Wahlers J, Guo Z, Zurański AM, Kogej T, Norrby PO, Doyle AG, Chawla NV, Wiest O. On the use of real-world datasets for reaction yield prediction. Chem Sci 2023; 14:4997-5005. [PMID: 37206399 PMCID: PMC10189898 DOI: 10.1039/d2sc06041h] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 11/01/2022] [Accepted: 03/09/2023] [Indexed: 09/30/2023]
Abstract
The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide less biased, large datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs of a large pharmaceutical company is disclosed and its relationship to high-throughput experimentation (HTE) datasets is described. For chemical yield predictions, a key task in chemical synthesis, an attributed graph neural network (AGNN) performs as well as or better than the best previous models on two HTE datasets for the Suzuki-Miyaura and Buchwald-Hartwig reactions. However, training the AGNN on an ELN dataset does not lead to a predictive model. The implications of using ELN data for training ML-based models are discussed in the context of yield predictions.
Affiliation(s)
- Mandana Saebi
  - Department of Computer Science and Engineering and Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, USA
- Bozhao Nan
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
- John E Herr
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
- Jessica Wahlers
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
- Zhichun Guo
  - Department of Computer Science and Engineering and Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, USA
- Andrzej M Zurański
  - Department of Chemistry, Princeton University, Princeton, New Jersey 08544, USA
- Thierry Kogej
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Pepparedsleden 1, SE-431 83 Mölndal, Gothenburg, Sweden
- Per-Ola Norrby
  - Data Science and Modelling, Pharmaceutical Sciences, R&D, AstraZeneca, Pepparedsleden 1, SE-431 83 Mölndal, Gothenburg, Sweden
- Abigail G Doyle
  - Department of Chemistry, Princeton University, Princeton, New Jersey 08544, USA
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095, USA
- Nitesh V Chawla
  - Department of Computer Science and Engineering and Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, USA
- Olaf Wiest
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA

6
Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023]
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Affiliation(s)
- Yuxinxin Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yanchi Ou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yaohuang Huang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China

7
Khashei M, Nazgouei E, Bakhtiarvand N. Intelligent Discrete Deep Learning Based Classification Methodology in Chemometrics. J Chem Inf Model 2023; 63:1935-1946. [PMID: 36763004 DOI: 10.1021/acs.jcim.2c01535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
Abstract
In recent years, deep learning models have attracted much attention for classification purposes in chemometrics. The popularity of deep learning models in this field comes from their unique features like universal approximation capability with the desired accuracy. Deep learning classifiers use several intelligent processing layers to model mixed, complex, and nonlinear patterns in the underlying data sets, which is why the development of deep learning based models has never been stopped in the chemometrics literature. Despite the variety of deep learning classification models used in this field, they all use a continuous distance-based cost function in their learning processes. Although using a continuous cost function for learning deep classifiers is a common approach, it conflicts with the discrete nature of the classification problem. In fact, applying a continuous cost function for inherently discrete classification problems can reduce the performance of the classification. In this research, a novel discrete learning based classification approach is proposed and implemented on a deep feed-forward neural network as one of the most commonly used deep learning models to develop a different learning process for deep classification models. The basis of the proposed learning approach is maximizing a discrete matching function of the actual and fitted values instead of minimizing a continuous distance-based cost function. The proposed classification approach is evaluated on five benchmark data sets in the chemistry field. The empirical results indicated the superiority of the proposed discrete deep learning approach over its classic continuous form. The results of this study demonstrate the important effect of discrete learning processes on the performances of deep learning classification models. Therefore, the proposed methodology can be a powerful alternative to common classification approaches to analyze chemical data in the chemometrics field.
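The contrast drawn here between a continuous distance-based cost and a discrete matching function can be made concrete with a small sketch. The abstract does not specify the authors' matching function or how it is optimized, so the two functions below (`mse_cost` and `matching_score`) and the toy predictions are assumptions chosen only to show that the two objectives can rank models differently.

```python
def mse_cost(y_true, y_prob):
    """Continuous cost: mean over samples of the squared distance to the one-hot target."""
    total = 0.0
    for target, prob in zip(y_true, y_prob):
        total += sum((t - p) ** 2 for t, p in zip(target, prob))
    return total / len(y_true)

def matching_score(y_true, y_prob):
    """Discrete objective: fraction of samples whose predicted class equals the true class."""
    matches = 0
    for target, prob in zip(y_true, y_prob):
        predicted = max(range(len(prob)), key=prob.__getitem__)
        actual = max(range(len(target)), key=target.__getitem__)
        matches += int(predicted == actual)
    return matches / len(y_true)

# Three samples, two classes, one-hot targets.
y_true = [[1, 0], [0, 1], [1, 0]]

# Model A is never confident but always on the right side of the decision boundary.
model_a = [[0.55, 0.45], [0.45, 0.55], [0.55, 0.45]]
# Model B is perfect on two samples and narrowly wrong on the third.
model_b = [[1.0, 0.0], [0.0, 1.0], [0.45, 0.55]]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, round(mse_cost(y_true, probs), 4), round(matching_score(y_true, probs), 4))
```

In this toy case model B has the lower continuous cost while model A classifies more samples correctly, which is the kind of mismatch the abstract attributes to training discrete classifiers with continuous cost functions.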
Affiliation(s)
- Mehdi Khashei
  - Department of Industrial and Systems Engineering, Isfahan University of Technology (IUT), Isfahan 84156-83111, Iran
- Erfan Nazgouei
  - Department of Industrial and Systems Engineering, Isfahan University of Technology (IUT), Isfahan 84156-83111, Iran
- Negar Bakhtiarvand
  - Department of Industrial and Systems Engineering, Isfahan University of Technology (IUT), Isfahan 84156-83111, Iran

8
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022]
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is far more reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Thijs Stuyver
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Connor W Coley
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA

9
Recent advances and challenges in experiment-oriented polymer informatics. Polym J 2022. [DOI: 10.1038/s41428-022-00734-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]

10
Chines S, Ehrt C, Potowski M, Biesenkamp F, Grützbach L, Brunner S, van den Broek F, Bali S, Ickstadt K, Brunschweiger A. Navigating chemical reaction space - application to DNA-encoded chemistry. Chem Sci 2022; 13:11221-11231. [PMID: 36320474 PMCID: PMC9517168 DOI: 10.1039/d2sc02474h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 08/31/2022] [Indexed: 12/02/2022]
Abstract
Databases contain millions of reactions for compound synthesis, rendering selection of reactions for forward synthetic design of small molecule screening libraries, such as DNA-encoded libraries (DELs), a big data challenge. To support reaction space navigation, we developed the computational workflow Reaction Navigator. Reaction files from a large chemistry database were processed using the open-source KNIME Analytics Platform. Initial processing steps included a customizable filtering cascade that removed reactions with a high probability to be incompatible with DEL, as they would e.g. damage the genetic barcode, to arrive at a comprehensive list of transformations for DEL design with applicability potential. These reactions were displayed and clustered by user-defined molecular reaction descriptors which are independent of reaction core substitution patterns. Thanks to clustering, these can be searched manually to identify reactions for DEL synthesis according to desired reaction criteria, such as ring formation or sp3 content. The workflow was initially applied for mapping chemical reaction space for aromatic aldehydes as an exemplary functional group often used in DEL synthesis. Exemplary reactions have been successfully translated to DNA-tagged substrates and can be applied to library synthesis. The versatility of the Reaction Navigator was then shown by mapping reaction space for different reaction conditions, for amines as a second set of starting materials, and for data from a second database.
Affiliation(s)
- Silvia Chines
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Str. 6, 44227 Dortmund, Germany
- Marco Potowski
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Str. 6, 44227 Dortmund, Germany
- Felix Biesenkamp
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Str. 6, 44227 Dortmund, Germany
- Lars Grützbach
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Str. 6, 44227 Dortmund, Germany
- Susanne Brunner
  - TU Dortmund University, Department of Statistics, Vogelpothsweg 87, 44227 Dortmund, Germany
- Shilpa Bali
  - Elsevier B.V., Radarweg 29, 1043 NX Amsterdam, The Netherlands
- Katja Ickstadt
  - TU Dortmund University, Department of Statistics, Vogelpothsweg 87, 44227 Dortmund, Germany
- Andreas Brunschweiger
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Str. 6, 44227 Dortmund, Germany

11
Pereira A, Albornoz C, Trofymchuk OS. Data-Driven Analysis of Reactions Catalyzed by [CoCp*(CO)I2]. Organometallics 2022. [DOI: 10.1021/acs.organomet.2c00051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]
Affiliation(s)
- Alfredo Pereira
  - Facultad de Ciencias Químicas y Farmacéuticas, Departamento de Química Orgánica y Fisicoquímica, Universidad de Chile, Sergio Livingstone 1007, Casilla 233, Santiago, Metropolitan Region 8380492, Chile
- Camilo Albornoz
  - C. Albornoz, Instituto de Química de Recursos Naturales, Universidad de Talca, Talca, Maule Region 3460000, Chile
- Oleksandra S. Trofymchuk
  - Facultad de Ciencias Químicas y Farmacéuticas, Departamento de Química Orgánica y Fisicoquímica, Universidad de Chile, Sergio Livingstone 1007, Casilla 233, Santiago, Metropolitan Region 8380492, Chile

12
Probst D, Schwaller P, Reymond JL. Reaction classification and yield prediction using the differential reaction fingerprint DRFP. Digital Discovery 2022; 1:91-97. [PMID: 35515081 PMCID: PMC8996827 DOI: 10.1039/d1dd00006c] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 08/26/2021] [Accepted: 01/12/2022] [Indexed: 01/19/2023]
Abstract
Predicting the nature and outcome of reactions using computational methods is a crucial tool to accelerate chemical research. The recent application of deep learning-based learned fingerprints to reaction classification and reaction yield prediction has shown an impressive increase in performance compared to previous methods such as DFT- and structure-based fingerprints. However, learned fingerprints require large training data sets, are inherently biased, and are based on complex deep learning architectures. Here we present the differential reaction fingerprint DRFP. The DRFP algorithm takes a reaction SMILES as an input and creates a binary fingerprint based on the symmetric difference of two sets containing the circular molecular n-grams generated from the molecules listed left and right from the reaction arrow, respectively, without the need for distinguishing between reactants and reagents. We show that DRFP performs better than DFT-based fingerprints in reaction yield prediction and other structure-based fingerprints in reaction classification, reaching the performance of state-of-the-art learned fingerprints in both tasks while being data-independent. Differential Reaction Fingerprint DRFP is a chemical reaction fingerprint enabling simple machine learning models running on standard hardware to reach DFT- and deep learning-based accuracies in reaction yield prediction and reaction classification.
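The set operation at the core of this description can be sketched in a few lines. This is a deliberately simplified stand-in, not the DRFP algorithm: real DRFP extracts circular substructures from parsed molecules, whereas the sketch below uses plain character n-grams of each SMILES string, and the names `toy_drfp`, `ngrams`, `n_bits`, and `n_max` are assumptions made for illustration.

```python
import hashlib

def ngrams(smiles: str, n_max: int = 3) -> set:
    """Character n-grams of a SMILES string (a crude stand-in for circular substructures)."""
    return {
        smiles[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(smiles) - n + 1)
    }

def side_features(side: str, n_max: int = 3) -> set:
    """Union of n-gram sets over all dot-separated molecules on one side of the arrow."""
    features = set()
    for molecule in side.split("."):
        if molecule:
            features |= ngrams(molecule, n_max)
    return features

def toy_drfp(reaction_smiles: str, n_bits: int = 256, n_max: int = 3) -> list:
    """Binary fingerprint from the symmetric difference of left-side and right-side feature sets."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected reaction SMILES of the form 'reactants>reagents>products'")
    # Reactants and reagents are pooled: everything left of the final arrow counts as one side.
    left = ".".join(p for p in parts[:2] if p)
    right = parts[2]
    changed = side_features(left, n_max) ^ side_features(right, n_max)

    bits = [0] * n_bits
    for feature in changed:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:8], "big") % n_bits] = 1
    return bits

fp = toy_drfp("CC(=O)O.CCO>>CC(=O)OCC.O")
print(sum(fp), "of", len(fp), "bits set")
```

Features shared by both sides cancel in the symmetric difference, so only the parts of the molecules that change contribute bits; a reaction whose two sides are identical yields an all-zero vector.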
Affiliation(s)
- Daniel Probst
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland

13
Das M, Sharma P, Sunoj RB. Machine learning studies on asymmetric relay Heck reaction—Potential avenues for reaction development. J Chem Phys 2022; 156:114303. [DOI: 10.1063/5.0084432] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023]
Abstract
The integration of machine learning (ML) methods into chemical catalysis is evolving as a new paradigm for cost and time economic reaction development in recent times. Although there have been several successful applications of ML in catalysis, the prediction of enantioselectivity (ee) remains challenging. Herein, we describe an ML workflow to predict ee of an important class of catalytic asymmetric transformation, namely, the relay Heck (RH) reaction. A random forest ML model, built using quantum chemically derived mechanistically relevant physical organic descriptors as features, is found to predict the ee remarkably well with a low root mean square error of 8.0 ± 1.3. Importantly, the model is effective in predicting the unseen variants of an asymmetric RH reaction. Furthermore, we predicted the ee for thousands of unexplored complementary reactions, including those leading to a good number of bioactive frameworks, by engaging different combinations of catalysts and substrates drawn from the original dataset. Our ML model developed on the available examples would be able to assist in exploiting the fuller potential of asymmetric RH reactions through a priori predictions before the actual experimentation, which would thus help surpass the trial and error loop to a larger degree.
Affiliation(s)
- Manajit Das
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Pooja Sharma
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Raghavan B. Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| |
|
14
|
Wen M, Blau SM, Xie X, Dwaraknath S, Persson KA. Improving machine learning performance on small chemical reaction data with unsupervised contrastive pretraining. Chem Sci 2022; 13:1446-1458. [PMID: 35222929 PMCID: PMC8809395 DOI: 10.1039/d1sc06515g] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Accepted: 01/09/2022] [Indexed: 11/21/2022] Open
Abstract
Machine learning (ML) methods have great potential to transform chemical discovery by accelerating the exploration of chemical space and drawing scientific insights from data. However, modern chemical reaction ML models, such as those based on graph neural networks (GNNs), must be trained on a large amount of labelled data in order to avoid overfitting the data and thus possessing low accuracy and transferability. In this work, we propose a strategy to leverage unlabelled data to learn accurate ML models for small labelled chemical reaction data. We focus on an old and prominent problem, classifying reactions into distinct families, and build a GNN model for this task. We first pretrain the model on unlabelled reaction data using unsupervised contrastive learning and then fine-tune it on a small number of labelled reactions. The contrastive pretraining learns by making the representations of two augmented versions of a reaction similar to each other but distinct from other reactions. We propose chemically consistent reaction augmentation methods that protect the reaction center and find they are the key for the model to extract relevant information from unlabelled data to aid the reaction classification task. The transfer learned model outperforms a supervised model trained from scratch by a large margin. Further, it consistently performs better than models based on traditional rule-driven reaction fingerprints, which have long been the default choice for small datasets, as well as those based on reaction fingerprints derived from masked language modelling. In addition to reaction classification, the effectiveness of the strategy is tested on regression datasets; the learned GNN-based reaction fingerprints can also be used to navigate the chemical reaction space, which we demonstrate by querying for similar reactions. The strategy can be readily applied to other predictive reaction problems to uncover the power of unlabelled data for learning better models with a limited supply of labels.
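The contrastive objective summarised in this abstract (two augmented views of the same reaction pulled together, other reactions pushed apart) can be sketched as follows. The abstract does not give the exact loss, so this uses a generic one-directional InfoNCE-style loss as an assumption; the function names, the temperature value, and the toy embeddings are hypothetical, and the GNN encoder and the reaction augmentations are not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(view_a, view_b, temperature=0.5):
    """Generic InfoNCE-style loss (assumed form, not taken from the paper).

    view_a[i] and view_b[i] are embeddings of two augmented versions of
    reaction i; view_b[j] with j != i acts as a negative example.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the matching view.
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings, invented for illustration only.
anchors = [[1.0, 0.0], [0.0, 1.0]]
matching_views = [[1.0, 0.1], [0.1, 1.0]]
shuffled_views = [[0.1, 1.0], [1.0, 0.1]]

loss_matched = contrastive_loss(anchors, matching_views)
loss_shuffled = contrastive_loss(anchors, shuffled_views)
```

The loss is lower when each reaction's two views are aligned than when they are paired with views of a different reaction, which is the signal the pretraining step optimises before fine-tuning on labelled reactions.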
Affiliation(s)
- Mingjian Wen
- Energy Technologies Area, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | - Samuel M Blau
- Energy Technologies Area, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | - Xiaowei Xie
- College of Chemistry, University of California Berkeley CA 94720 USA
- Materials Science Division, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | | | - Kristin A Persson
- Department of Materials Science and Engineering, University of California Berkeley CA 94720 USA
- Molecular Foundry, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| |
|
15
|
Komp E, Janulaitis N, Valleau S. Progress towards machine learning reaction rate constants. Phys Chem Chem Phys 2021; 24:2692-2705. [PMID: 34935798 DOI: 10.1039/d1cp04422b] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Abstract
Quantum and classical reaction rate constant calculations come at the cost of exploring potential energy surfaces. Due to the "curse of dimensionality", their evaluation quickly becomes unfeasible as the system size grows. Machine learning algorithms can accelerate the calculation of reaction rate constants by predicting them using low cost input features. In this perspective, we briefly introduce supervised machine learning algorithms in the context of reaction rate constant prediction. We discuss existing and recently created kinetic datasets and input feature representations as well as the use and design of machine learning algorithms to predict reaction rate constants or quantities required for their computation. Amongst these, we first describe the use of machine learning to predict activation, reaction, solvation and dissociation energies. We then look at the use of machine learning to predict reactive force field parameters, reaction rate constants as well as to help accelerate the search for minimum energy paths. Lastly, we provide an outlook on areas which have yet to be explored so as to improve and evaluate the use of machine learning algorithms for chemical reaction rate constants.
Affiliation(s)
- Evan Komp
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
| | - Nida Janulaitis
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
| | - Stéphanie Valleau
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
| |
|
16
|
Machine learning modelling of chemical reaction characteristics: yesterday, today, tomorrow. Mendeleev Communications 2021. [DOI: 10.1016/j.mencom.2021.11.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
17
|
Gevorgyan A, Hopmann KH, Bayer A. Improved Buchwald–Hartwig Amination by the Use of Lipids and Lipid Impurities. Organometallics 2021. [DOI: 10.1021/acs.organomet.1c00517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Ashot Gevorgyan
- Department of Chemistry, UiT The Arctic University of Norway, 9037 Tromsø, Norway
| | - Kathrin H. Hopmann
- Department of Chemistry, UiT The Arctic University of Norway, 9037 Tromsø, Norway
| | - Annette Bayer
- Department of Chemistry, UiT The Arctic University of Norway, 9037 Tromsø, Norway
| |
|
18
|
Gimadiev TR, Lin A, Afonina VA, Batyrshin D, Nugmanov RI, Akhmetshin T, Sidorov P, Duybankova N, Verhoeven J, Wegner J, Ceulemans H, Gedich A, Madzhidov TI, Varnek A. Reaction Data Curation I: Chemical Structures and Transformations Standardization. Mol Inform 2021; 40:e2100119. [PMID: 34427989 DOI: 10.1002/minf.202100119] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2021] [Accepted: 08/13/2021] [Indexed: 12/11/2022]
Abstract
The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not been extensively discussed in the literature so far. Here, we suggest a 4-step protocol that includes the curation of individual structures (reactants and products), chemical transformations, reaction conditions and endpoints. Its implementation in Python3 using the CGRTools toolkit has been used to clean three popular reaction databases: Reaxys, USPTO and Pistachio. The curated USPTO database is available in the GitHub repository (Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning).
Affiliation(s)
- Timur R Gimadiev
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
| | - Arkadii Lin
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
| | - Valentina A Afonina
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Dinar Batyrshin
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Ramil I Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Tagir Akhmetshin
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France.,Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Pavel Sidorov
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
| | | | - Jonas Verhoeven
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
| | - Joerg Wegner
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
| | - Hugo Ceulemans
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
| | - Andrey Gedich
- Arcadia Inc., Bol'shoy Sampsoniyevskiy Prospekt, 28 korpus 2, 194044, St Petersburg, Russia
| | - Timur I Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Alexandre Varnek
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan.,Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
| |
|
19
|
Abstract
As more data are introduced in the building of models of chemical reactivity, the mechanistic component can be reduced until 'big data' applications are reached. These methods no longer depend on underlying mechanistic hypotheses, potentially learning them implicitly through extensive data training. Reactivity models often focus on reaction barriers, but can also be trained to directly predict lab-relevant properties, such as yields or conditions. Calculations with a quantum-mechanical component are still preferred for quantitative predictions of reactivity. Although big data applications tend to be more qualitative, they have the advantage to be broadly applied to different kinds of reactions. There is a continuum of methods in between these extremes, such as methods that use quantum-derived data or descriptors in machine learning models. Here, we present an overview of the recent machine learning applications in the field of chemical reactivity from a mechanistic perspective. Starting with a summary of how reactivity questions are addressed by quantum-mechanical methods, we discuss methods that augment or replace quantum-based modelling with faster alternatives relying on machine learning.
|
20
|
Eyke NS, Koscher BA, Jensen KF. Toward Machine Learning-Enhanced High-Throughput Experimentation. Trends in Chemistry 2021. [DOI: 10.1016/j.trechm.2020.12.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
|
21
|
Schwaller P, Probst D, Vaucher AC, Nair VH, Kreutter D, Laino T, Reymond JL. Mapping the space of chemical reactions using attention-based neural networks. Nat Mach Intell 2021. [DOI: 10.1038/s42256-020-00284-w] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
22
|
Thakkar A, Johansson S, Jorner K, Buttar D, Reymond JL, Engkvist O. Artificial intelligence and automation in computer aided synthesis planning. React Chem Eng 2021. [DOI: 10.1039/d0re00340a] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
In this perspective we deal with questions pertaining to the development of synthesis planning technologies over the course of recent years.
Affiliation(s)
- Amol Thakkar
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg
| | | | - Kjell Jorner
- Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield
| | - David Buttar
- Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, 3012 Bern, Switzerland
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg
| |
|
23
|
David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform 2020; 12:56. [PMID: 33431035 PMCID: PMC7495975 DOI: 10.1186/s13321-020-00460-5] [Citation(s) in RCA: 165] [Impact Index Per Article: 41.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2020] [Accepted: 09/05/2020] [Indexed: 02/08/2023] Open
Abstract
The technological advances of the past century, marked by the computer revolution and the advent of high-throughput screening technologies in drug discovery, opened the path to the computational analysis and visualization of bioactive molecules. For this purpose, it became necessary to represent molecules in a syntax that would be readable by computers and understandable by scientists of various fields. A large number of chemical representations have been developed over the years, their numerosity being due to the fast development of computers and the complexity of producing a representation that encompasses all structural and chemical characteristics. We present here some of the most popular electronic molecular and macromolecular representations used in drug discovery, many of which are based on graph representations. Furthermore, we describe applications of these representations in AI-driven drug discovery. Our aim is to provide a brief guide on structural representations that are essential to the practice of AI in drug discovery. This review serves as a guide for researchers who have little experience with the handling of chemical representations and plan to work on applications at the interface of these fields.
Affiliation(s)
- Laurianne David
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Amol Thakkar
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
| | - Rocío Mercado
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| |
|
24
|
Ghiandoni GM, Bodkin MJ, Chen B, Hristozov D, Wallace JEA, Webster J, Gillet VJ. Enhancing reaction-based de novo design using a multi-label reaction class recommender. J Comput Aided Mol Des 2020; 34:783-803. [PMID: 32112286 PMCID: PMC7293200 DOI: 10.1007/s10822-020-00300-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2019] [Accepted: 02/13/2020] [Indexed: 12/31/2022]
Abstract
Reaction-based de novo design refers to the in-silico generation of novel chemical structures by combining reagents using structural transformations derived from known reactions. The driver for using reaction-based transformations is to increase the likelihood of the designed molecules being synthetically accessible. We have previously described a reaction-based de novo design method based on reaction vectors which are transformation rules that are encoded automatically from reaction databases. A limitation of reaction vectors is that they account for structural changes that occur at the core of a reaction only, and they do not consider the presence of competing functionalities that can compromise the reaction outcome. Here, we present the development of a Reaction Class Recommender to enhance the reaction vector framework. The recommender is intended to be used as a filter on the reaction vectors that are applied during de novo design to reduce the combinatorial explosion of in-silico molecules produced while limiting the generated structures to those which are most likely to be synthesisable. The recommender has been validated using an external data set extracted from the recent medicinal chemistry literature and in two simulated de novo design experiments. Results suggest that the use of the recommender drastically reduces the number of solutions explored by the algorithm while preserving the chance of finding relevant solutions and increasing the global synthetic accessibility of the designed molecules.
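The filtering role of the recommender described in this abstract can be sketched roughly as below. This is only an illustration under stated assumptions: the dictionary layout of a reaction vector, the function name `filter_reaction_vectors`, the class labels, the scores, and the 0.5 threshold are all hypothetical, and the multi-label model that would produce the scores is not shown.

```python
def filter_reaction_vectors(reaction_vectors, class_scores, threshold=0.5):
    """Keep only reaction vectors whose reaction class is recommended.

    reaction_vectors: list of dicts with a "reaction_class" key (assumed layout).
    class_scores: per-class scores from a multi-label recommender for the
        current starting material; several classes may pass at once.
    """
    recommended = {name for name, score in class_scores.items() if score >= threshold}
    return [rv for rv in reaction_vectors if rv["reaction_class"] in recommended]


# Toy inputs, invented for illustration only.
vectors = [
    {"id": "rv1", "reaction_class": "amide coupling"},
    {"id": "rv2", "reaction_class": "Suzuki coupling"},
    {"id": "rv3", "reaction_class": "reductive amination"},
]
scores = {
    "amide coupling": 0.91,
    "Suzuki coupling": 0.08,
    "reductive amination": 0.63,
}

kept = filter_reaction_vectors(vectors, scores)
kept_ids = [rv["id"] for rv in kept]
```

Applying the filter before the reaction vectors are enumerated is what limits the combinatorial growth mentioned in the abstract: vectors from classes the recommender scores poorly for a given reagent are never applied.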
Affiliation(s)
- Gian Marco Ghiandoni
- Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
| | - Michael J Bodkin
- Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, UK
| | - Beining Chen
- Chemistry Department, University of Sheffield, Dainton Building, Brook Hill, Sheffield, S3 7HF, UK
| | - Dimitar Hristozov
- Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, UK
| | - James E A Wallace
- Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, UK
| | - James Webster
- Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
| | - Valerie J Gillet
- Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK.
| |
|