Reference Citation Analysis

For: Ghiandoni GM, Bodkin MJ, Chen B, Hristozov D, Wallace JEA, Webster J, Gillet VJ. Development and Application of a Data-Driven Reaction Classification Model: Comparison of an Electronic Lab Notebook and Medicinal Chemistry Literature. J Chem Inf Model 2019;59:4167-4187. [PMID: 31529948 DOI: 10.1021/acs.jcim.9b00537] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/18/2022]

Cited by Other Article(s)

Shee Y, Li H, Zhang P, Nikolic AM, Lu W, Kelly HR, Manee V, Sreekumar S, Buono FG, Song JJ, Newhouse TR, Batista VS. Site-specific template generative approach for retrosynthetic planning. Nat Commun 2024;15:7818. [PMID: 39251606 PMCID: PMC11385523 DOI: 10.1038/s41467-024-52048-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Accepted: 08/26/2024] [Indexed: 09/11/2024] Open

Phan TL, Weinbauer K, Gärtner T, Merkle D, Andersen JL, Fagerberg R, Stadler PF. Reaction rebalancing: a novel approach to curating reaction databases. J Cheminform 2024;16:82. [PMID: 39030583 PMCID: PMC11264917 DOI: 10.1186/s13321-024-00875-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 06/24/2024] [Indexed: 07/21/2024] Open

Abstract

PURPOSE

Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.e., unbalanced reactions, however, are the rule rather than the exception. The curation and correction of such incomplete entries is thus an urgent need.

METHODS

The SynRBL framework addresses this issue with a dual-strategy: a rule-based method for non-carbon compounds, using atomic symbols and counts for prediction, alongside a Maximum Common Subgraph (MCS)-based technique for carbon compounds, aimed at aligning reactants and products to infer missing entities.
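To illustrate the idea of using atomic symbols and counts to spot a missing non-carbon species, here is a minimal sketch. It is not the SynRBL implementation: the helper names `atom_counts` and `missing_atoms` are hypothetical, and the parser assumes plain molecular formulas without brackets, charges, or isotopes.

```python
import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Count atoms in a plain molecular formula such as 'C2H6O'.

    Assumption: no parentheses, charges, or isotope labels.
    """
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def missing_atoms(reactants, products):
    """Return the atoms missing on the product side and on the reactant side.

    Counter subtraction keeps only positive differences, so each result
    lists the atoms that the other side has in excess.
    """
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left - right, right - left


# Esterification written without its co-product:
# acetic acid + ethanol -> ethyl acetate
missing_products, missing_reactants = missing_atoms(
    ["C2H4O2", "C2H6O"], ["C4H8O2"]
)
print(dict(missing_products))   # atoms unaccounted for on the product side
print(dict(missing_reactants))  # atoms unaccounted for on the reactant side
```

In this example the product side lacks two hydrogens and one oxygen, which is consistent with an omitted water molecule. Mapping such a difference to a concrete compound would need a rule table, which the abstract mentions but does not spell out.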

RESULTS

The rule-based method exceeded 99% accuracy, while MCS-based accuracy varied from 81.19 to 99.33%, depending on reaction properties. Furthermore, an applicability domain and a machine learning scoring function were devised to quantify prediction confidence. The overall efficacy of this framework was delineated through its success rate and accuracy metrics, which spanned from 89.83 to 99.75% and 90.85 to 99.05%, respectively.

CONCLUSION

The SynRBL framework offers a novel solution for recalibrating chemical reactions, significantly enhancing reaction completeness. With rigorous validation, it achieved groundbreaking accuracy in reaction rebalancing. This sets the stage for future improvement in particular of atom-atom mapping techniques as well as of downstream tasks such as automated synthesis planning.

SCIENTIFIC CONTRIBUTION

SynRBL features a novel computational approach to correcting unbalanced entries in chemical reaction databases. By combining heuristic rules for inferring non-carbon compounds and common subgraph searches to address carbon unbalance, SynRBL successfully addresses most instances of this problem, which affects the majority of data in most large-scale resources. Compared to alternative solutions, SynRBL achieves a dramatic increase in both success rate and accuracy, and provides the first freely available open source solution for this problem.

Affiliation(s)

Tieu-Long Phan: Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics and School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany; Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark
Klaus Weinbauer: Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics and School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany; Machine Learning Research Unit, TU Wien Informatics, Erzherzog-Johann-Platz 1 (FB02), A-1040, Wien, Austria
Thomas Gärtner: Machine Learning Research Unit, TU Wien Informatics, Erzherzog-Johann-Platz 1 (FB02), A-1040, Wien, Austria
Daniel Merkle: Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark; Faculty of Technology, Bielefeld University, Postfach 100131, 33501, Bielefeld, Germany
Jakob L Andersen: Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark
Rolf Fagerberg: Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark
Peter F Stadler: Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics and School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany; Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090, Wien, Austria; Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia; Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870, Frederiksberg, Denmark; Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA

Raghavan P, Rago AJ, Verma P, Hassan MM, Goshu GM, Dombrowski AW, Pandey A, Coley CW, Wang Y. Incorporating Synthetic Accessibility in Drug Design: Predicting Reaction Yields of Suzuki Cross-Couplings by Leveraging AbbVie's 15-Year Parallel Library Data Set. J Am Chem Soc 2024;146:15070-15084. [PMID: 38768950 PMCID: PMC11157529 DOI: 10.1021/jacs.4c00098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 04/24/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]

Strieth-Kalthoff F, Szymkuć S, Molga K, Aspuru-Guzik A, Glorius F, Grzybowski BA. Artificial Intelligence for Retrosynthetic Planning Needs Both Data and Expert Knowledge. J Am Chem Soc 2024. [PMID: 38598363 DOI: 10.1021/jacs.4c00338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]

Saebi M, Nan B, Herr JE, Wahlers J, Guo Z, Zurański AM, Kogej T, Norrby PO, Doyle AG, Chawla NV, Wiest O. On the use of real-world datasets for reaction yield prediction. Chem Sci 2023;14:4997-5005. [PMID: 37206399 PMCID: PMC10189898 DOI: 10.1039/d2sc06041h] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 11/01/2022] [Accepted: 03/09/2023] [Indexed: 09/30/2023] Open

Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023;158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023] Open

Khashei M, Nazgouei E, Bakhtiarvand N. Intelligent Discrete Deep Learning Based Classification Methodology in Chemometrics. J Chem Inf Model 2023;63:1935-1946. [PMID: 36763004 DOI: 10.1021/acs.jcim.2c01535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]

Abstract

In recent years, deep learning models have attracted much attention for classification purposes in chemometrics. The popularity of deep learning models in this field comes from their unique features like universal approximation capability with the desired accuracy. Deep learning classifiers use several intelligent processing layers to model mixed, complex, and nonlinear patterns in the underlying data sets, which is why the development of deep learning based models has never been stopped in the chemometrics literature. Despite the variety of deep learning classification models used in this field, they all use a continuous distance-based cost function in their learning processes. Although using a continuous cost function for learning deep classifiers is a common approach, it conflicts with the discrete nature of the classification problem. In fact, applying a continuous cost function for inherently discrete classification problems can reduce the performance of the classification. In this research, a novel discrete learning based classification approach is proposed and implemented on a deep feed-forward neural network as one of the most commonly used deep learning models to develop a different learning process for deep classification models. The basis of the proposed learning approach is maximizing a discrete matching function of the actual and fitted values instead of minimizing a continuous distance-based cost function. The proposed classification approach is evaluated on five benchmark data sets in the chemistry field. The empirical results indicated the superiority of the proposed discrete deep learning approach over its classic continuous form. The results of this study demonstrate the important effect of discrete learning processes on the performances of deep learning classification models. Therefore, the proposed methodology can be a powerful alternative to common classification approaches to analyze chemical data in the chemometrics field.
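The contrast between a continuous distance-based cost and a discrete matching function can be made concrete with a small sketch. The abstract does not define the authors' matching function, so the version below is an assumption: it thresholds fitted values at 0.5 (a hypothetical choice) and counts agreements with the actual labels.

```python
def squared_error(actual, fitted):
    """Continuous cost: mean squared distance between labels and fitted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)


def matching_score(actual, fitted, threshold=0.5):
    """Discrete score: fraction of samples whose thresholded fit equals the label.

    Assumption: binary labels in {0, 1} and a fixed 0.5 decision threshold.
    """
    predicted = [1 if f >= threshold else 0 for f in fitted]
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)


actual = [1, 0, 1, 0]
model_a = [0.6, 0.4, 0.6, 0.4]    # every sample on the correct side, low margin
model_b = [1.0, 0.0, 1.0, 0.55]   # three perfect fits, one misclassification

for name, fitted in (("A", model_a), ("B", model_b)):
    print(name, squared_error(actual, fitted), matching_score(actual, fitted))
```

Model B has the smaller squared error but the lower matching score, so the two criteria rank the models differently. This is the kind of conflict between continuous costs and discrete classification outcomes that the abstract describes.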

Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023;14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022] Open

Recent advances and challenges in experiment-oriented polymer informatics. Polym J 2022. [DOI: 10.1038/s41428-022-00734-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]

Chines S, Ehrt C, Potowski M, Biesenkamp F, Grützbach L, Brunner S, van den Broek F, Bali S, Ickstadt K, Brunschweiger A. Navigating chemical reaction space - application to DNA-encoded chemistry. Chem Sci 2022;13:11221-11231. [PMID: 36320474 PMCID: PMC9517168 DOI: 10.1039/d2sc02474h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 08/31/2022] [Indexed: 12/02/2022] Open

Pereira A, Albornoz C, Trofymchuk OS. Data-Driven Analysis of Reactions Catalyzed by [CoCp*(CO)I₂]. Organometallics 2022. [DOI: 10.1021/acs.organomet.2c00051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]

Probst D, Schwaller P, Reymond JL. Reaction classification and yield prediction using the differential reaction fingerprint DRFP. Digital Discovery 2022;1:91-97. [PMID: 35515081 PMCID: PMC8996827 DOI: 10.1039/d1dd00006c] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 08/26/2021] [Accepted: 01/12/2022] [Indexed: 01/19/2023]

Das M, Sharma P, Sunoj RB. Machine learning studies on asymmetric relay Heck reaction: Potential avenues for reaction development. J Chem Phys 2022;156:114303. [DOI: 10.1063/5.0084432] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023] Open

Wen M, Blau SM, Xie X, Dwaraknath S, Persson KA. Improving machine learning performance on small chemical reaction data with unsupervised contrastive pretraining. Chem Sci 2022;13:1446-1458. [PMID: 35222929 PMCID: PMC8809395 DOI: 10.1039/d1sc06515g] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/22/2021] [Accepted: 01/09/2022] [Indexed: 11/21/2022] Open

Abstract

Machine learning (ML) methods have great potential to transform chemical discovery by accelerating the exploration of chemical space and drawing scientific insights from data. However, modern chemical reaction ML models, such as those based on graph neural networks (GNNs), must be trained on a large amount of labelled data in order to avoid overfitting the data and thus possessing low accuracy and transferability. In this work, we propose a strategy to leverage unlabelled data to learn accurate ML models for small labelled chemical reaction data. We focus on an old and prominent problem, classifying reactions into distinct families, and build a GNN model for this task. We first pretrain the model on unlabelled reaction data using unsupervised contrastive learning and then fine-tune it on a small number of labelled reactions. The contrastive pretraining learns by making the representations of two augmented versions of a reaction similar to each other but distinct from other reactions. We propose chemically consistent reaction augmentation methods that protect the reaction center and find they are the key for the model to extract relevant information from unlabelled data to aid the reaction classification task. The transfer learned model outperforms a supervised model trained from scratch by a large margin. Further, it consistently performs better than models based on traditional rule-driven reaction fingerprints, which have long been the default choice for small datasets, as well as those based on reaction fingerprints derived from masked language modelling. In addition to reaction classification, the effectiveness of the strategy is tested on regression datasets; the learned GNN-based reaction fingerprints can also be used to navigate the chemical reaction space, which we demonstrate by querying for similar reactions. The strategy can be readily applied to other predictive reaction problems to uncover the power of unlabelled data for learning better models with a limited supply of labels.

Komp E, Janulaitis N, Valleau S. Progress towards machine learning reaction rate constants. Phys Chem Chem Phys 2021;24:2692-2705. [PMID: 34935798 DOI: 10.1039/d1cp04422b] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 01/12/2023]

Machine learning modelling of chemical reaction characteristics: yesterday, today, tomorrow. Mendeleev Communications 2021. [DOI: 10.1016/j.mencom.2021.11.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]

Gevorgyan A, Hopmann KH, Bayer A. Improved Buchwald–Hartwig Amination by the Use of Lipids and Lipid Impurities. Organometallics 2021. [DOI: 10.1021/acs.organomet.1c00517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

Gimadiev TR, Lin A, Afonina VA, Batyrshin D, Nugmanov RI, Akhmetshin T, Sidorov P, Duybankova N, Verhoeven J, Wegner J, Ceulemans H, Gedich A, Madzhidov TI, Varnek A. Reaction Data Curation I: Chemical Structures and Transformations Standardization. Mol Inform 2021;40:e2100119. [PMID: 34427989 DOI: 10.1002/minf.202100119] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/04/2021] [Accepted: 08/13/2021] [Indexed: 12/11/2022]

Affiliation(s)

Timur R Gimadiev: Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
Arkadii Lin: Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
Valentina A Afonina: Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
Dinar Batyrshin: Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
Ramil I Nugmanov: Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
Tagir Akhmetshin: Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France; Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
Pavel Sidorov: Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
Natalia Duybankova: Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
Jonas Verhoeven: Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
Joerg Wegner: Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
Hugo Ceulemans: Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
Andrey Gedich: Arcadia Inc., Bol'shoy Sampsoniyevskiy Prospekt, 28 korpus 2, 194044, St Petersburg, Russia
Timur I Madzhidov: Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
Alexandre Varnek: Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan; Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France

Organic reactivity from mechanism to machine learning. Nat Rev Chem 2021;5:240-255. [PMID: 37117288 DOI: 10.1038/s41570-021-00260-x] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Accepted: 02/10/2021] [Indexed: 12/25/2022]

Eyke NS, Koscher BA, Jensen KF. Toward Machine Learning-Enhanced High-Throughput Experimentation. Trends in Chemistry 2021. [DOI: 10.1016/j.trechm.2020.12.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/12/2022]

Schwaller P, Probst D, Vaucher AC, Nair VH, Kreutter D, Laino T, Reymond JL. Mapping the space of chemical reactions using attention-based neural networks. Nat Mach Intell 2021. [DOI: 10.1038/s42256-020-00284-w] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Indexed: 11/09/2022]

Thakkar A, Johansson S, Jorner K, Buttar D, Reymond JL, Engkvist O. Artificial intelligence and automation in computer aided synthesis planning. React Chem Eng 2021. [DOI: 10.1039/d0re00340a] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/13/2022]

David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform 2020;12:56. [PMID: 33431035 PMCID: PMC7495975 DOI: 10.1186/s13321-020-00460-5] [Citation(s) in RCA: 165] [Impact Index Per Article: 41.3] [Received: 01/11/2020] [Accepted: 09/05/2020] [Indexed: 02/08/2023] Open

Ghiandoni GM, Bodkin MJ, Chen B, Hristozov D, Wallace JEA, Webster J, Gillet VJ. Enhancing reaction-based de novo design using a multi-label reaction class recommender. J Comput Aided Mol Des 2020;34:783-803. [PMID: 32112286 PMCID: PMC7293200 DOI: 10.1007/s10822-020-00300-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/24/2019] [Accepted: 02/13/2020] [Indexed: 12/31/2022]