1
Zhu B, Li Z, Jin Z, Zhong Y, Lv T, Ge Z, Li H, Wang T, Lin Y, Liu H, Ma T, Wang S, Liao J, Fan X. Knowledge-based in silico fragmentation and annotation of mass spectra for natural products with MassKG. Comput Struct Biotechnol J 2024; 23:3327-3341. [PMID: 39310281 PMCID: PMC11415640 DOI: 10.1016/j.csbj.2024.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Revised: 09/04/2024] [Accepted: 09/04/2024] [Indexed: 09/25/2024] Open
Abstract
Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is a potent analytical technique utilized for identifying natural products from complex sources. However, due to the structural diversity, annotating LC-MS/MS data of natural products efficiently remains challenging, hindering the discovery process of novel active structures. Here, we introduce MassKG, an algorithm that combines a knowledge-based fragmentation strategy and a deep learning-based molecule generation model to aid in rapid dereplication and the discovery of novel NP structures. Specifically, MassKG has compiled 407,720 known NP structures and, based on this, generated 266,353 new structures using chemical language models for the discovery of potential novel compounds. Furthermore, MassKG demonstrates exceptional performance in spectra annotation compared to state-of-the-art algorithms. To enhance usability, MassKG has been implemented as a web server for annotating tandem mass spectral data (MS/MS, MS2) with a user-friendly interface, automatic reporting, and fragment tree visualization. Lastly, the interpretive capability of MassKG is comprehensively validated through composition analysis and MS annotation of Panax notoginseng, Ginkgo biloba, Codonopsis pilosula, and Astragalus membranaceus. MassKG is now accessible at https://xomics.com.cn/masskg.
Affiliation(s)
- Bingjie Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Zhenhao Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Zhang Boli Intelligent Health Innovation Lab, Hangzhou 311121, China
- Zehua Jin
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Yi Zhong
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Tianhang Lv
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Zhiwei Ge
  - Analysis Center of Agrobiology and Environmental Sciences, Zhejiang University, Hangzhou 310058, China
- Haoran Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Tianhao Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Yugang Lin
  - Department of Pharmacy, Affiliated Jinhua Hospital, Zhejiang University School of Medicine, Jinhua 321000, China
- Huihui Liu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Tianyi Ma
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Shufang Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Jie Liao
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Xiaohui Fan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
  - Zhang Boli Intelligent Health Innovation Lab, Hangzhou 311121, China
  - The Joint-laboratory of Clinical Multi-Omics Research between Zhejiang University and Ningbo Municipal Hospital of TCM, Ningbo Municipal Hospital of TCM, 315100 Ningbo, China
2
Chen LY, Li YP. Machine learning-guided strategies for reaction conditions design and optimization. Beilstein J Org Chem 2024; 20:2476-2492. [PMID: 39376489 PMCID: PMC11457048 DOI: 10.3762/bjoc.20.212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 09/19/2024] [Indexed: 10/09/2024] Open
Abstract
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as the data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Affiliation(s)
- Lung-Yi Chen
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
- Yi-Pei Li
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
  - Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei 11529, Taiwan
3
Banerjee A, Kar S, Roy K, Patlewicz G, Charest N, Benfenati E, Cronin MTD. Molecular similarity in chemical informatics and predictive toxicity modeling: from quantitative read-across (q-RA) to quantitative read-across structure-activity relationship (q-RASAR) with the application of machine learning. Crit Rev Toxicol 2024; 54:659-684. [PMID: 39225123 DOI: 10.1080/10408444.2024.2386260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/25/2024] [Accepted: 07/25/2024] [Indexed: 09/04/2024]
Abstract
This article aims to provide a comprehensive critical, yet readable, review of general interest to the chemistry community on molecular similarity as applied to chemical informatics and predictive modeling with a special focus on read-across (RA) and read-across structure-activity relationships (RASAR). Molecular similarity-based computational tools, such as quantitative structure-activity relationships (QSARs) and RA, are routinely used to fill the data gaps for a wide range of properties including toxicity endpoints for regulatory purposes. This review will explore the background of RA starting from how structural information has been used through to how other similarity contexts such as physicochemical, absorption, distribution, metabolism, and elimination (ADME) properties, and biological aspects are being characterized. More recent developments of RA's integration with QSAR have resulted in the emergence of novel models such as ToxRead, generalized read-across (GenRA), and quantitative RASAR (q-RASAR). Conventional QSAR techniques have been excluded from this review except where necessary for context.
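As a rough illustration of the similarity-based data-gap filling that read-across relies on, the sketch below predicts a property for a query compound as the similarity-weighted mean of its k most similar source compounds. The function name, the choice of k, and the toy similarity and property values are assumptions made for illustration; this is a generic weighted-average scheme, not the q-RA or q-RASAR implementation discussed in the review.

```python
def read_across_predict(similarities, values, k=3):
    """Predict a query compound's property from its k most similar source compounds.

    similarities: similarity of each source compound to the query (0..1)
    values: measured property value of each source compound
    """
    if len(similarities) != len(values):
        raise ValueError("similarities and values must have the same length")
    # Rank source compounds by similarity to the query, most similar first.
    ranked = sorted(zip(similarities, values), key=lambda pair: pair[0], reverse=True)
    neighbours = ranked[:k]
    total_weight = sum(sim for sim, _ in neighbours)
    if total_weight == 0:
        raise ValueError("no source compound is similar to the query")
    # Similarity-weighted average of the neighbours' property values.
    return sum(sim * val for sim, val in neighbours) / total_weight


# Hypothetical toy data: four source compounds with known property values.
sims = [0.9, 0.6, 0.3, 0.1]
vals = [2.0, 4.0, 10.0, 50.0]
prediction = read_across_predict(sims, vals, k=2)
print(round(prediction, 3))
```

Restricting the average to the closest neighbours keeps dissimilar compounds (here the one with value 50.0) from dominating the estimate; how similarity itself is computed is left open in this sketch.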
Affiliation(s)
- Arkaprava Banerjee
  - Department of Pharmaceutical Technology, Drug Theoretics and Cheminformatics (DTC) Laboratory, Jadavpur University, Kolkata, India
- Supratik Kar
  - Department of Chemistry and Physics, Chemometrics & Molecular Modeling Laboratory, Kean University, Union, NJ, USA
- Kunal Roy
  - Department of Pharmaceutical Technology, Drug Theoretics and Cheminformatics (DTC) Laboratory, Jadavpur University, Kolkata, India
- Grace Patlewicz
  - Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Nathaniel Charest
  - Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Emilio Benfenati
  - Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Mark T D Cronin
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
4
Das B, Mathew AT, Baidya ATK, Devi B, Salmon RR, Kumar R. Artificial intelligence assisted identification of potential tau aggregation inhibitors: ligand- and structure-based virtual screening, in silico ADME, and molecular dynamics study. Mol Divers 2024; 28:2013-2031. [PMID: 37022608 DOI: 10.1007/s11030-023-10645-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Accepted: 03/29/2023] [Indexed: 04/07/2023]
Abstract
Alzheimer's disease (AD) is a severe, growing, multifactorial disorder affecting millions of people worldwide, characterized by cognitive decline and neurodegeneration. The accumulation of tau protein into paired helical filaments is one of the major pathological hallmarks of AD and has gained the interest of researchers as a potential drug target to treat AD. Lately, Artificial Intelligence (AI) has revolutionized the drug discovery process by speeding it up and reducing the overall cost. As a part of our continuous effort to identify potential tau aggregation inhibitors, and leveraging the power of AI, in this study we used a fully automated AI-assisted ligand-based virtual screening tool, PyRMD, to screen a library of 12 million compounds from the ZINC database. The preliminary hits from virtual screening were filtered for similar compounds and pan-assay interference compounds (compounds containing reactive functional groups that can interfere with the assays) using RDKit. Further, the selected compounds were prioritized based on their molecular docking score with the binding pocket of tau, where the binding pockets were identified using replica exchange molecular dynamics simulation. Thirty-three compounds showing good docking scores for all the tau clusters were selected and further subjected to in silico pharmacokinetic prediction. Finally, the top 10 compounds were selected for molecular dynamics simulation and MMPBSA binding free energy calculations, resulting in the identification of UNK_175, UNK_1027, UNK_1172, UNK_1173, UNK_1237, UNK_1518, and UNK_2181 as potential tau aggregation inhibitors.
Affiliation(s)
- Bhanuranjan Das
  - Department of Pharmaceutical Engineering & Technology, Indian Institute of Technology (B.H.U.), Varanasi, 221005, UP, India
- Alen T Mathew
  - Department of Pharmaceutical Engineering & Technology, Indian Institute of Technology (B.H.U.), Varanasi, 221005, UP, India
- Anurag T K Baidya
  - Department of Pharmaceutical Engineering & Technology, Indian Institute of Technology (B.H.U.), Varanasi, 221005, UP, India
- Bharti Devi
  - Department of Pharmaceutical Engineering & Technology, Indian Institute of Technology (B.H.U.), Varanasi, 221005, UP, India
- Rahul Rampa Salmon
  - Department of Pharmaceutical Engineering & Technology, Indian Institute of Technology (B.H.U.), Varanasi, 221005, UP, India
- Rajnish Kumar
  - Department of Pharmaceutical Engineering & Technology, Indian Institute of Technology (B.H.U.), Varanasi, 221005, UP, India
5
Venkatraman V, Gaiser J, Demekas D, Roy A, Xiong R, Wheeler TJ. Do Molecular Fingerprints Identify Diverse Active Drugs in Large-Scale Virtual Screening? (No). Pharmaceuticals (Basel) 2024; 17:992. [PMID: 39204097 PMCID: PMC11356940 DOI: 10.3390/ph17080992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Revised: 07/18/2024] [Accepted: 07/23/2024] [Indexed: 09/03/2024] Open
Abstract
Computational approaches for small-molecule drug discovery now regularly scale to the consideration of libraries containing billions of candidate small molecules. One promising approach to increase the speed of evaluating billion-molecule libraries is to develop succinct representations of each molecule that enable the rapid identification of molecules with similar properties. Molecular fingerprints are thought to provide a mechanism for producing such representations. Here, we explore the utility of commonly used fingerprints in the context of predicting similar molecular activity. We show that fingerprint similarity provides little discriminative power between active and inactive molecules for a target protein based on a known active: while fingerprints may sometimes provide some enrichment for active molecules in a drug screen, a screened data set will still be dominated by inactive molecules. We also demonstrate that high-similarity actives appear to share a scaffold with the query active, meaning that they could more easily be identified by structural enumeration. Furthermore, even when limited to only active molecules, fingerprint similarity values do not correlate with compound potency. In sum, these results highlight the need for a new wave of molecular representations that will improve the capacity to detect biologically active molecules based on their similarity to other such molecules.
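For readers unfamiliar with how fingerprint similarity is scored, a minimal sketch follows. It assumes fingerprints are represented as sets of "on" bit indices and uses the Tanimoto (Jaccard) coefficient, a common choice for this kind of comparison; the abstract does not say which metric or fingerprints the authors used, and the molecule names and bit sets below are invented for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_by_similarity(query_bits, library):
    """Return (name, similarity) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical on-bit sets; real fingerprints would come from a cheminformatics toolkit.
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9, 12},  # four of its five bits are shared with the query
    "mol_b": {1, 4, 20, 21},    # two bits shared with the query
    "mol_c": {30, 31, 32},      # no bits shared with the query
}
ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(name, round(score, 3))
```

A ranking like this says nothing by itself about activity; the abstract's point is that a high score of this kind is a weak predictor of whether a molecule is active against the same target.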
Affiliation(s)
- Vishwesh Venkatraman
  - Department of Chemistry, Norwegian University of Science and Technology, 7034 Trondheim, Norway
- Jeremiah Gaiser
  - School of Information, University of Arizona, Tucson, AZ 85721, USA
- Daphne Demekas
  - R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, USA
- Amitava Roy
  - Rocky Mountain Laboratories, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT 59840, USA
  - Department of Biomedical and Pharmaceutical Sciences, University of Montana, Missoula, MT 59812, USA
- Rui Xiong
  - Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ 85721, USA
- Travis J. Wheeler
  - R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, USA
6
Rago AJ, Zoi I, Gartman JA, McDaniel KA, Jana N, Liu D, Bai WJ. Mining Medicinally Relevant Bioreduction Substrates Inspired by Ligand-Based Drug Design. J Med Chem 2024. [PMID: 39051635 DOI: 10.1021/acs.jmedchem.4c01129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Exploring the scope of biocatalytic transformations in the absence of enzyme structures without extensive experimentation is a challenging task. To expand the limited substrate capacity of carrot-mediated bioreduction and hunt for new medicinally relevant ketones with minimum cost of labor and time, we deployed a practical method inspired by ligand-based drug design. Through analyzing collected literature data and building pharmacophore and reactivity prediction models, we screened a self-built virtual library of >8000 ketones bearing the most frequently used N,O,S-heterocycles and functional groups in drug discovery. Representative examples were validated, expanding the bioreduction substrate scope. The public availability of our models alongside the straightforward screening workflow makes it time-, labor-, and cost-saving to evaluate unknown bioreduction substrates for medicinal chemistry applications, especially for a large set of structurally differentiated ketones. Our studies also showcase the novelty of utilizing medicinal chemistry principles to solve a general biocatalysis problem.
Affiliation(s)
- Ioanna Zoi
  - AbbVie, Inc., North Chicago, Illinois 60064, United States
- Navendu Jana
  - AbbVie, Inc., North Chicago, Illinois 60064, United States
- Dachun Liu
  - AbbVie, Inc., North Chicago, Illinois 60064, United States
- Wen-Ju Bai
  - AbbVie, Inc., North Chicago, Illinois 60064, United States
7
Tahıl G, Delorme F, Le Berre D, Monflier É, Sayede A, Tilloy S. Stereoisomers Are Not Machine Learning's Best Friends. J Chem Inf Model 2024; 64:5451-5469. [PMID: 38949069 DOI: 10.1021/acs.jcim.4c00318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
This study addresses the challenge of accurately identifying stereoisomers in cheminformatics, which originates from our objective to apply machine learning to predict the association constant between cyclodextrin and a guest. Identifying stereoisomers is indeed crucial for machine learning applications. Current tools offer various molecular descriptors, including their textual representation as Isomeric SMILES that can distinguish stereoisomers. However, such representation is text-based and does not have a fixed size, so a conversion is needed to make it usable to machine learning approaches. Word embedding techniques can be used to solve this problem. Mol2vec, a word embedding approach for molecules, offers such a conversion. Unfortunately, it cannot distinguish between stereoisomers due to its inability to capture the spatial configuration of molecular structures. This study proposes several approaches that use word embedding techniques to handle molecular discrimination using stereochemical information on molecules or considering Isomeric SMILES notation as a text in Natural Language Processing. Our aim is to generate a distinct vector for each unique molecule, correctly identifying stereoisomer information in cheminformatics. The proposed approaches are then compared to our original machine learning task: predicting the association constant between cyclodextrin and a guest molecule.
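The problem described above can be seen directly at the text level. The sketch below is an illustration only, not the authors' method: it tokenizes two Isomeric SMILES strings for the enantiomers of alanine and shows that they differ only in their chirality tags, so any representation that ignores those tags maps both molecules to the same input. The helper names and the simplified tokenizer pattern are assumptions.

```python
import re

# Simplified SMILES tokenizer (an assumption for this sketch): bracket atoms,
# two-letter halogens, single-letter atoms, ring-closure digits, and bond/branch symbols.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|\d|[=#()+\-/\\.%]")


def tokenize(smiles):
    """Split a SMILES string into tokens, as one would before training a word embedding."""
    return TOKEN_PATTERN.findall(smiles)


def strip_stereo(smiles):
    """Naively drop chirality tags (@, @@) and double-bond direction marks (/ and \\)."""
    return re.sub(r"[@/\\]", "", smiles)


# The two enantiomers of alanine, written as Isomeric SMILES.
enantiomer_1 = "N[C@@H](C)C(=O)O"
enantiomer_2 = "N[C@H](C)C(=O)O"

print(tokenize(enantiomer_1))
print(tokenize(enantiomer_2))
# With stereo information removed, both molecules collapse to the same string.
print(strip_stereo(enantiomer_1) == strip_stereo(enantiomer_2))
```

Keeping the bracket atom with its chirality tag as a single token is one simple way to give each stereoisomer a distinct token sequence; whether an embedding trained on such tokens captures the difference usefully is the question the study examines.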
Affiliation(s)
- Gökhan Tahıl
  - Univ. Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
  - Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
- Fabien Delorme
  - Univ. Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
- Daniel Le Berre
  - Univ. Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
- Éric Monflier
  - Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
- Adlane Sayede
  - Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
- Sébastien Tilloy
  - Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
8
Kim J, Chang W, Ji H, Joung I. Quantum-Informed Molecular Representation Learning Enhancing ADMET Property Prediction. J Chem Inf Model 2024; 64:5028-5040. [PMID: 38916580 DOI: 10.1021/acs.jcim.4c00772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
We examined pretraining tasks leveraging abundant labeled data to effectively enhance molecular representation learning in downstream tasks, specifically emphasizing graph transformers to improve the prediction of ADMET properties. Our investigation revealed limitations in previous pretraining tasks and identified more meaningful training targets, ranging from 2D molecular descriptors to extensive quantum chemistry simulations. These data were seamlessly integrated into supervised pretraining tasks. The implementation of our pretraining strategy and multitask learning outperforms conventional methods, achieving state-of-the-art outcomes in 7 out of 22 ADMET tasks within the Therapeutics Data Commons by utilizing a shared encoder across all tasks. Our approach underscores the effectiveness of learning molecular representations and highlights the potential for scalability when leveraging extensive data sets, marking a significant advancement in this domain.
Affiliation(s)
- Jungwoo Kim
  - Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
- Woojae Chang
  - Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
- Hyunjun Ji
  - Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
- InSuk Joung
  - Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
9
Boulaamane Y, Molina Panadero I, Hmadcha A, Atalaya Rey C, Baammi S, El Allali A, Maurady A, Smani Y. Antibiotic discovery with artificial intelligence for the treatment of Acinetobacter baumannii infections. mSystems 2024; 9:e0032524. [PMID: 38700330 PMCID: PMC11326114 DOI: 10.1128/msystems.00325-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 03/27/2024] [Indexed: 05/05/2024] Open
Abstract
Global challenges presented by multidrug-resistant Acinetobacter baumannii infections have stimulated the development of new treatment strategies. We reported that outer membrane protein W (OmpW) is a potential therapeutic target in A. baumannii. Here, a library of 11,648 natural compounds was subjected to a primary screening using quantitative structure-activity relationship (QSAR) models generated from a ChEMBL data set with >7,000 compounds with their reported minimal inhibitory concentration (MIC) values against A. baumannii followed by a structure-based virtual screening against OmpW. In silico pharmacokinetic evaluation was conducted to assess the drug-likeness of these compounds. The ten highest-ranking compounds were found to bind with an energy score ranging from -7.8 to -7.0 kcal/mol where most of them belonged to curcuminoids. To validate these findings, one lead compound exhibiting promising binding stability as well as favorable pharmacokinetics properties, namely demethoxycurcumin, was tested against a panel of A. baumannii strains to determine its antibacterial activity using microdilution and time-kill curve assays. To validate whether the compound binds to the selected target, an OmpW-deficient mutant was studied and compared with the wild type. Our results demonstrate that demethoxycurcumin in monotherapy and in combination with colistin is active against all A. baumannii strains. Finally, the compound was found to significantly reduce the A. baumannii interaction with host cells, suggesting its anti-virulence properties. Collectively, this study demonstrates machine learning as a promising strategy for the discovery of curcuminoids as antimicrobial agents for combating A. baumannii infections. 
IMPORTANCE: Acinetobacter baumannii presents a severe global health threat, with alarming levels of antimicrobial resistance rates resulting in significant morbidity and mortality in the USA, ranging from 26% to 68%, as reported by the Centers for Disease Control and Prevention (CDC). To address this threat, novel strategies beyond traditional antibiotics are imperative. Computational approaches, such as QSAR models, leverage molecular structures to predict biological effects, expediting drug discovery. We identified OmpW as a potential therapeutic target in A. baumannii and screened 11,648 natural compounds. We employed QSAR models from a ChEMBL bioactivity data set and conducted structure-based virtual screening against OmpW. Demethoxycurcumin, a lead compound, exhibited promising antibacterial activity against A. baumannii, including multidrug-resistant strains. Additionally, demethoxycurcumin demonstrated anti-virulence properties by reducing A. baumannii interaction with host cells. The findings highlight the potential of artificial intelligence in discovering curcuminoids as effective antimicrobial agents against A. baumannii infections, offering a promising strategy to address antibiotic resistance.
Affiliation(s)
- Yassir Boulaamane
  - Laboratory of Innovative Technologies, National School of Applied Sciences of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
- Irene Molina Panadero
  - Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide/CSIC/Junta de Andalucía, Seville, Spain
- Abdelkrim Hmadcha
  - Departamento de Biología Molecular e Ingeniería Bioquímica, Universidad Pablo de Olavide, Seville, Spain
  - Biosanitary Research Institute (IIB-VIU), Valencian International University (VIU), Valencia, Spain
- Celia Atalaya Rey
  - Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide/CSIC/Junta de Andalucía, Seville, Spain
- Soukayna Baammi
  - Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Benguerir, Morocco
- Achraf El Allali
  - Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Benguerir, Morocco
- Amal Maurady
  - Laboratory of Innovative Technologies, National School of Applied Sciences of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
  - Faculty of Sciences and Techniques of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
- Younes Smani
  - Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide/CSIC/Junta de Andalucía, Seville, Spain
  - Departamento de Biología Molecular e Ingeniería Bioquímica, Universidad Pablo de Olavide, Seville, Spain
10
Yoo S, Kim J. Adapt-cMolGPT: A Conditional Generative Pre-Trained Transformer with Adapter-Based Fine-Tuning for Target-Specific Molecular Generation. Int J Mol Sci 2024; 25:6641. [PMID: 38928346 PMCID: PMC11203498 DOI: 10.3390/ijms25126641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 06/09/2024] [Accepted: 06/14/2024] [Indexed: 06/28/2024] Open
Abstract
Small-molecule drug design aims to generate compounds that target specific proteins, playing a crucial role in the early stages of drug discovery. Recently, research has emerged that utilizes the GPT model, which has achieved significant success in various fields to generate molecular compounds. However, due to the persistent challenge of small datasets in the pharmaceutical field, there has been some degradation in the performance of generating target-specific compounds. To address this issue, we propose an enhanced target-specific drug generation model, Adapt-cMolGPT, which modifies molecular representation and optimizes the fine-tuning process. In particular, we introduce a new fine-tuning method that incorporates an adapter module into a pre-trained base model and alternates weight updates by sections. We evaluated the proposed model through multiple experiments and demonstrated performance improvements compared to previous models. In the experimental results, Adapt-cMolGPT generated a greater number of novel and valid compounds compared to other models, with these generated compounds exhibiting properties similar to those of real molecular data. These results indicate that our proposed method is highly effective in designing drugs targeting specific proteins.
Affiliation(s)
- Soyoung Yoo
  - Department of Artificial Intelligence, Sejong University, Seoul 05006, Republic of Korea
- Junghyun Kim
  - Department of Artificial Intelligence, Sejong University, Seoul 05006, Republic of Korea
  - Deep Learning Architecture Research Center, Sejong University, Seoul 05006, Republic of Korea
11
Zhang Q, Zuo L, Ren Y, Wang S, Wang W, Ma L, Zhang J, Xia B. FMCA-DTI: a fragment-oriented method based on a multihead cross attention mechanism to improve drug-target interaction prediction. Bioinformatics 2024; 40:btae347. [PMID: 38810106 PMCID: PMC11256963 DOI: 10.1093/bioinformatics/btae347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 04/23/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024] Open
Abstract
MOTIVATION: Identifying drug-target interactions (DTI) is crucial in drug discovery. Fragments are less complex and can accurately characterize local features, which is important in DTI prediction. Recently, deep learning (DL)-based methods predict DTI more efficiently. However, two challenges remain in existing DL-based methods: (i) some methods directly encode drugs and proteins into integers, ignoring the substructure representation; (ii) some methods learn the features of the drugs and proteins separately instead of considering their interactions.
RESULTS: In this article, we propose a fragment-oriented method based on a multihead cross attention mechanism for predicting DTI, named FMCA-DTI. FMCA-DTI obtains multiple types of fragments of drugs and proteins by branch chain mining and category fragment mining. Importantly, FMCA-DTI utilizes the shared-weight-based multihead cross attention mechanism to learn the complex interaction features between different fragments. Experiments on three benchmark datasets show that FMCA-DTI achieves significantly improved performance by comparing it with four state-of-the-art baselines.
AVAILABILITY AND IMPLEMENTATION: The code for this workflow is available at: https://github.com/jacky102022/FMCA-DTI.
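To make the cross attention idea concrete, here is a minimal sketch in plain Python. It is not taken from the FMCA-DTI repository: it assumes drug and protein fragments are already embedded as equal-length vectors, omits the learned projection matrices a real model would train, and simply slices the feature dimension to stand in for separate heads. All function and variable names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one sequence and keys/values from another."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted mix of the value vectors.
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs


def multihead_cross_attention(drug_frags, protein_frags, num_heads):
    """Let each drug fragment attend over protein fragments, one feature slice per head."""
    dim = len(drug_frags[0])
    if dim % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    head_dim = dim // num_heads
    merged = [[] for _ in drug_frags]
    for head in range(num_heads):
        lo, hi = head * head_dim, (head + 1) * head_dim
        q = [frag[lo:hi] for frag in drug_frags]
        kv = [frag[lo:hi] for frag in protein_frags]
        # Concatenate this head's output onto each drug fragment's result.
        for row, head_out in zip(merged, cross_attention(q, kv, kv)):
            row.extend(head_out)
    return merged


# Hypothetical 4-dimensional fragment embeddings.
drug = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.2, 0.8]]
protein = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
attended = multihead_cross_attention(drug, protein, num_heads=2)
print(len(attended), len(attended[0]))
```

Swapping the arguments gives the other direction (protein fragments attending over drug fragments), which is the sense in which the interaction between the two sides is modeled jointly rather than separately.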
Affiliation(s)
- Qi Zhang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Le Zuo
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Ying Ren
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Siyuan Wang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Wenfa Wang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Lerong Ma
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Jing Zhang
  - Medical College of Yan'an University, Yan'an University, Yan'an 716000, China
  - Medical Research and Experimental Center, The Second Affiliated Hospital of Xi'an Medical University, Xi'an 710021, China
- Bisheng Xia
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
|
12
|
Orsi M, Reymond JL. One chiral fingerprint to find them all. J Cheminform 2024; 16:53. [PMID: 38741153 DOI: 10.1186/s13321-024-00849-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/28/2024] [Indexed: 05/16/2024] Open
Abstract
Molecular fingerprints are indispensable tools in cheminformatics. However, stereochemistry is generally not considered, which is problematic for large molecules which are almost all chiral. Herein we report MAP4C, a chiral version of our previously reported fingerprint MAP4, which lists MinHashes computed from character strings containing the SMILES of all pairs of circular substructures up to a diameter of four bonds and the shortest topological distance between their central atoms. MAP4C includes the Cahn-Ingold-Prelog (CIP) annotation (R, S, r or s) whenever the chiral atom is the center of a circular substructure, a question mark for undefined stereocenters, and double bond cis-trans information if specified. MAP4C performs slightly better than the achiral MAP4, ECFP and AP fingerprints in non-stereoselective virtual screening benchmarks. Furthermore, MAP4C distinguishes between stereoisomers in chiral molecules from small molecule drugs to large natural products and peptides comprising thousands of diastereomers, with a degree of distinction smaller than between structural isomers and proportional to the number of chirality changes. Due to its excellent performance across diverse molecular classes and its ability to handle stereochemistry, MAP4C is recommended as a generally applicable chiral molecular fingerprint. SCIENTIFIC CONTRIBUTION: The ability of our chiral fingerprint MAP4C to handle stereoisomers from small molecules to large natural products and peptides is unprecedented and opens the way for cheminformatics to include stereochemistry as an important molecular parameter across all fields of molecular design.
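The MinHash step mentioned in this abstract can be illustrated with a minimal sketch. It shows only generic MinHash over a set of strings, not the MAP4C implementation: the shingle strings below are made-up placeholders in a hypothetical "substructure|distance|substructure" layout, and the function names, the number of hash functions and the choice of BLAKE2b as the hash are all assumptions.

```python
import hashlib

def minhash(shingles, num_perm=64):
    """Return a MinHash signature: for each seeded hash, the minimum value over the set.

    `shingles` must be a non-empty collection of strings.
    """
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # one salted hash function per seed
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in shingles
        ))
    return signature

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical shingles for two molecules (placeholders, not real MAP4C output).
mol_a = {"C|1|CO", "CO|2|CN", "CN|3|C"}
mol_b = {"C|1|CO", "CO|2|CN", "C[C@H]|3|C"}

sig_a = minhash(mol_a)
sig_b = minhash(mol_b)
similarity = estimated_jaccard(sig_a, sig_b)
```

In this sketch, a stereo label would simply be part of the shingle string, so two stereoisomers differ in some shingles and their signatures diverge in proportion to the share of shingles that changed.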
Affiliation(s)
- Markus Orsi
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.
|
13
|
Phan TL, Trinh TC, To VT, Pham TA, Van Nguyen PC, Phan TM, Truong TN. Novel machine learning approach toward classification model of HIV-1 integrase inhibitors. RSC Adv 2024; 14:14506-14513. [PMID: 38708110 PMCID: PMC11064125 DOI: 10.1039/d4ra02231a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Accepted: 04/22/2024] [Indexed: 05/07/2024] Open
Abstract
HIV-1 (human immunodeficiency virus-1) has been causing severe pandemics by attacking the immune system of its host. Left untreated, it can lead to AIDS (acquired immunodeficiency syndrome), where death is inevitable due to opportunistic diseases. Therefore, discovering new antiviral drugs against HIV-1 is crucial. This study aimed to explore a novel machine learning approach to classify compounds that inhibit HIV-1 integrase and screen the dataset of repurposing compounds. The present study had two main stages: selecting the best type of fingerprint or molecular descriptor using the Wilcoxon signed-rank test and building a computational model based on machine learning. In the first stage, we calculated 16 different types of fingerprint or molecular descriptors from the dataset and used each of them as input features for 10 machine-learning models, which were evaluated through cross-validation. Then, a meta-analysis was performed with the Wilcoxon signed-rank test to select the optimal fingerprint or molecular descriptor types. In the second stage, we constructed a model based on the optimal fingerprint or molecular descriptor type. The data followed the machine learning procedure, including data preprocessing, outlier handling, normalization, feature selection, model selection, external validation, and model optimization. In the end, an XGBoost model and RDK7 fingerprint were identified as the most suitable. The model achieved promising results, with an average precision of 0.928 ± 0.027 and an F1-score of 0.848 ± 0.041 in cross-validation, and an average precision of 0.921 and an F1-score of 0.889 in external validation. Molecular docking was performed and validated by redocking for docking power and retrospective control for screening power, with an AUC of 0.876 and a threshold identified at -9.71 kcal mol⁻¹. Finally, 44 compounds from DrugBank repurposing data were selected by the QSAR model, three candidates were then identified as potential compounds by molecular docking, and PSI-697 was detected as the most promising molecule (docking score: -17.14 kcal mol⁻¹, HIV integrase inhibitory probability: 69.81%), although no in vitro experiment was performed.
Affiliation(s)
- Tieu-Long Phan
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig Härtelstraße 16-18 04107 Leipzig Germany
- Department of Mathematics and Computer Science, University of Southern Denmark Odense M DK-5230 Denmark
| | - The-Chuong Trinh
- Faculty of Pharmacy, Grenoble Alpes University La Tronche 38700 France
| | - Van-Thinh To
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City Ho Chi Minh City 700000 Vietnam
| | - Thanh-An Pham
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City Ho Chi Minh City 700000 Vietnam
| | - Phuoc-Chung Van Nguyen
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City Ho Chi Minh City 700000 Vietnam
| | - Tuyet-Minh Phan
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City Ho Chi Minh City 700000 Vietnam
| | - Tuyen Ngoc Truong
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City Ho Chi Minh City 700000 Vietnam
|
14
|
Oliveira PF, Guedes RC, Falcao AO. Inferring molecular inhibition potency with AlphaFold predicted structures. Sci Rep 2024; 14:8252. [PMID: 38589418 PMCID: PMC11001998 DOI: 10.1038/s41598-024-58394-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 03/28/2024] [Indexed: 04/10/2024] Open
Abstract
Even though in silico ligand-based drug methods have been successful in predicting interactions with known target proteins, they struggle with new, unassessed targets. To address this challenge, we propose an approach that integrates structural data from AlphaFold 2 predicted protein structures into machine learning models. Our method extracts 3D structural protein fingerprints and combines them with ligand structural data to train a single machine learning model. This model captures the relationship between ligand properties and the unique structural features of various target proteins, enabling predictions for never before tested molecules and protein targets. To assess our model, we used a dataset of 144 human G-protein coupled receptors (GPCRs) with over 140,000 measured inhibition constant (Ki) values. Results strongly suggest that our approach performs as well as state-of-the-art ligand-based methods. In a second modeling approach that used 129 targets for training and a separate test set of 15 different protein targets, our model correctly predicted interactions for 73% of targets, with explained variances exceeding 0.50 in 22% of cases. Our findings further verified that the usage of experimentally determined protein structures produced models that were statistically indistinct from those built on the AlphaFold synthetic structures. This study presents a proteo-chemometric drug screening approach that uses a simple and scalable method for extracting protein structural information for usage in machine learning models capable of predicting protein-molecule interactions even for orphan targets.
Affiliation(s)
- Pedro F Oliveira
- Lasige, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
| | - Rita C Guedes
- Research Institute for Medicines (iMed.ULisboa), Faculdade de Farmácia, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
| | - Andre O Falcao
- Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal.
|
15
|
Qian W, Wang X, Kang Y, Pan P, Hou T, Hsieh CY. A general model for predicting enzyme functions based on enzymatic reactions. J Cheminform 2024; 16:38. [PMID: 38556873 PMCID: PMC10983695 DOI: 10.1186/s13321-024-00827-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 03/16/2024] [Indexed: 04/02/2024] Open
Abstract
Accurate prediction of the Enzyme Commission (EC) numbers for chemical reactions is essential for the understanding and manipulation of enzyme functions, biocatalytic processes and biosynthetic planning. A number of machine learning (ML)-based models have been developed to classify enzymatic reactions, showing great advantages over costly and long-winded experimental verifications. However, the prediction accuracy for most available models trained on the records of chemical reactions without specifying the enzymatic catalysts is rather limited. In this study, we introduced BEC-Pred, a BERT-based multiclassification model, for predicting EC numbers associated with reactions. Leveraging transfer learning, our approach achieves precise forecasting across a wide variety of EC numbers solely through analysis of the SMILES sequences of substrates and products. The BEC-Pred model outperformed other sequence and graph-based ML methods, attaining a higher accuracy of 91.6%, surpassing them by 5.5%, and exhibiting superior F1 scores with improvements of 6.6% and 6.0%, respectively. The enhanced performance highlights the potential of BEC-Pred to serve as a reliable foundational tool to accelerate cutting-edge research in synthetic biology and drug metabolism. Moreover, we discussed a few examples of how BEC-Pred could accurately predict the enzymatic classification for the Novozym 435-induced hydrolysis and lipase efficient catalytic synthesis. We anticipate that BEC-Pred will have a positive impact on the progression of enzymatic research.
Affiliation(s)
- Wenjia Qian
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Xiaorui Wang
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Peichen Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
|
16
|
Boldini D, Ballabio D, Consonni V, Todeschini R, Grisoni F, Sieber SA. Effectiveness of molecular fingerprints for exploring the chemical space of natural products. J Cheminform 2024; 16:35. [PMID: 38528548 DOI: 10.1186/s13321-024-00830-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/17/2024] [Indexed: 03/27/2024] Open
Abstract
Natural products are a diverse class of compounds with promising biological properties, such as high potency and excellent selectivity. However, they have different structural motifs than typical drug-like compounds, e.g., a wider range of molecular weight, multiple stereocenters and higher fraction of sp3-hybridized carbons. This makes the encoding of natural products via molecular fingerprints difficult, thus restricting their use in cheminformatics studies. To tackle this issue, we explored over 30 years of research to systematically evaluate which molecular fingerprint provides the best performance on the natural product chemical space. We considered 20 molecular fingerprints from four different sources, which we then benchmarked on over 100,000 unique natural products from the COCONUT (COlleCtion of Open Natural prodUcTs) and CMNPD (Comprehensive Marine Natural Products Database) databases. Our analysis focused on the correlation between different fingerprints and their classification performance on 12 bioactivity prediction datasets. Our results show that different encodings can provide fundamentally different views of the natural product chemical space, leading to substantial differences in pairwise similarity and performance. While Extended Connectivity Fingerprints are the de facto option for encoding drug-like compounds, other fingerprints were found to match or outperform them for bioactivity prediction of natural products. These results highlight the need to evaluate multiple fingerprinting algorithms for optimal performance and suggest new areas of research. Finally, we provide an open-source Python package for computing all molecular fingerprints considered in the study, as well as data and scripts necessary to reproduce the results, at https://github.com/dahvida/NP_Fingerprints.
Affiliation(s)
- Davide Boldini
- TUM School of Natural Sciences, Department of Bioscience, Technical University of Munich, Center for Functional Protein Assemblies (CPA), 85748, Garching bei München, Germany.
| | - Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
| | - Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
| | - Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
| | - Francesca Grisoni
- Institute for Complex Molecular Systems and Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, Netherlands
| | - Stephan A Sieber
- TUM School of Natural Sciences, Department of Bioscience, Technical University of Munich, Center for Functional Protein Assemblies (CPA), 85748, Garching bei München, Germany
|
17
|
Shimizu Y, Ohta M, Ishida S, Terayama K, Osawa M, Honma T, Ikeda K. AI-driven molecular generation of not-patented pharmaceutical compounds using world open patent data. J Cheminform 2023; 15:120. [PMID: 38093324 PMCID: PMC10716930 DOI: 10.1186/s13321-023-00791-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 12/02/2023] [Indexed: 12/17/2023] Open
Abstract
Developing compounds with novel structures is important for the production of new drugs. From an intellectual property perspective, confirming the patent status of newly developed compounds is essential, particularly for pharmaceutical companies. The generation of a large number of compounds has been made possible because of the recent advances in artificial intelligence (AI). However, confirming the patent status of these generated molecules has been a challenge because there are no free and easy-to-use tools that can be used to determine the novelty of the generated compounds in terms of patents in a timely manner; additionally, there are no appropriate reference databases for pharmaceutical patents in the world. In this study, two public databases, SureChEMBL and Google Patents Public Datasets, were used to create a reference database of drug-related patented compounds using international patent classification. An exact structure search system was constructed using InChIKey and a relational database system to rapidly search for compounds in the reference database. Because drug-related patented compounds are a good source for generative AI to learn useful chemical structures, they were used as the training data. Furthermore, molecule generation was successfully directed by increasing and decreasing the number of generated patented compounds through incorporation of patent status (i.e., patented or not) into learning. The use of patent status enabled generation of novel molecules with high drug-likeness. Generation using generative AI with patent information would help efficiently propose novel compounds in terms of pharmaceutical patents. Scientific contribution: In this study, a new molecule-generation method that takes into account the patent status of molecules, which has rarely been considered but is an important feature in drug discovery, was developed. The method enables the generation of novel molecules based on pharmaceutical patents with high drug-likeness and will help in the efficient development of effective drug compounds.
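The exact structure search by InChIKey in a relational database, as described in this abstract, might look roughly like the sketch below. It assumes SQLite as the database and a hypothetical single-table schema (`reference_compounds`); the two rows are illustrative sample keys (the InChIKeys of aspirin and caffeine), not records taken from the authors' reference database, and the `source` values are placeholders.

```python
import sqlite3

# In-memory database with a hypothetical schema; the primary key on
# `inchikey` gives an index, so exact-match lookups stay fast.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reference_compounds (inchikey TEXT PRIMARY KEY, source TEXT)"
)
conn.executemany(
    "INSERT INTO reference_compounds VALUES (?, ?)",
    [
        ("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "illustrative-row-1"),  # aspirin
        ("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "illustrative-row-2"),  # caffeine
    ],
)

def is_in_reference(connection, inchikey):
    """Return True if the exact InChIKey is present in the reference table."""
    row = connection.execute(
        "SELECT 1 FROM reference_compounds WHERE inchikey = ?", (inchikey,)
    ).fetchone()
    return row is not None

found = is_in_reference(conn, "BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
missing = is_in_reference(conn, "AAAAAAAAAAAAAA-AAAAAAAAAA-N")  # made-up key
```

Because an InChIKey is a fixed-length hashed string, an exact match reduces to a single indexed string comparison, which is what makes this kind of lookup quick for large numbers of generated molecules.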
Affiliation(s)
- Yugo Shimizu
- HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
- Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, 105-8512, Japan
| | - Masateru Ohta
- HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
| | - Shoichi Ishida
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
| | - Kei Terayama
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
| | - Masanori Osawa
- Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, 105-8512, Japan
| | - Teruki Honma
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
| | - Kazuyoshi Ikeda
- HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan.
- Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, 105-8512, Japan.
|
18
|
Probst D. An explainability framework for deep learning on chemical reactions exemplified by enzyme-catalysed reaction classification. J Cheminform 2023; 15:113. [PMID: 37996942 PMCID: PMC10668483 DOI: 10.1186/s13321-023-00784-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 11/13/2023] [Indexed: 11/25/2023] Open
Abstract
Assigning or proposing a catalysing enzyme given a chemical or biochemical reaction is of great interest to life sciences and chemistry alike. The exploration and design of metabolic pathways and the challenge of finding more sustainable enzyme-catalysed alternatives to traditional organic reactions are just two examples of tasks that require an association between reaction and enzyme. However, given the lack of large and balanced annotated data sets of enzyme-catalysed reactions, assigning an enzyme to a reaction still relies on expert-curated rules and databases. Here, we present a data-driven explainable human-in-the-loop machine learning approach to support and ultimately automate the association of a catalysing enzyme with a given biochemical reaction. In addition, the proposed method is capable of predicting enzymes as candidate catalysts for organic reactions amenable to biocatalysis. Finally, the introduced explainability and visualisation methods can easily be generalised to support other machine-learning approaches involving chemical and biochemical reactions.
Affiliation(s)
- Daniel Probst
- Signal Processing Laboratory 2, Institute of Electrical and Micro Engineering, School of Engineering, EPFL, Rte Cantonale, 1015, Lausanne, Vaud, Switzerland.
|
19
|
Orsi M, Probst D, Schwaller P, Reymond JL. Alchemical analysis of FDA approved drugs. Digital Discovery 2023; 2:1289-1296. [PMID: 38013905 PMCID: PMC10561545 DOI: 10.1039/d3dd00039g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 08/29/2023] [Indexed: 11/29/2023]
Abstract
Chemical space maps help visualize similarities within molecular sets. However, there are many different molecular similarity measures resulting in a confusing number of possible comparisons. To overcome this limitation, we exploit the fact that tools designed for reaction informatics also work for alchemical processes that do not obey Lavoisier's principle, such as the transmutation of lead into gold. We start by using the differential reaction fingerprint (DRFP) to create tree-maps (TMAPs) representing the chemical space of pairs of drugs selected as being similar according to various molecular fingerprints. We then use the Transformer-based RXNMapper model to understand structural relationships between drugs, and its confidence score to distinguish between pairs related by chemically feasible transformations and pairs related by alchemical transmutations. This analysis reveals a diversity of structural similarity relationships that are otherwise difficult to analyze simultaneously. We exemplify this approach by visualizing FDA-approved drugs, EGFR inhibitors, and polymyxin B analogs.
Affiliation(s)
- Markus Orsi
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern Freiestrasse 3 3012 Bern Switzerland
| | - Daniel Probst
- Ecole Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| | | | - Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern Freiestrasse 3 3012 Bern Switzerland
|
20
|
Abstract
DNA-encoded libraries (DELs) are widely used in the discovery of drug candidates, and understanding their design principles is critical for accessing better libraries. Most DELs are combinatorial in nature and are synthesized by assembling sets of building blocks in specific topologies. In this study, different aspects of library topology were explored and their effect on DEL properties and chemical diversity was analyzed. We introduce a descriptor for DEL topological assignment (DELTA) and use it to examine the landscape of possible DEL topologies and their coverage in the literature. A generative topographic mapping analysis revealed that the impact of library topology on chemical space coverage is secondary to building block selection. Furthermore, it became apparent that the descriptor used to analyze chemical space dictates how structures cluster, with the effects of topology being apparent when using three-dimensional descriptors but not with common two-dimensional descriptors. This outcome points to potential challenges of attempts to predict DEL productivity based on chemical space analyses alone. While topology is rather inconsequential for defining the chemical space of encoded compounds, it greatly affects possible interactions with target proteins as illustrated in docking studies using NAD/NADP binding proteins as model receptors.
Affiliation(s)
- William K Weigel
- Department of Medicinal Chemistry, Skaggs College of Pharmacy, University of Utah, 30 S 2000 E, Salt Lake City, Utah 84112, United States
| | - Alba L Montoya
- Department of Medicinal Chemistry, Skaggs College of Pharmacy, University of Utah, 30 S 2000 E, Salt Lake City, Utah 84112, United States
| | - Raphael M Franzini
- Department of Medicinal Chemistry, Skaggs College of Pharmacy, University of Utah, 30 S 2000 E, Salt Lake City, Utah 84112, United States
- Huntsman Cancer Institute, University of Utah, 2000 Circle of Hope Dr., Salt Lake City, Utah 84112, United States
|
21
|
Kırboğa KK, Abbasi S, Küçüksille EU. Explainability and white box in drug discovery. Chem Biol Drug Des 2023; 102:217-233. [PMID: 37105727 DOI: 10.1111/cbdd.14262] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/12/2022] [Revised: 03/24/2023] [Accepted: 04/12/2023] [Indexed: 04/29/2023]
Abstract
Recently, artificial intelligence (AI) techniques have been increasingly used to overcome the challenges in drug discovery. Although traditional AI techniques generally have high accuracy rates, their decision processes and patterns can be difficult to explain, which makes the outputs of algorithms used in drug discovery hard to understand and interpret. To address this issue, Explainable Artificial Intelligence (XAI) emerged as a set of processes and methods for capturing and explaining the results and outputs of machine learning (ML) and deep learning (DL) algorithms. With XAI techniques, the causes and consequences of the decision process are better understood, which can help further improve the drug discovery process and support the right decisions. Techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) have made the drug targeting phase clearer and more understandable. XAI methods are expected to reduce time and cost in future computational drug discovery studies. This review provides a comprehensive overview of XAI-based drug discovery and development prediction, and covers XAI mechanisms that increase confidence in AI and modeling methods. The limitations and future directions of XAI in drug discovery are also discussed.
Affiliation(s)
- Kevser Kübra Kırboğa
- Bioengineering Department, Bilecik Seyh Edebali University, Bilecik, Turkey
- Informatics Institute, Istanbul Technical University, Maslak, Turkey
| | - Sumra Abbasi
- Department of Biological Sciences, National University of Medical Sciences, Rawalpindi, Pakistan
| | - Ecir Uğur Küçüksille
- Department of Computer Engineering, Süleyman Demirel University, Isparta, Turkey
|
22
|
Guha R, Velegol D. Harnessing Shannon entropy-based descriptors in machine learning models to enhance the prediction accuracy of molecular properties. J Cheminform 2023; 15:54. [PMID: 37211605 DOI: 10.1186/s13321-023-00712-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/07/2022] [Accepted: 03/18/2023] [Indexed: 05/23/2023] Open
Abstract
Accurate prediction of molecular properties is essential in the screening and development of drug molecules and other functional materials. Traditionally, property-specific molecular descriptors are used in machine learning models. This in turn requires the identification and development of target or problem-specific descriptors. Additionally, an increase in the prediction accuracy of the model is not always feasible from the standpoint of targeted descriptor usage. We explored the accuracy and generalizability issues using a framework of Shannon entropies, based on SMILES, SMARTS and/or InChIKey strings of respective molecules. Using various public databases of molecules, we showed that the accuracy of the prediction of machine learning models could be significantly enhanced simply by using Shannon entropy-based descriptors evaluated directly from SMILES. Analogous to partial pressures and total pressure of gases in a mixture, we used atom-wise fractional Shannon entropy in combination with total Shannon entropy from respective tokens of the string representation to model the molecule efficiently. The proposed descriptor was competitive in performance with standard descriptors such as Morgan fingerprints and SHED in regression models. Additionally, we found that either a hybrid descriptor set containing the Shannon entropy-based descriptors or an optimized, ensemble architecture of multilayer perceptrons and graph neural networks using the Shannon entropies was synergistic in improving the prediction accuracy. This simple approach of coupling the Shannon entropy framework to other standard descriptors and/or using it in ensemble models could find applications in boosting the performance of molecular property predictions in chemistry and material science.
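As a rough illustration of a Shannon entropy descriptor computed directly from a SMILES string, the sketch below evaluates the total entropy H = -Σ p_i log2 p_i over the token frequencies and splits it into per-token contributions. The abstract does not give the authors' exact definitions, so the character-level tokenization, the per-token "fractional" split and the function names are assumptions made here for illustration only.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def fractional_entropies(tokens):
    """Contribution -p*log2(p) of each token type; the values sum to the total entropy."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: -(c / total) * math.log2(c / total) for t, c in counts.items()}

smiles = "CCO"           # ethanol
tokens = list(smiles)    # assumption: one character per token
H = shannon_entropy(tokens)
parts = fractional_entropies(tokens)
```

For "CCO" the token frequencies are 2/3 for C and 1/3 for O, so the total entropy is about 0.918 bits. A real tokenizer would need to treat multi-character atoms such as Cl and Br, and bracketed atoms, as single tokens.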
Affiliation(s)
- Rajarshi Guha
- Intel Corporation, 2501 NE Century Blvd, Hillsboro, OR, 97124, USA.
| | - Darrell Velegol
- Department of Chemical Engineering, Pennsylvania State University, University Park, PA, 16802, USA
|
23
|
Boswell Z, Verga JU, Mackle J, Guerrero-Vazquez K, Thomas OP, Cray J, Wolf BJ, Choo YM, Croot P, Hamann MT, Hardiman G. In-Silico Approaches for the Screening and Discovery of Broad-Spectrum Marine Natural Product Antiviral Agents Against Coronaviruses. Infect Drug Resist 2023; 16:2321-2338. [PMID: 37155475 PMCID: PMC10122865 DOI: 10.2147/idr.s395203] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/18/2022] [Accepted: 03/16/2023] [Indexed: 05/10/2023] Open
Abstract
The urgent need for SARS-CoV-2 controls has led to a reassessment of approaches to identify and develop natural product inhibitors of zoonotic, highly virulent, and rapidly emerging viruses. There are as yet no clinically approved broad-spectrum antivirals available for betacoronaviruses. Discovery pipelines for pan-virus medications against a broad range of betacoronaviruses are therefore a priority. A variety of marine natural product (MNP) small molecules have shown inhibitory activity against viral species. Access to large data caches of small molecule structural information is vital to finding new pharmaceuticals. Increasingly, molecular docking simulations are being used to narrow the space of possibilities and generate drug leads. Combining in-silico methods, augmented by metaheuristic optimization and machine learning (ML), allows the generation of hits from within a virtual MNP library to narrow screens for novel targets against coronaviruses. In this review article, we explore current insights and techniques that can be leveraged to generate broad-spectrum antivirals against betacoronaviruses using in-silico optimization and ML. ML approaches are capable of simultaneously evaluating different features for predicting inhibitory activity. Many also provide a semi-quantitative measure of feature relevance and can guide the selection of a subset of features relevant for inhibition of SARS-CoV-2.
Affiliation(s)
- Zachary Boswell
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Jacopo Umberto Verga
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Genomic Data Science, University of Galway, Galway, Ireland
- James Mackle
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Olivier P Thomas
- School of Biological and Chemical Sciences, Ryan Institute, University of Galway, Galway, H91 TK33, Ireland
- James Cray
- Department of Biomedical Education and Anatomy, College of Medicine and Division of Biosciences, College of Dentistry, Ohio State University, Columbus, OH, USA
- Bethany J Wolf
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Yeun-Mun Choo
- Department of Chemistry, University of Malaya, Kuala Lumpur, Malaysia
- Peter Croot
- Irish Centre for Research in Applied Geoscience, Earth and Ocean Sciences and Ryan Institute, School of Natural Sciences, University of Galway, Galway, Ireland
- Mark T Hamann
- Departments of Drug Discovery and Biomedical Sciences and Public Health, Colleges of Pharmacy and Medicine, Medical University of South Carolina, Charleston, SC, USA
- Gary Hardiman
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Department of Medicine, Medical University of South Carolina, Charleston, SC, USA
|
24
|
Young TJ, Jubery TZ, Carley CN, Carroll M, Sarkar S, Singh AK, Singh A, Ganapathysubramanian B. "Canopy fingerprints" for characterizing three-dimensional point cloud data of soybean canopies. Frontiers in Plant Science 2023; 14:1141153. [PMID: 37063230 PMCID: PMC10090282 DOI: 10.3389/fpls.2023.1141153] [Received: 01/10/2023] [Accepted: 02/28/2023] [Indexed: 06/19/2023]
Abstract
Advances in imaging hardware allow high throughput capture of the detailed three-dimensional (3D) structure of plant canopies. The point cloud data is typically post-processed to extract coarse-scale geometric features (like volume, surface area, height, etc.) for downstream analysis. We extend feature extraction from 3D point cloud data to various additional features, which we denote as 'canopy fingerprints'. This is motivated by the successful application of the fingerprint concept for molecular fingerprints in chemistry applications and acoustic fingerprints in sound engineering applications. We developed an end-to-end pipeline to generate canopy fingerprints of a three-dimensional point cloud of soybean [Glycine max (L.) Merr.] canopies grown in hill plots captured by a terrestrial laser scanner (TLS). The pipeline includes noise removal, registration, and plot extraction, followed by the canopy fingerprint generation. The canopy fingerprints are generated by splitting the data into multiple sub-canopy scale components and extracting sub-canopy scale geometric features. The generated canopy fingerprints are interpretable and can assist in identifying patterns in a database of canopies, querying similar canopies, or identifying canopies with a certain shape. The framework can be extended to other modalities (for instance, hyperspectral point clouds) and tuned to find the most informative fingerprint representation for downstream tasks. These canopy fingerprints can aid in the utilization of canopy traits at previously unutilized scales, and therefore have applications in plant breeding and resilient crop production.
Affiliation(s)
- Therin J. Young
- Department of Mechanical Engineering, Iowa State University, Ames, IA, United States
- Clayton N. Carley
- Department of Agronomy, Iowa State University, Ames, IA, United States
- Matthew Carroll
- Department of Agronomy, Iowa State University, Ames, IA, United States
- Soumik Sarkar
- Department of Mechanical Engineering, Iowa State University, Ames, IA, United States
- Translational AI Center, Iowa State University, Ames, IA, United States
- Asheesh K. Singh
- Department of Agronomy, Iowa State University, Ames, IA, United States
- Arti Singh
- Department of Agronomy, Iowa State University, Ames, IA, United States
- Baskar Ganapathysubramanian
- Department of Mechanical Engineering, Iowa State University, Ames, IA, United States
- Translational AI Center, Iowa State University, Ames, IA, United States
|
25
|
Maiti KS. Non-Invasive Disease Specific Biomarker Detection Using Infrared Spectroscopy: A Review. Molecules 2023; 28:2320. [PMID: 36903576 PMCID: PMC10005715 DOI: 10.3390/molecules28052320] [Received: 01/12/2023] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/06/2023] Open
Abstract
Many life-threatening diseases remain obscure in their early stages. Symptoms appear only at the advanced stage, when the survival rate is poor. A non-invasive diagnostic tool may be able to identify disease even at the asymptomatic stage and save lives. Volatile metabolite-based diagnostics hold a lot of promise to fulfil this demand. Many experimental techniques are being developed to establish a reliable non-invasive diagnostic tool; however, none of them is yet able to fulfil clinicians' demands. Infrared spectroscopy-based gaseous biofluid analysis has demonstrated promising results towards fulfilling clinicians' expectations. The recent development of the standard operating procedure (SOP), sample measurement, and data analysis techniques for infrared spectroscopy is summarized in this review article. It also outlines the applicability of infrared spectroscopy to identify the specific biomarkers for diseases such as diabetes, acute gastritis caused by bacterial infection, cerebral palsy, and prostate cancer.
Affiliation(s)
- Kiran Sankar Maiti
- Max–Planck–Institut für Quantenoptik, Hans-Kopfermann-Straße 1, 85748 Garching, Germany; Tel.: +49-289-14054
- Lehrstuhl für Experimental Physik, Ludwig-Maximilians-Universität München, Am Coulombwall 1, 85748 Garching, Germany
- Laser-Forschungslabor, Klinikum der Universität München, Fraunhoferstrasse 20, 82152 Planegg, Germany
|
26
|
Mensa S, Sahin E, Tacchino F, Kl Barkoutsos P, Tavernelli I. Quantum machine learning framework for virtual screening in drug discovery: a prospective quantum advantage. Machine Learning: Science and Technology 2023. [DOI: 10.1088/2632-2153/acb900] [Indexed: 02/19/2023] Open
Abstract
Machine learning for ligand-based virtual screening (LB-VS) is an important in-silico tool for discovering new drugs in a faster and more cost-effective manner, especially for emerging diseases such as COVID-19. In this paper, we propose a general-purpose framework combining a classical Support Vector Classifier algorithm with quantum kernel estimation for LB-VS on real-world databases, and we argue in favor of its prospective quantum advantage. Indeed, we heuristically prove that our quantum integrated workflow can, at least in some relevant instances, provide a tangible advantage compared to state-of-the-art classical algorithms operating on the same datasets, showing strong dependence on the target and the feature selection method. Finally, we test our algorithm on IBM Quantum processors using ADRB2 and COVID-19 datasets, showing that hardware simulations provide results in line with the predicted performances and can surpass classical equivalents.
|
27
|
Béquignon OJM, Bongers BJ, Jespers W, IJzerman AP, van der Water B, van Westen GJP. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. J Cheminform 2023; 15:3. [PMID: 36609528 PMCID: PMC9824924 DOI: 10.1186/s13321-022-00672-x] [Received: 10/10/2022] [Accepted: 12/17/2022] [Indexed: 01/07/2023] Open
Abstract
With the ongoing rapid growth of publicly available ligand-protein bioactivity data, there is a trove of valuable data that can be used to train a plethora of machine-learning algorithms. However, not all data is equal in terms of size and quality and a significant portion of researchers' time is needed to adapt the data to their needs. On top of that, finding the right data for a research question can often be a challenge on its own. To meet these challenges, we have constructed the Papyrus dataset. Papyrus is comprised of around 60 million data points. This dataset contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with several smaller datasets containing high-quality data. The aggregated data has been standardised and normalised in a manner that is suitable for machine learning. We show how data can be filtered in a variety of ways and also perform some examples of quantitative structure-activity relationship analyses and proteochemometric modelling. Our ambition is that this pruned data collection constitutes a benchmark set that can be used for constructing predictive models, while also providing an accessible data source for research.
Affiliation(s)
- O. J. M. Béquignon
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- B. J. Bongers
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- W. Jespers
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- A. P. IJzerman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- B. van der Water
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- G. J. P. van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
|
28
|
Lu G, Ou K, Zhang Y, Zhang H, Feng S, Yang Z, Sun G, Liu J, Wei S, Pan S, Chen Z. Structural Analysis, Multi-Conformation Virtual Screening and Molecular Simulation to Identify Potential Inhibitors Targeting pS273R Proteases of African Swine Fever Virus. Molecules (Basel, Switzerland) 2023; 28:570. [PMID: 36677630 PMCID: PMC9866604 DOI: 10.3390/molecules28020570] [Received: 11/20/2022] [Revised: 12/26/2022] [Accepted: 01/03/2023] [Indexed: 01/09/2023]
Abstract
The African swine fever virus (ASFV) causes an infectious viral disease in pigs of all ages. The development of antiviral drugs has primarily aimed at inhibiting the proteases required for the proteolysis of viral polyproteins. In this study, we investigated the conformations of the pS273R protease in physiological states, virtually screened multiple conformations of the pS273R target protein, combined various molecular docking scoring functions, and identified five potential drugs from the Food and Drug Administration drug library that may inhibit pS273R. Subsequent validation of the dynamic interactions of pS273R with the five putative inhibitors was achieved using molecular dynamics simulations and binding free energy calculations with the molecular mechanics/Poisson-Boltzmann (Generalized Born) surface area (MM/PB(GB)SA) method. These findings demonstrate that the arm domain and Thr159-Lys167 loop region of pS273R are significantly more flexible than the core structural domain, and that the Thr159-Lys167 loop region can serve as a "gatekeeper" in the substrate channel. Leucovorin, Carboprost, Protirelin, Flavin Mononucleotide, and Lovastatin Acid all have Gibbs binding free energies with pS273R of less than -20 kcal/mol according to the MM/PBSA analyses. In contrast to pS273R in the free energy landscape, the inhibitor and drug complexes of pS273R showed distinct structural group distributions. These five drugs may be used as potential inhibitors of pS273R and may serve as future drug candidates for treating ASFV.
Affiliation(s)
- Gen Lu
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, Shenyang Agricultural University, No. 120, Dongling Road, Shenhe District, Shenyang 110866, China
- Kang Ou
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, Shenyang Agricultural University, No. 120, Dongling Road, Shenhe District, Shenyang 110866, China
- Yihan Zhang
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, Shenyang Agricultural University, No. 120, Dongling Road, Shenhe District, Shenyang 110866, China
- Huan Zhang
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, Shenyang Agricultural University, No. 120, Dongling Road, Shenhe District, Shenyang 110866, China
- Shouhua Feng
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, Shenyang Agricultural University, No. 120, Dongling Road, Shenhe District, Shenyang 110866, China
- Zuofeng Yang
- The Preventive and Control Center of Animal Disease of Liaoning Province, Liaoning Agricultural Development Service Center, No. 95, Renhe Road, Shenbei District, Shenyang 110164, China
- Guo Sun
- Qianyuanhao Biological Co., Ltd., Building 20, District 11, No. 188 South Fourth Ring West Road, Fengtai District, Beijing 100070, China
- Jinling Liu
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, Shenyang Agricultural University, No. 120, Dongling Road, Shenhe District, Shenyang 110866, China
- Correspondence: (J.L.); (S.W.); (S.P.); (Z.C.); Tel.: +86-13022453165 (J.L.); Fax: +86-24-88487156 (J.L.)
- Shu Wei
- The Preventive and Control Center of Animal Disease of Liaoning Province, Liaoning Agricultural Development Service Center, No. 95, Renhe Road, Shenbei District, Shenyang 110164, China
- Correspondence: (J.L.); (S.W.); (S.P.); (Z.C.); Tel.: +86-13022453165 (J.L.); Fax: +86-24-88487156 (J.L.)
- Shude Pan
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, Shenyang Agricultural University, No. 120, Dongling Road, Shenhe District, Shenyang 110866, China
- Correspondence: (J.L.); (S.W.); (S.P.); (Z.C.); Tel.: +86-13022453165 (J.L.); Fax: +86-24-88487156 (J.L.)
- Zeliang Chen
- Key Laboratory of Livestock Infectious Diseases, Ministry of Education, Shenyang Agricultural University, No. 120, Dongling Road, Shenhe District, Shenyang 110866, China
- Correspondence: (J.L.); (S.W.); (S.P.); (Z.C.); Tel.: +86-13022453165 (J.L.); Fax: +86-24-88487156 (J.L.)
|
29
|
Su A, Zhang X, Zhang C, Ding D, Yang YF, Wang K, She YB. Deep transfer learning for predicting frontier orbital energies of organic materials using small data and its application to porphyrin photocatalysts. Phys Chem Chem Phys 2023; 25:10536-10549. [PMID: 36987933 DOI: 10.1039/d3cp00917c] [Indexed: 03/30/2023]
Abstract
A deep transfer learning approach is used to predict HOMO/LUMO energies of organic materials with a small amount of training data.
Affiliation(s)
- An Su
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Xin Zhang
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Chengwei Zhang
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Debo Ding
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Yun-Fang Yang
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Keke Wang
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Yuan-Bin She
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, P. R. China
|
30
|
Puch-Giner I, Molina A, Municoy M, Pérez C, Guallar V. Recent PELE Developments and Applications in Drug Discovery Campaigns. Int J Mol Sci 2022; 23:16090. [PMID: 36555731 PMCID: PMC9788188 DOI: 10.3390/ijms232416090] [Received: 11/18/2022] [Revised: 12/12/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022] Open
Abstract
Computer simulation techniques are gaining a central role in molecular pharmacology. Due to several factors, including the significant improvements of traditional molecular modelling, the irruption of machine learning methods, the massive data generation, or the unlimited computational resources through cloud computing, the future of pharmacology seems to go hand in hand with in silico predictions. In this review, we summarize our recent efforts in such a direction, centered on the unconventional Monte Carlo PELE software and on its coupling with machine learning techniques. We also provide new data on combining two recent new techniques, aquaPELE capable of exhaustive water sampling and fragPELE, for fragment growing.
Affiliation(s)
- Ignasi Puch-Giner
- Barcelona Supercomputing Center, Plaça d’Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Alexis Molina
- Nostrum Biodiscovery S.L., Av. de Josep Tarradellas, 8-10, 3-2, 08029 Barcelona, Spain
- Martí Municoy
- Barcelona Supercomputing Center, Plaça d’Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Nostrum Biodiscovery S.L., Av. de Josep Tarradellas, 8-10, 3-2, 08029 Barcelona, Spain
- Carles Pérez
- Nostrum Biodiscovery S.L., Av. de Josep Tarradellas, 8-10, 3-2, 08029 Barcelona, Spain
- Victor Guallar
- Barcelona Supercomputing Center, Plaça d’Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Nostrum Biodiscovery S.L., Av. de Josep Tarradellas, 8-10, 3-2, 08029 Barcelona, Spain
- Correspondence:
|
31
|
Muegge I, Hu Y. How do we further enhance 2D fingerprint similarity searching for novel drug discovery? Expert Opin Drug Discov 2022; 17:1173-1176. [PMID: 36150044 DOI: 10.1080/17460441.2022.2128332] [Indexed: 01/11/2023]
Affiliation(s)
- Yuan Hu
- Alkermes, Inc, Waltham, Massachusetts, USA
|
32
|
Eswaran SCD, Subramaniam S, Sanyal U, Rallo R, Zhang X. Molecular structural dataset of lignin macromolecule elucidating experimental structural compositions. Sci Data 2022; 9:647. [PMID: 36273011 PMCID: PMC9588021 DOI: 10.1038/s41597-022-01709-4] [Received: 06/29/2022] [Accepted: 09/20/2022] [Indexed: 11/23/2022] Open
Abstract
Lignin is one of the most abundant biopolymers in nature and has great potential to be transformed into high-value chemicals. However, the limited availability of molecular structure data hinders its potential industrial applications. Herein, we present the Lignin Structural (LGS) Dataset that includes the molecular structure of milled wood lignin focusing on two major monomeric units (coniferyl and syringyl), and the six most common interunit linkages (phenylpropane β-aryl ether, resinol, phenylcoumaran, biphenyl, dibenzodioxocin, and diaryl ether). The dataset constitutes a unique resource that covers a part of lignin’s chemical space characterized by polymer chains with lengths in the range of 3 to 25 monomer units. Structural data were generated using a sequence-controlled polymer generation approach that was calibrated to match experimental lignin properties. The LGS dataset includes 60 K newly generated lignin structures that match with high accuracy (~90%) the experimentally determined structural compositions available in the literature. The LGS dataset is a valuable resource to advance lignin chemistry research, including computational simulation approaches and predictive modelling.
Measurement(s): molecular structure
Technology Type(s): Computer Modeling
Factor Type(s): monomer ratio • bond frequency • degree of polymerization
Sample Characteristic - Organism: coniferous (softwood) • deciduous (hardwood)
Affiliation(s)
- Sudha Cheranma Devi Eswaran
- Bioproducts Sciences and Engineering Laboratory, Washington State University, 2710 Crimson Way, Richland, WA, 99354, USA; Voiland School of Chemical Engineering and Bioengineering, Washington State University, Richland, WA, 99354, USA
- Senthil Subramaniam
- Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99354, USA
- Udishnu Sanyal
- Bioproducts Sciences and Engineering Laboratory, Washington State University, 2710 Crimson Way, Richland, WA, 99354, USA; Voiland School of Chemical Engineering and Bioengineering, Washington State University, Richland, WA, 99354, USA
- Robert Rallo
- Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99354, USA
- Xiao Zhang
- Bioproducts Sciences and Engineering Laboratory, Washington State University, 2710 Crimson Way, Richland, WA, 99354, USA; Voiland School of Chemical Engineering and Bioengineering, Washington State University, Richland, WA, 99354, USA; Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99354, USA
|
33
|
Fassio AV, Shub L, Ponzoni L, McKinley J, O’Meara MJ, Ferreira RS, Keiser MJ, de Melo Minardi RC. Prioritizing Virtual Screening with Interpretable Interaction Fingerprints. J Chem Inf Model 2022; 62:4300-4318. [DOI: 10.1021/acs.jcim.2c00695] [Indexed: 11/30/2022]
Affiliation(s)
- Alexandre V. Fassio
- São Carlos Institute of Physics, University of São Paulo, São Carlos, São Paulo 13563-120, Brazil
- Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
- Laura Shub
- Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California 94143, United States
- Luca Ponzoni
- Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California 94143, United States
- Jessica McKinley
- Gilead Sciences, Inc., Foster City, California 94404, United States
- Matthew J. O’Meara
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States
- Rafaela S. Ferreira
- Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
- Michael J. Keiser
- Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California 94143, United States
- Raquel C. de Melo Minardi
- Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
|
34
|
In-Silico Drug Toxicity and Interaction Prediction for Plant Complexes Based on Virtual Screening and Text Mining. Int J Mol Sci 2022; 23:10056. [PMID: 36077464 PMCID: PMC9456415 DOI: 10.3390/ijms231710056] [Received: 08/19/2022] [Revised: 08/30/2022] [Accepted: 09/01/2022] [Indexed: 11/16/2022] Open
Abstract
Potential drug toxicities and drug interactions of redundant compounds of plant complexes may cause unexpected clinical responses or even severe adverse events. On the other hand, super-additivity of drug interactions between natural products and synthetic drugs may be utilized to gain better performance in disease management. Although datasets sufficient for training prediction models are lacking, this study builds, for the first time, a feasible workflow for predicting both the toxicity and the drug interactions of plant complexes, based on the SwissSimilarity and PubChem platforms. The optimal similarity score threshold for toxicity prediction of this system is 0.6171, based on an analysis of 20 different herbal medicines. From the PubChem database, 31 different sections of toxicity information, such as "Acute Effects", "NIOSH Toxicity Data", "Interactions", "Hepatotoxicity", "Carcinogenicity", "Symptoms", and "Human Toxicity Values", have been retrieved, with dozens of active compounds predicted to exert potential toxicities. In Spatholobus suberectus Dunn (SSD), 9 out of 24 active compounds are predicted to exert synergistic effects on cancer management with various drugs or factors. The synergism between SSD, luteolin and docetaxel in the management of triple-negative breast cancer was confirmed by the combination index assay, synergy score detection assay, and xenograft model.
|
35
|
Gautam V, Gupta R, Gupta D, Ruhela A, Mittal A, Mohanty SK, Arora S, Gupta R, Saini C, Sengupta D, Murugan NA, Ahuja G. deepGraphh: AI-driven web service for graph-based quantitative structure-activity relationship analysis. Brief Bioinform 2022; 23:6648791. [PMID: 35868454 DOI: 10.1093/bib/bbac288] [Received: 04/05/2022] [Revised: 06/01/2022] [Accepted: 06/23/2022] [Indexed: 11/12/2022] Open
Abstract
Artificial intelligence (AI)-based computational techniques allow rapid exploration of the chemical space. However, representing compounds as computation-compatible, detailed features is one of the crucial steps in quantitative structure-activity relationship (QSAR) analysis. Recently, graph-based methods have emerged as a powerful alternative to chemistry-restricted fingerprints or descriptors for modeling. Although graph-based modeling offers multiple advantages, its implementation demands in-depth domain knowledge and programming skills. Here we introduce deepGraphh, an end-to-end web service featuring a conglomerate of established graph-based methods for model generation for classification or regression tasks. The graphical user interface of deepGraphh offers highly configurable options for model parameter tuning, model generation, cross-validation and testing of user-supplied query molecules. deepGraphh supports four widely adopted methods for QSAR analysis, namely, graph convolution network, graph attention network, directed acyclic graph and Attentive FP. Comparative analysis revealed that the methods supported by deepGraphh perform comparably to descriptor-based machine learning techniques. Finally, we used deepGraphh models to predict the blood-brain barrier permeability of human and microbiome-generated metabolites. In summary, deepGraphh offers a one-stop web service for graph-based methods for chemoinformatics.
Affiliation(s)
- Vishakha Gautam
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Rahul Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Deepti Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Anubhav Ruhela
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Aayushi Mittal
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Sanjay Kumar Mohanty
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Sakshi Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Ria Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Chandan Saini
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Debarka Sengupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India; Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India; Centre for Artificial Intelligence, Indraprastha Institute of Information Technology, New Delhi, India
- Natarajan Arul Murugan
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Gaurav Ahuja
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
|
36
|
AddictedChem: A Data-Driven Integrated Platform for New Psychoactive Substance Identification. Molecules (Basel, Switzerland) 2022; 27:3931. [PMID: 35745053 PMCID: PMC9227411 DOI: 10.3390/molecules27123931] [Received: 02/19/2022] [Revised: 05/28/2022] [Accepted: 06/14/2022] [Indexed: 11/17/2022]
Abstract
The mechanisms underlying drug addiction remain nebulous. Furthermore, new psychoactive substances (NPS) are being developed to circumvent legal control; hence, rapid NPS identification is urgently needed. Here, we present the construction of the comprehensive database of controlled substances, AddictedChem. This database integrates the following information on controlled substances from the US Drug Enforcement Administration: physical and chemical characteristics; classified literature by Medical Subject Headings terms and target binding data; absorption, distribution, metabolism, excretion, and toxicity; and related genes, pathways, and bioassays. We created 29 predictive models for NPS identification using five machine learning algorithms and seven molecular descriptors. The best performing models achieved a balanced accuracy (BA) of 0.940 with an area under the curve (AUC) of 0.986 for the test set and a BA of 0.919 and an AUC of 0.968 for the external validation set, which were subsequently used to identify potential NPS with a consensus strategy. Concurrently, a chemical space that included the properties of vectorised addictive compounds was constructed and integrated with AddictedChem, illustrating the principle of diversely existing NPS from a macro perspective. Based on these potential applications, AddictedChem could be considered a highly promising tool for NPS identification and evaluation.
|
37
|
Zheng S, Zeng T, Li C, Chen B, Coley CW, Yang Y, Wu R. Deep learning driven biosynthetic pathways navigation for natural products with BioNavi-NP. Nat Commun 2022; 13:3342. [PMID: 35688826 PMCID: PMC9187661 DOI: 10.1038/s41467-022-30970-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2021] [Accepted: 05/27/2022] [Indexed: 12/30/2022] Open
Abstract
The complete biosynthetic pathways of most natural products (NPs) are unknown, so computer-aided bio-retrosynthesis predictions are valuable. Here, a navigable and user-friendly toolkit, BioNavi-NP, is developed to predict the biosynthetic pathways for both NPs and NP-like compounds. First, a single-step bio-retrosynthesis prediction model is trained using both general organic and biosynthetic reactions through end-to-end transformer neural networks. Based on this model, plausible biosynthetic pathways can be efficiently sampled through an AND-OR tree-based planning algorithm from iterative multi-step bio-retrosynthetic routes. Extensive evaluations reveal that BioNavi-NP can identify biosynthetic pathways for 90.2% of 368 test compounds and recover the reported building blocks as in the test set for 72.8%, 1.7 times more accurate than existing conventional rule-based approaches. The model is further shown to identify biologically plausible pathways for complex NPs collected from the recent literature. The toolkit as well as the curated datasets and learned models are freely available to facilitate the elucidation and reconstruction of the biosynthetic pathways for NPs.
Affiliation(s)
- Shuangjia Zheng
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, 510006, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China; Galixir, Beijing, China
| | - Tao Zeng
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, 510006, China
| | | | - Binghong Chen
- College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China.
| | - Ruibo Wu
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, 510006, China.
|
38
|
Zeng T, Hess BA, Zhang F, Wu R. Bio-inspired chemical space exploration of terpenoids. Brief Bioinform 2022; 23:6586263. [PMID: 35576010 DOI: 10.1093/bib/bbac197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Revised: 04/26/2022] [Accepted: 04/28/2022] [Indexed: 11/12/2022] Open
Abstract
Many computational methods are devoted to rapidly generating pseudo-natural products to expand the open-ended border of chemical spaces for natural products. However, accessibility and chemical interpretation are often ignored or underestimated in conventional library/fragment-based or rule-based strategies, thus hampering experimental synthesis. Herein, a bio-inspired strategy (named TeroGen) is developed to mimic the two key biosynthetic stages (cyclization and decoration) of terpenoid natural products, by utilizing physically based simulations and deep learning models, respectively. The precision and efficiency are validated for different categories of terpenoids, and in practice, more than 30 000 sesterterpenoids (10 times as many as the known sesterterpenoids) are predicted to be linked in a reaction network, and their synthetic accessibility and chemical interpretation are estimated by thermodynamics and kinetics. Since it could not only greatly expand the chemical space of terpenoids but also enumerate plausible biosynthetic routes, TeroGen is promising for accelerating heterologous biosynthesis, bio-mimetic synthesis and chemical synthesis of complicated terpenoids and derivatives.
Affiliation(s)
- Tao Zeng
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P.R. China
| | | | - Fan Zhang
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P.R. China
| | - Ruibo Wu
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P.R. China
|
39
|
Seidl P, Renz P, Dyubankova N, Neves P, Verhoeven J, Wegner JK, Segler M, Hochreiter S, Klambauer G. Improving Few- and Zero-Shot Reaction Template Prediction Using Modern Hopfield Networks. J Chem Inf Model 2022; 62:2111-2120. [PMID: 35034452 PMCID: PMC9092346 DOI: 10.1021/acs.jcim.1c01065] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2021] [Indexed: 12/17/2022]
Abstract
Finding synthesis routes for molecules of interest is essential in the discovery of new drugs and materials. To find such routes, computer-assisted synthesis planning (CASP) methods are employed, which rely on a single-step model of chemical reactivity. In this study, we introduce a template-based single-step retrosynthesis model based on Modern Hopfield Networks, which learn an encoding of both molecules and reaction templates in order to predict the relevance of templates for a given molecule. The template representation allows generalization across different reactions and significantly improves the performance of template relevance prediction, especially for templates with few or zero training examples. With inference speed up to orders of magnitude faster than baseline methods, we improve or match the state-of-the-art performance for top-k exact match accuracy for k ≥ 3 in the retrosynthesis benchmark USPTO-50k. Code to reproduce the results is available at github.com/ml-jku/mhn-react.
Affiliation(s)
- Philipp Seidl
- ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Altenbergerstraße 69, Linz, Austria 4040
| | - Philipp Renz
- ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Altenbergerstraße 69, Linz, Austria 4040
| | - Natalia Dyubankova
- Janssen Pharmaceutica NV, High Dimensional Biology and Discovery Data Sciences, Janssen Research & Development, Turnhoutseweg 30, Beerse, Belgium 2340
| | - Paulo Neves
- Janssen Pharmaceutica NV, High Dimensional Biology and Discovery Data Sciences, Janssen Research & Development, Turnhoutseweg 30, Beerse, Belgium 2340
| | - Jonas Verhoeven
- Janssen Pharmaceutica NV, High Dimensional Biology and Discovery Data Sciences, Janssen Research & Development, Turnhoutseweg 30, Beerse, Belgium 2340
| | - Jörg K. Wegner
- Janssen Research & Development, LLC, In-Silico Discovery and External Innovation (ISD&EI), 1 Cambridge Center, 255 Main St, Cambridge, Massachusetts 02142, United States
| | - Marwin Segler
- Microsoft Research, 21 Station Road, Cambridge, United Kingdom CB1 2FB
| | - Sepp Hochreiter
- ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Altenbergerstraße 69, Linz, Austria 4040
- Institute of Advanced Research in Artificial Intelligence, Landstraßer Hauptstraße 5, Wien, Austria 1030
| | - Günter Klambauer
- ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Altenbergerstraße 69, Linz, Austria 4040
|
40
|
Warr WA, Nicklaus MC, Nicolaou CA, Rarey M. Exploration of Ultralarge Compound Collections for Drug Discovery. J Chem Inf Model 2022; 62:2021-2034. [PMID: 35421301 DOI: 10.1021/acs.jcim.2c00224] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Designing new medicines more cheaply and quickly is tightly linked to the quest of exploring chemical space more widely and efficiently. Chemical space is monumentally large, but recent advances in computer software and hardware have enabled researchers to navigate virtual chemical spaces containing billions of chemical structures. This review specifically concerns collections of many millions or even billions of enumerated chemical structures as well as even larger chemical spaces that are not fully enumerated. We present examples of chemical libraries and spaces and the means used to construct them, and we discuss new technologies for searching huge libraries and for searching combinatorially in chemical space. We also cover space navigation techniques and consider new approaches to de novo drug design and the impact of the "autonomous laboratory" on synthesis of designed compounds. Finally, we summarize some other challenges and opportunities for the future.
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, 6 Berwick Court, Holmes Chapel, Crewe, Cheshire CW4 7HZ, United Kingdom
| | - Marc C Nicklaus
- NCI, NIH, CADD Group, NCI-Frederick, Frederick, Maryland 21702, United States
| | - Christos A Nicolaou
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Matthias Rarey
- Universität Hamburg, ZBH Center for Bioinformatics, 20146 Hamburg, Germany
|
41
|
Probst D, Schwaller P, Reymond JL. Reaction classification and yield prediction using the differential reaction fingerprint DRFP. DIGITAL DISCOVERY 2022; 1:91-97. [PMID: 35515081 PMCID: PMC8996827 DOI: 10.1039/d1dd00006c] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Accepted: 01/12/2022] [Indexed: 01/19/2023]
Abstract
Predicting the nature and outcome of reactions using computational methods is a crucial tool to accelerate chemical research. The recent application of deep learning-based learned fingerprints to reaction classification and reaction yield prediction has shown an impressive increase in performance compared to previous methods such as DFT- and structure-based fingerprints. However, learned fingerprints require large training data sets, are inherently biased, and are based on complex deep learning architectures. Here we present the differential reaction fingerprint DRFP. The DRFP algorithm takes a reaction SMILES as an input and creates a binary fingerprint based on the symmetric difference of two sets containing the circular molecular n-grams generated from the molecules listed left and right from the reaction arrow, respectively, without the need for distinguishing between reactants and reagents. We show that DRFP performs better than DFT-based fingerprints in reaction yield prediction and other structure-based fingerprints in reaction classification, reaching the performance of state-of-the-art learned fingerprints in both tasks while being data-independent. Differential Reaction Fingerprint DRFP is a chemical reaction fingerprint enabling simple machine learning models running on standard hardware to reach DFT- and deep learning-based accuracies in reaction yield prediction and reaction classification.
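The construction described in this abstract (features from each side of the reaction arrow, a symmetric difference, then a binary fingerprint) can be illustrated with a simplified sketch. It is not the DRFP implementation: plain character n-grams of each SMILES string stand in for the circular molecular n-grams, SHA-256 is an arbitrary choice of hash, and the function names and the `n_bits` and `max_n` parameters are hypothetical.

```python
import hashlib


def string_ngrams(smiles: str, max_n: int = 3) -> set:
    """Character n-grams of a SMILES string (stand-in for circular substructures)."""
    return {
        smiles[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(smiles) - n + 1)
    }


def side_features(side: str, max_n: int = 3) -> set:
    """Union of n-gram sets over all dot-separated molecules on one side."""
    features = set()
    for molecule in side.split("."):
        if molecule:
            features |= string_ngrams(molecule, max_n)
    return features


def differential_fingerprint(reaction_smiles: str, n_bits: int = 2048, max_n: int = 3) -> list:
    """Hash the symmetric difference of left-side and right-side features into bits."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected 'reactants>agents>products'")
    # Reactants and agents are pooled: no reactant/reagent distinction is needed.
    left = side_features(parts[0], max_n) | side_features(parts[1], max_n)
    right = side_features(parts[2], max_n)
    changed = left ^ right  # features present on exactly one side of the arrow
    bits = [0] * n_bits
    for feature in changed:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:8], "big") % n_bits] = 1
    return bits
```

Because only features that differ between the two sides are kept, a reaction whose two sides are identical maps to the all-zero vector, and unchanged spectator fragments do not contribute bits.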
Affiliation(s)
- Daniel Probst
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | | | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
|
42
|
Su A, Cheng Y, Xue H, She Y, Rajan K. Artificial intelligence informed toxicity screening of amine chemistries used in the synthesis of hybrid organic–inorganic perovskites. AIChE J 2022. [DOI: 10.1002/aic.17699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Affiliation(s)
- An Su
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
- Department of Materials Design and Innovation, University at Buffalo, Buffalo, New York, USA
| | - Yingying Cheng
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
| | - Haotian Xue
- Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, Hangzhou, China
| | - Yuanbin She
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
| | - Krishna Rajan
- Department of Materials Design and Innovation, University at Buffalo, Buffalo, New York, USA
|
43
|
Yu TH, Su BH, Battalora LC, Liu S, Tseng YJ. Ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying CNS drugs with high prediction power. Brief Bioinform 2022; 23:bbab377. [PMID: 34530437 PMCID: PMC8769704 DOI: 10.1093/bib/bbab377] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2021] [Revised: 07/30/2021] [Accepted: 08/23/2021] [Indexed: 12/28/2022] Open
Abstract
The trade-off between the predictive power of a machine learning (ML) or deep learning (DL) model and its interpretability has been a rising concern in central nervous system-related quantitative structure-activity relationship (CNS-QSAR) analysis. Many state-of-the-art predictive models fail to provide structural insights because of their black-box nature. Providing interpretability, and beyond that a set of simple rules, remains challenging for CNS-QSAR models. To address these issues, we develop a protocol that combines the power of ML and DL to generate a set of simple rules that are easy to interpret and have high prediction power. A data set of 940 marketed drugs (315 CNS-active, 625 CNS-inactive) was used with support vector machine and graph convolutional network algorithms. Individual ML/DL models were also constructed for comparison. The performance of these models was evaluated using an additional external dataset of 117 marketed drugs (42 CNS-active, 75 CNS-inactive). Fingerprint-split validation was adopted to ensure model stringency and generalizability. The resulting novel hybrid ensemble model outperformed its constituent traditional QSAR models with an accuracy of 0.96 and an F1 score of 0.95. Using the interpretability provided by this protocol, our model laid down a set of simple physicochemical rules to determine whether a compound can be a CNS drug using six sub-structural features. These rules displayed higher classification ability than classical guidelines, with higher specificity and more mechanistic insights than just for blood-brain barrier permeability. This hybrid protocol can potentially be used for other drug property predictions.
Affiliation(s)
- Tzu-Hui Yu
- National Taiwan University in Bio-Industry Communication and Development, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
| | - Bo-Han Su
- Department of Computer Science and Information Engineering of National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
| | | | - Sin Liu
- Graduate Institute of Biomedical Electronics and Bioinformatics of National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
| | - Yufeng Jane Tseng
- Graduate Institute of Biomedical Electronics and Bioinformatics, Department of Computer Science and Information Engineering and School of Pharmacy at National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
|
44
|
The SwissSimilarity 2021 Web Tool: Novel Chemical Libraries and Additional Methods for an Enhanced Ligand-Based Virtual Screening Experience. Int J Mol Sci 2022; 23:ijms23020811. [PMID: 35054998 PMCID: PMC8776004 DOI: 10.3390/ijms23020811] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2021] [Revised: 01/06/2022] [Accepted: 01/07/2022] [Indexed: 01/27/2023] Open
Abstract
Hit finding, scaffold hopping, and structure–activity relationship studies are important tasks in rational drug discovery. Implementation of these tasks strongly depends on the availability of compounds similar to a known bioactive molecule. SwissSimilarity is a web tool for low-to-high-throughput virtual screening of multiple chemical libraries to find molecules similar to a compound of interest. According to the similarity principle, the output list of molecules generated by SwissSimilarity is expected to be enriched in compounds that are likely to share common protein targets with the query molecule and that can, therefore, be acquired and tested experimentally in priority. Compound libraries available for screening using SwissSimilarity include approved drugs, clinical candidates, known bioactive molecules, commercially available and synthetically accessible compounds. The first version of SwissSimilarity launched in 2015 made use of various 2D and 3D molecular descriptors, including path-based FP2 fingerprints and ElectroShape vectors. However, during the last few years, new fingerprinting methods for molecular description have been developed or have become popular. Here we would like to announce the launch of the new version of the SwissSimilarity web tool, which features additional 2D and 3D methods for estimation of molecular similarity: extended-connectivity, MinHash, 2D pharmacophore, extended reduced graph, and extended 3D fingerprints. Moreover, it is now possible to screen for molecular structures having the same scaffold as the query compound. Additionally, all compound libraries available for screening in SwissSimilarity have been updated, and several new ones have been added to the list. Finally, the interface of the website has been comprehensively rebuilt to provide a better user experience. The new version of SwissSimilarity is freely available starting from December 2021.
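Fingerprint-based similarity screening of the kind described here can be sketched in a few lines. The abstract does not state which similarity coefficient SwissSimilarity applies to each fingerprint, so the Tanimoto coefficient below is an assumption, as are the function names and the representation of a fingerprint as a set of "on" bit indices.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_library(query: frozenset, library: dict, top_k: int = 3) -> list:
    """Return the top_k (name, score) pairs, most similar to the query first.

    `library` maps a compound name to its fingerprint (a frozenset of bit indices).
    Ties are broken alphabetically by name so the output is deterministic.
    """
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

Under the similarity principle mentioned in the abstract, the compounds at the head of such a ranked list are the ones to acquire and test first.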
|
45
|
Pujol‐Giménez J, Poirier M, Bühlmann S, Schuppisser C, Bhardwaj R, Awale M, Visini R, Javor S, Hediger MA, Reymond J. Inhibitors of Human Divalent Metal Transporters DMT1 (SLC11A2) and ZIP8 (SLC39A8) from a GDB-17 Fragment Library. ChemMedChem 2021; 16:3306-3314. [PMID: 34309203 PMCID: PMC8596699 DOI: 10.1002/cmdc.202100467] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2021] [Indexed: 11/06/2022]
Abstract
Solute carrier proteins (SLCs) are membrane proteins controlling fluxes across biological membranes and represent an emerging class of drug targets. Here we searched for inhibitors of divalent metal transporters in a library of 1,676 commercially available 3D-shaped fragment-like molecules from the generated database GDB-17, which lists all possible organic molecules up to 17 atoms of C, N, O, S and halogen following simple criteria for chemical stability and synthetic feasibility. While screening against DMT1 (SLC11A2), an iron transporter associated with hemochromatosis and for which only very few inhibitors are known, only yielded two weak inhibitors, our approach led to the discovery of the first inhibitor of ZIP8 (SLC39A8), a zinc transporter associated with manganese homeostasis and osteoarthritis but with no previously reported pharmacology, demonstrating that this target is druggable.
Affiliation(s)
- Jonai Pujol‐Giménez
- Department of Biomedical Research and Department of Nephrology and Hypertension, Membrane Transport Discovery Lab, Inselspital, Bern University Hospital, University of Bern, CH-3010 Bern, Switzerland
| | - Marion Poirier
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Sven Bühlmann
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Céline Schuppisser
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Rajesh Bhardwaj
- Department of Biomedical Research and Department of Nephrology and Hypertension, Membrane Transport Discovery Lab, Inselspital, Bern University Hospital, University of Bern, CH-3010 Bern, Switzerland
| | - Mahendra Awale
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Ricardo Visini
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Sacha Javor
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Matthias A. Hediger
- Department of Biomedical Research and Department of Nephrology and Hypertension, Membrane Transport Discovery Lab, Inselspital, Bern University Hospital, University of Bern, CH-3010 Bern, Switzerland
| | - Jean‐Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
|
46
|
Leguy J, Glavatskikh M, Cauchy T, Da Mota B. Scalable estimator of the diversity for de novo molecular generation resulting in a more robust QM dataset (OD9) and a more efficient molecular optimization. J Cheminform 2021; 13:76. [PMID: 34600576 PMCID: PMC8487551 DOI: 10.1186/s13321-021-00554-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2021] [Accepted: 09/15/2021] [Indexed: 01/21/2023] Open
Abstract
Chemical diversity is one of the key terms when dealing with machine learning and molecular generation. This is particularly true for quantum chemical datasets, which should be composed meticulously since the calculations are highly time demanding. Previously we have seen that the best-known quantum chemical dataset, QM9, lacks chemical diversity. As a consequence, ML models trained on QM9 showed generalizability shortcomings. In this paper we present (i) a fast and generic method to evaluate chemical diversity, (ii) a new quantum chemical dataset of 435k molecules, OD9, that includes QM9 and new molecules generated with a diversity objective, and (iii) an analysis of the impact of diversity on unconstrained and goal-directed molecular generation, using QED optimization as an example. Our approach makes it possible to estimate individually the impact of a solution on the diversity of a set, allowing for effective incremental evaluation. In the first application, we show how the diversity constraint allows us to generate more than a million molecules that would efficiently complete the reference datasets. The compounds were calculated with DFT thanks to a collaborative effort through the QuChemPedIA@home BOINC project. With regard to goal-directed molecular generation, getting a high QED score is not complicated, but adding a little diversity can cut the number of calls to the evaluation function by a factor of ten.
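The idea of scoring one candidate's contribution to the diversity of a set without recomputing the whole set can be sketched as below. The abstract does not define the estimator, so this sketch assumes a simple one: diversity is the number of distinct features present in the set. The class name, its methods, and the string feature labels are all hypothetical.

```python
from collections import Counter
from typing import Iterable


class FeatureDiversity:
    """Tracks how many distinct features a growing set of molecules covers."""

    def __init__(self) -> None:
        self._counts = Counter()  # feature -> number of molecules carrying it

    @property
    def score(self) -> int:
        """Diversity of the current set: number of distinct features seen."""
        return len(self._counts)

    def gain(self, features: Iterable[str]) -> int:
        """Number of unseen features a candidate would add (set is unchanged)."""
        return sum(1 for feature in set(features) if feature not in self._counts)

    def add(self, features: Iterable[str]) -> int:
        """Add a molecule's features to the set and return the diversity gained."""
        unique = set(features)
        gained = self.gain(unique)
        self._counts.update(unique)
        return gained
```

With this bookkeeping, evaluating a candidate costs time proportional to its own feature count rather than to the size of the set, which is what makes an incremental diversity objective practical inside a generation loop.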
Affiliation(s)
- Jules Leguy
- Univ Angers, LERIA, SFR MATHSTIC, 49000, Angers, France
| | - Marta Glavatskikh
- Univ Angers, LERIA, SFR MATHSTIC, 49000, Angers, France; Univ Angers, CNRS, MOLTECH-ANJOU, SFR MATRIX, 49000, Angers, France
| | - Thomas Cauchy
- Univ Angers, CNRS, MOLTECH-ANJOU, SFR MATRIX, 49000, Angers, France.
| | - Benoit Da Mota
- Univ Angers, LERIA, SFR MATHSTIC, 49000, Angers, France.
|
47
|
Wang D, Yu J, Chen L, Li X, Jiang H, Chen K, Zheng M, Luo X. A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling. J Cheminform 2021; 13:69. [PMID: 34544485 PMCID: PMC8454160 DOI: 10.1186/s13321-021-00551-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2021] [Accepted: 09/05/2021] [Indexed: 11/24/2022] Open
Abstract
Reliable uncertainty quantification for statistical models is crucial in various downstream applications, especially for drug design and discovery, where mistakes may incur large costs. This topic has therefore attracted much attention, and a plethora of methods have been proposed in recent years. The approaches reported so far fall mainly into two classes: distance-based approaches and Bayesian approaches. Although these methods have been widely used in many scenarios and have shown promising performance, each with distinct strengths, overconfidence on out-of-distribution examples still poses challenges for the deployment of these techniques in real-world applications. In this study we investigated a number of consensus strategies that combine distance-based and Bayesian approaches with post-hoc calibration for improved uncertainty quantification in QSAR (Quantitative Structure-Activity Relationship) regression modeling. We employed a set of criteria to quantitatively assess the ranking and calibration ability of these models. Experiments based on 24 bioactivity datasets were designed to critically compare the proposed model with other well-studied baseline models. Our findings indicate that the proposed hybrid framework robustly enhances the model's ability to rank absolute errors. Together with post-hoc calibration on the validation set, we show that well-calibrated uncertainty quantification results can be obtained in domain shift settings. The complementarity between different methods is also conceptually analyzed.
Affiliation(s)
- Dingyan Wang
- Shanghai Key Laboratory of Forensic Medicine, Academy of Forensic Science, Shanghai, 200063, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Jie Yu
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Lifan Chen
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xutong Li
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Hualiang Jiang
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Kaixian Chen
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Mingyue Zheng
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
| | - Xiaomin Luo
- Shanghai Key Laboratory of Forensic Medicine, Academy of Forensic Science, Shanghai, 200063, China.
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
|
48
|
Amendola G, Cosconati S. PyRMD: A New Fully Automated AI-Powered Ligand-Based Virtual Screening Tool. J Chem Inf Model 2021; 61:3835-3845. [PMID: 34270903 DOI: 10.1021/acs.jcim.1c00653] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Artificial intelligence (AI) algorithms are dramatically redefining the current drug discovery landscape by boosting the efficiency of its various steps. Still, their implementation often requires a certain level of expertise in AI paradigms and coding. This often prevents the use of these powerful methodologies by non-expert users involved in the design of new biologically active compounds. Here, the random matrix discriminant (RMD) algorithm, a high-performance AI method specifically tailored for the identification of new ligands, was implemented in a new fully automated tool, PyRMD. This ligand-based virtual screening tool can be trained using target bioactivity data directly downloaded from the ChEMBL repository without manual intervention. The software automatically splits the available training compounds into active and inactive sets and learns the distinctive chemical features responsible for the compounds' activity/inactivity. PyRMD was designed to easily screen millions of compounds in hours through an automated workflow and intuitive input files, allowing fine tuning of each parameter of the calculation. Additionally, PyRMD features a wealth of benchmark metrics, to accurately probe the model performance, which were used here to gauge its predictive potential and limitations. PyRMD is freely available on GitHub (https://github.com/cosconatilab/PyRMD) as an open-source tool.
Affiliation(s)
- Giorgio Amendola
- DiSTABiF, University of Campania Luigi Vanvitelli, Via Vivaldi 43, 81100 Caserta, Italy
| | - Sandro Cosconati
- DiSTABiF, University of Campania Luigi Vanvitelli, Via Vivaldi 43, 81100 Caserta, Italy
|
49
|
Kreutter D, Schwaller P, Reymond JL. Predicting enzymatic reactions with a molecular transformer. Chem Sci 2021; 12:8648-8659. [PMID: 34257863 PMCID: PMC8246114 DOI: 10.1039/d1sc02362d] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2021] [Accepted: 05/24/2021] [Indexed: 11/29/2022] Open
Abstract
The use of enzymes for organic synthesis allows for simplified, more economical and selective synthetic routes not accessible to conventional reagents. However, predicting whether a particular molecule might undergo a specific enzyme transformation is very difficult. Here we used multi-task transfer learning to train the molecular transformer, a sequence-to-sequence machine learning model, with one million reactions from the US Patent Office (USPTO) database combined with 32 181 enzymatic transformations annotated with a text description of the enzyme. The resulting enzymatic transformer model predicts the structure and stereochemistry of enzyme-catalyzed reaction products with remarkable accuracy. One of the key novelties is that we combined the reaction SMILES language of only 405 atomic tokens with thousands of human language tokens describing the enzymes, such that our enzymatic transformer not only learned to interpret SMILES, but also the natural language as used by human experts to describe enzymes and their mutations.
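Mixing SMILES tokens with natural-language tokens in one input sequence can be sketched as follows. This is an illustration only: the regular expression is a simplified SMILES tokenizer written for this example, and the `|` separator, the whitespace splitting of the enzyme description, and the function names are assumptions rather than the format used by the authors.

```python
import re

# Simplified SMILES tokenizer: bracket atoms, two-letter halogens, single-letter
# atoms, aromatic atoms, bond/branch/reaction symbols, and ring-closure digits.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]|[()=#+\-\\/:~@?>*$.]|%[0-9]{2}|[0-9]"
)


def tokenize_smiles(smiles: str) -> list:
    """Split a SMILES string into tokens, rejecting characters the regex misses."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    return tokens


def build_model_input(substrate_smiles: str, enzyme_text: str) -> str:
    """Join SMILES tokens and enzyme-description words into one token sequence.

    The '|' separator between the two vocabularies is a hypothetical choice.
    """
    words = enzyme_text.split()
    return " ".join(tokenize_smiles(substrate_smiles) + ["|"] + words)
```

A sequence-to-sequence model trained on such inputs sees both vocabularies in the same stream, which is what lets the enzyme description condition the predicted product.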
Affiliation(s)
- David Kreutter
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Philippe Schwaller
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
|
50
|
McGill C, Forsuelo M, Guan Y, Green WH. Predicting Infrared Spectra with Message Passing Neural Networks. J Chem Inf Model 2021; 61:2594-2609. [PMID: 34048221 DOI: 10.1021/acs.jcim.1c00055] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 11/30/2022]
Abstract
Infrared (IR) spectroscopy remains an important tool for chemical characterization and identification. Chemprop-IR has been developed as a software package for the prediction of IR spectra through the use of machine learning. This work serves the dual purpose of providing a trained general-purpose model for the prediction of IR spectra with ease and providing the Chemprop-IR software framework for the training of new models. In Chemprop-IR, molecules are encoded using a directed message passing neural network, allowing for molecule latent representations to be learned and optimized for the task of spectral predictions. Model training incorporates spectra metrics and normalization techniques that offer better performance with spectral predictions than standard practice in regression models. The model makes use of pretraining using quantum chemistry calculations and ensembling of multiple submodels to improve generalizability and performance. The spectral predictions that result are of high quality, showing capability to capture the extreme diversity of spectral forms over chemical space and represent complex peak structures.
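To make "spectra metrics and normalization" concrete, the sketch below normalizes two spectra to unit total intensity and compares them with a spectral information divergence (SID). The abstract does not name the metric or the normalization used, so treating SID with sum-normalization as the choice here is an assumption, as are the function names and the `floor` value.

```python
import math


def normalize_spectrum(spectrum, floor=1e-10):
    """Clip intensities to a small positive floor and scale them to sum to 1.

    The floor (an assumed value) avoids log(0) in the divergence below.
    """
    clipped = [max(value, floor) for value in spectrum]
    total = sum(clipped)
    return [value / total for value in clipped]


def spectral_information_divergence(predicted, target):
    """Symmetric divergence between two spectra sampled on the same grid."""
    if len(predicted) != len(target):
        raise ValueError("spectra must have the same number of points")
    p = normalize_spectrum(predicted)
    q = normalize_spectrum(target)
    return sum(
        pi * math.log(pi / qi) + qi * math.log(qi / pi)
        for pi, qi in zip(p, q)
    )


# Toy absorbance vectors on a shared wavenumber grid.
reference = [0.1, 0.8, 0.3, 0.0]
prediction = [0.2, 0.7, 0.3, 0.1]
score = spectral_information_divergence(prediction, reference)
```

Unlike a plain squared error on raw intensities, a divergence on normalized spectra compares peak shapes and relative heights rather than absolute scale, which is one reason such metrics are attractive for spectral prediction.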
Affiliation(s)
- Charles McGill
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Michael Forsuelo
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Yanfei Guan
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
|