1
Tang X, Tran A, Tan J, Gerstein MB. MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics 2024; 40:i357-i368. [PMID: 38940177 PMCID: PMC11256921 DOI: 10.1093/bioinformatics/btae260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION: The current paradigm of deep learning models for the joint representation of molecules and text primarily relies on 1D or 2D molecular formats, neglecting significant 3D structural information that offers valuable physical insight. This narrow focus inhibits the models' versatility and adaptability across a wide range of modalities. Conversely, the limited research focusing on explicit 3D representation tends to overlook textual data within the biomedical domain.
RESULTS: We present a unified pre-trained language model, MolLM, that concurrently captures 2D and 3D molecular information alongside biomedical text. MolLM consists of a text Transformer encoder and a molecular Transformer encoder, designed to encode both 2D and 3D molecular structures. To support MolLM's self-supervised pre-training, we constructed 160K molecule-text pairings. Using contrastive learning as the supervisory signal, MolLM demonstrates robust molecular representation capabilities across four downstream tasks: cross-modal molecule and text matching, property prediction, captioning, and text-prompted molecular editing. Through ablation, we demonstrate that the inclusion of explicit 3D representations improves performance in these downstream tasks.
AVAILABILITY AND IMPLEMENTATION: Our code, data, pre-trained model weights, and examples of using our model are all available at https://github.com/gersteinlab/MolLM. In particular, we provide Jupyter Notebooks offering step-by-step guidance on how to use MolLM to extract embeddings for both molecules and text.
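The contrastive objective mentioned in this abstract can be illustrated with a small sketch. The code below is not taken from the MolLM repository; it assumes a symmetric InfoNCE-style loss over paired molecule and text embeddings, and the function names, the toy 2D embeddings, and the temperature value are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(mol_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE loss; pair i of mol_embs matches pair i of text_embs.

    Assumption: this is a generic contrastive loss, not MolLM's exact objective.
    """
    n = len(mol_embs)
    # Scaled similarity matrix: rows are molecules, columns are texts.
    sims = [[cosine(m, t) / temperature for t in text_embs] for m in mol_embs]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[i], computed with a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            top = max(row)
            log_z = top + math.log(sum(math.exp(s - top) for s in row))
            total += log_z - row[i]
        return total / n

    cols = [list(col) for col in zip(*sims)]  # text-to-molecule direction
    return 0.5 * (cross_entropy(sims) + cross_entropy(cols))


# Toy embeddings (hypothetical): matched pairs should give a lower loss.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the loss for correctly paired embeddings is close to zero, while mismatched pairs are penalized heavily, which is the behaviour a contrastive pre-training signal relies on.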
Affiliation(s)
- Xiangru Tang
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Andrew Tran
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Jeffrey Tan
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Mark B Gerstein
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
  - Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, United States
  - Department of Statistics & Data Science, Yale University, New Haven, CT 06520, United States
  - Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
2
Ghiandoni GM, Flanagan SR, Bodkin MJ, Nizi MG, Galera‐Prat A, Brai A, Chen B, Wallace JEA, Hristozov D, Webster J, Manfroni G, Lehtiö L, Tabarrini O, Gillet VJ. Synthetically accessible de novo design using reaction vectors: Application to PARP1 inhibitors. Mol Inform 2024; 43:e202300183. [PMID: 38258328 PMCID: PMC11475289 DOI: 10.1002/minf.202300183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 01/16/2024] [Accepted: 01/22/2024] [Indexed: 01/24/2024]
Abstract
De novo design has been a hotly pursued topic for many years. Most recent developments have involved the use of deep learning methods for generative molecular design. Despite increasing levels of algorithmic sophistication, the design of molecules that are synthetically accessible remains a major challenge. Reaction-based de novo design takes a conceptually simpler approach and aims to address synthesisability directly by mimicking synthetic chemistry and driving structural transformations by known reactions that are applied in a stepwise manner. However, the use of a small number of hand-coded transformations restricts the chemical space that can be accessed and there are few examples in the literature where molecules and their synthetic routes have been designed and executed successfully. Here we describe the application of reaction-based de novo design to the design of synthetically accessible and biologically active compounds as proof-of-concept of our reaction vector-based software. Reaction vectors are derived automatically from known reactions and allow access to a wide region of synthetically accessible chemical space. The design was aimed at producing molecules that are active against PARP1 and which have improved brain penetration properties compared to existing PARP1 inhibitors. We synthesised a selection of the designed molecules according to the provided synthetic routes and tested them experimentally. The results demonstrate that reaction vectors can be applied to the design of novel molecules of biological relevance that are also synthetically accessible.
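The idea of deriving a transformation from a known reaction and replaying it on a new starting material can be sketched as follows. This is only a toy illustration: it assumes a reaction vector can be viewed as the difference between descriptor counts of the product and the reactant, the descriptor labels are hypothetical strings rather than real chemical descriptors, and it does not rebuild a structure from the resulting counts as a real tool would have to.

```python
from collections import Counter


def reaction_vector(reactant_desc, product_desc):
    """Difference of descriptor counts: product minus reactant (zeros dropped)."""
    vec = Counter(product_desc)
    vec.subtract(reactant_desc)
    return {feature: delta for feature, delta in vec.items() if delta != 0}


def apply_vector(start_desc, vector):
    """Apply a reaction vector to a starting material's descriptor counts.

    Returns None when the starting material lacks a feature the vector removes.
    """
    result = Counter(start_desc)
    for feature, delta in vector.items():
        if result[feature] + delta < 0:
            return None  # the required feature is missing, so the vector does not apply
        result[feature] += delta
    return +result  # unary plus drops features whose count fell to zero


# Hypothetical descriptor counts for a known reaction (an alcohol becoming an ether).
known_reactant = {"C-OH": 1, "C=O": 1}
known_product = {"C-OR": 1, "C=O": 1}
vector = reaction_vector(known_reactant, known_product)

# Replay the transformation on new starting materials.
designed = apply_vector({"C-OH": 1, "C-C": 3}, vector)
not_applicable = apply_vector({"C-C": 2}, vector)
```

Because the vector is derived from data rather than hand-coded, any reaction in a database can contribute a transformation, which is the property the abstract credits for the wider reach into synthetically accessible chemical space.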
Affiliation(s)
- Gian Marco Ghiandoni
  - Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK
- Michael J. Bodkin
  - Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon OX14 4RZ, UK
- Maria Giulia Nizi
  - Department of Pharmaceutical Sciences, University of Perugia, 06123 Perugia, Italy
- Albert Galera‐Prat
  - Faculty of Biochemistry and Molecular Medicine & Biocenter Oulu, University of Oulu, Oulu FI-90014, Finland
- Annalaura Brai
  - Department of Biotechnology, Chemistry and Pharmacy, University of Siena, I-53100 Siena, Italy
- Beining Chen
  - Department of Chemistry, University of Sheffield, Dainton Building, Brook Hill, Sheffield S3 7HF, UK
- Dimitar Hristozov
  - Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon OX14 4RZ, UK
- James Webster
  - Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK
- Giuseppe Manfroni
  - Department of Pharmaceutical Sciences, University of Perugia, 06123 Perugia, Italy
- Lari Lehtiö
  - Faculty of Biochemistry and Molecular Medicine & Biocenter Oulu, University of Oulu, Oulu FI-90014, Finland
- Oriana Tabarrini
  - Department of Pharmaceutical Sciences, University of Perugia, 06123 Perugia, Italy
- Valerie J. Gillet
  - Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK
3
Chen B, Pan Z, Mou M, Zhou Y, Fu W. Is fragment-based graph a better graph-based molecular representation for drug design? A comparison study of graph-based models. Comput Biol Med 2024; 169:107811. [PMID: 38168647 DOI: 10.1016/j.compbiomed.2023.107811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 11/23/2023] [Accepted: 12/03/2023] [Indexed: 01/05/2024]
Abstract
Graph Neural Networks (GNNs) have gained significant traction in various sectors of AI-driven drug design. Over recent years, the integration of fragmentation concepts into GNNs has emerged as a potent strategy to augment the efficacy of molecular generative models. Nonetheless, challenges such as symmetry breaking and potential misrepresentation of intricate cycles and undefined functional groups raise questions about the superiority of fragment-based graph representation over traditional methods. In our research, we undertook a rigorous evaluation, contrasting the predictive prowess of eight models, developed using deep learning algorithms, across 12 benchmark datasets that span a range of properties. These models encompass established methods like GCN, AttentiveFP, and D-MPNN, as well as innovative fragment-based representation techniques. Our results indicate that fragment-based methodologies, notably PharmHGT, significantly improve model performance and interpretability, particularly in scenarios characterized by limited data availability. However, in situations with extensive training, fragment-based molecular graph representations may not necessarily eclipse traditional methods. In summation, we posit that the integration of fragmentation, as an avant-garde technique in drug design, harbors considerable promise for the future of AI-enhanced drug design.
Affiliation(s)
- Baiyu Chen
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 202103, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Minjie Mou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Yuan Zhou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Wei Fu
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 202103, China.
4
Carracedo-Reboredo P, Aranzamendi E, He S, Arrasate S, Munteanu CR, Fernandez-Lozano C, Sotomayor N, Lete E, González-Díaz H. MATEO: intermolecular α-amidoalkylation theoretical enantioselectivity optimization. Online tool for selection and design of chiral catalysts and products. J Cheminform 2024; 16:9. [PMID: 38254200 PMCID: PMC10804835 DOI: 10.1186/s13321-024-00802-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 01/11/2024] [Indexed: 01/24/2024] Open
Abstract
The enantioselective Brønsted acid-catalyzed α-amidoalkylation reaction is a useful procedure for the production of new drugs and natural products. In this context, Chiral Phosphoric Acid (CPA) catalysts are versatile catalysts for this type of reaction. The selection and design of new CPA catalysts for different enantioselective reactions has a dual interest because new CPA catalysts (tools) and chiral drugs or materials (products) can be obtained. However, this process is difficult and time consuming if approached from an experimental trial and error perspective. In this work, a Heuristic Perturbation-Theory and Machine Learning (HPTML) algorithm was used to seek a predictive model for CPA catalyst performance in terms of enantioselectivity in α-amidoalkylation reactions, with R² = 0.96 overall for the training and validation series. It involved Monte Carlo sampling of > 100,000 pairs of query and reference reactions. In addition, the computational and experimental investigation of a new set of intermolecular α-amidoalkylation reactions using BINOL-derived N-triflylphosphoramides as CPA catalysts is reported as a case study. The model was implemented in a web server called MATEO: InterMolecular Amidoalkylation Theoretical Enantioselectivity Optimization, available online at: https://cptmltool.rnasa-imedir.com/CPTMLTools-Web/mateo. This new user-friendly online computational tool would enable sustainable optimization of reaction conditions that could lead to the design of new CPA catalysts along with new organic synthesis products.
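For readers unfamiliar with the reported fit statistic, the sketch below shows how a coefficient of determination is typically computed. It assumes the R2 value quoted in this abstract denotes the usual coefficient of determination; the function name and the observed and predicted values are hypothetical and are not data from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left unexplained by the model.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical enantioselectivity-like values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)  # approximately 0.98
```

A value near 1 means the predictions account for almost all of the variance in the observations; it says nothing by itself about performance on reactions outside the sampled series.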
Affiliation(s)
- Paula Carracedo-Reboredo
  - Department of Organic and Inorganic Chemistry, Faculty of Science and Technology, University of The Basque Country (UPV/EHU), P.O. Box 644, 48080, Bilbao, Spain
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, CITIC-Research Center of Information and Communication Technologies, University of A Coruña, Campus Elviña s/n, 15071, A Coruña, Spain
- Eider Aranzamendi
  - Department of Organic and Inorganic Chemistry, Faculty of Science and Technology, University of The Basque Country (UPV/EHU), P.O. Box 644, 48080, Bilbao, Spain
- Shan He
  - Department of Organic and Inorganic Chemistry, Faculty of Science and Technology, University of The Basque Country (UPV/EHU), P.O. Box 644, 48080, Bilbao, Spain
  - IKERDATA S.L., ZITEK, University of Basque Country UPVEHU, Rectorate Building, 48940, Leioa, Spain
- Sonia Arrasate
  - Department of Organic and Inorganic Chemistry, Faculty of Science and Technology, University of The Basque Country (UPV/EHU), P.O. Box 644, 48080, Bilbao, Spain
- Cristian R Munteanu
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, CITIC-Research Center of Information and Communication Technologies, University of A Coruña, Campus Elviña s/n, 15071, A Coruña, Spain
- Carlos Fernandez-Lozano
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, CITIC-Research Center of Information and Communication Technologies, University of A Coruña, Campus Elviña s/n, 15071, A Coruña, Spain
- Nuria Sotomayor
  - Department of Organic and Inorganic Chemistry, Faculty of Science and Technology, University of The Basque Country (UPV/EHU), P.O. Box 644, 48080, Bilbao, Spain.
- Esther Lete
  - Department of Organic and Inorganic Chemistry, Faculty of Science and Technology, University of The Basque Country (UPV/EHU), P.O. Box 644, 48080, Bilbao, Spain.
- Humberto González-Díaz
  - Department of Organic and Inorganic Chemistry, Faculty of Science and Technology, University of The Basque Country (UPV/EHU), P.O. Box 644, 48080, Bilbao, Spain.
  - IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Spain.
5
Bi X, Lin L, Chen Z, Ye J. Artificial Intelligence for Surface-Enhanced Raman Spectroscopy. Small Methods 2024; 8:e2301243. [PMID: 37888799 DOI: 10.1002/smtd.202301243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Surface-enhanced Raman spectroscopy (SERS), well acknowledged as a fingerprinting and sensitive analytical technique, has shown high application value in a broad range of fields including biomedicine, environmental protection, and food safety, among others. In the endless pursuit of ever more sensitive, robust, and comprehensive sensing and imaging, advancements keep emerging across the whole SERS pipeline, from the design of SERS substrates and reporter molecules, synthetic route planning, and instrument refinement to data preprocessing and analysis methods. Artificial intelligence (AI), which is created to imitate and eventually exceed human behaviors, has exhibited its power in learning high-level representations and recognizing complicated patterns with exceptional automaticity. Therefore, faced with intertwined influencing factors and explosive data size, AI has been increasingly leveraged in all the above-mentioned aspects of SERS, presenting high efficiency in accelerating systematic optimization and deepening understanding of the fundamental physics and spectral data, which far exceeds human labor and conventional computation. In this review, recent progress in SERS through the integration of AI is summarized, and new insights into the challenges and perspectives are provided with the aim of better gearing SERS toward the fast track.
Affiliation(s)
- Xinyuan Bi
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Li Lin
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Zhou Chen
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Jian Ye
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
  - Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China
  - Shanghai Key Laboratory of Gynecologic Oncology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China
6
Ilnicka A, Schneider G. Designing molecules with autoencoder networks. Nature Computational Science 2023; 3:922-933. [PMID: 38177601 DOI: 10.1038/s43588-023-00548-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 10/03/2023] [Indexed: 01/06/2024]
Abstract
Autoencoders are versatile tools in molecular informatics. These unsupervised neural networks serve diverse tasks such as data-driven molecular representation and constructive molecular design. This Review explores their algorithmic foundations and applications in drug discovery, highlighting the most active areas of development and the contributions autoencoder networks have made in advancing this field. We also explore the challenges and prospects concerning the utilization of autoencoders and the various adaptations of this neural network architecture in molecular design.
Affiliation(s)
- Agnieszka Ilnicka
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Gisbert Schneider
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland.
7
Zhu H, Zhou R, Cao D, Tang J, Li M. A pharmacophore-guided deep learning approach for bioactive molecular generation. Nat Commun 2023; 14:6234. [PMID: 37803000 PMCID: PMC10558534 DOI: 10.1038/s41467-023-41454-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2022] [Accepted: 08/30/2023] [Indexed: 10/08/2023] Open
Abstract
The rational design of novel molecules with the desired bioactivity is a critical but challenging task in drug discovery, especially when treating a novel target family or understudied targets. We propose a Pharmacophore-Guided deep learning approach for bioactive Molecule Generation (PGMG). Through the guidance of pharmacophore, PGMG provides a flexible strategy for generating bioactive molecules. PGMG uses a graph neural network to encode spatially distributed chemical features and a transformer decoder to generate molecules. A latent variable is introduced to solve the many-to-many mapping between pharmacophores and molecules to improve the diversity of the generated molecules. Compared to existing methods, PGMG generates molecules with strong docking affinities and high scores of validity, uniqueness, and novelty. In the case studies, we use PGMG in a ligand-based and structure-based drug de novo design. Overall, the flexibility and effectiveness make PGMG a useful tool to accelerate the drug discovery process.
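The validity, uniqueness, and novelty scores mentioned in this abstract are commonly computed as simple set ratios. The sketch below uses one widely used set of definitions, which may differ from those in the PGMG paper; the function names, the SMILES-like strings, and the parenthesis-balancing validity check are hypothetical placeholders (a real pipeline would parse each string with a cheminformatics toolkit).

```python
def generation_metrics(generated, training_set, is_valid):
    """Validity, uniqueness, and novelty of a batch of generated molecules.

    Assumed definitions: validity = valid / generated, uniqueness = distinct
    valid / valid, novelty = distinct valid not in the training set / distinct valid.
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles):
    """Placeholder validity check: non-empty string with balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


# Hypothetical generated strings; "CC(C" has an unclosed branch.
generated = ["CCO", "CCO", "CC(C)O", "CC(C", "c1ccccc1"]
training = ["CCO"]
metrics = generation_metrics(generated, training, toy_is_valid)
```

Here four of five strings pass the placeholder check, three of those four are distinct, and two of the three distinct strings are absent from the training list.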
Affiliation(s)
- Huimin Zhu
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Renyi Zhou
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410008, China
- Jing Tang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, 00290, Finland
  - Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, Helsinki, 00290, Finland
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
8
Wang J, Zeng Y, Sun H, Wang J, Wang X, Jin R, Wang M, Zhang X, Cao D, Chen X, Hsieh CY, Hou T. Molecular Generation with Reduced Labeling through Constraint Architecture. J Chem Inf Model 2023. [PMID: 37184885 DOI: 10.1021/acs.jcim.3c00579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2023]
Abstract
In the past few years, a number of machine learning (ML)-based molecular generative models have been proposed for generating molecules with desirable properties, but they all require a large amount of label data of pharmacological and physicochemical properties. However, experimental determination of these labels, especially bioactivity labels, is very expensive. In this study, we analyze the dependence of various multi-property molecule generation models on biological activity label data and propose Frag-G/M, a fragment-based multi-constraint molecular generation framework based on conditional transformer, recurrent neural networks (RNNs), and reinforcement learning (RL). The experimental results illustrate that, using the same number of labels, Frag-G/M can generate more desired molecules than the baselines (several times more than the baselines). Moreover, compared with the known active compounds, the molecules generated by Frag-G/M exhibit higher scaffold diversity than those generated by the baselines, thus making it more promising to be used in real-world drug discovery scenarios.
Affiliation(s)
- Jike Wang
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
  - School of Computer Science, Wuhan University, Wuhan, Hubei 430072, P. R. China
- Yundian Zeng
  - College of Control Science and Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, P. R. China
- Huiyong Sun
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing, Jiangsu 210009, P. R. China
- Junmei Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Xiaorui Wang
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau 999078, P. R. China
- Ruofan Jin
  - College of Life Science, Zhejiang University, Hangzhou, Zhejiang 310027, P. R. China
- Mingyang Wang
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- Xujun Zhang
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410004, P. R. China
- Xi Chen
  - School of Computer Science, Wuhan University, Wuhan, Hubei 430072, P. R. China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- Tingjun Hou
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
9
Kumar R, Sharma A, Alexiou A, Ashraf GM. Artificial Intelligence in De novo Drug Design: Are We Still There? Curr Top Med Chem 2022; 22:2483-2492. [PMID: 36263480 DOI: 10.2174/1568026623666221017143244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 01/20/2023]
Abstract
BACKGROUND: The artificial intelligence (AI)-assisted design of drug candidates with novel structures and desired properties has received significant attention in the recent past, as have related areas of forward prediction that aim to discover chemical matter worth synthesizing and investigating experimentally.
OBJECTIVES: The purpose behind developing AI-driven models is to explore the broader chemical space and suggest new drug candidate scaffolds with promising therapeutic value. Moreover, it is anticipated that such AI-based models may not only significantly reduce cost and time but also decrease the attrition rate of drug candidates that fail to reach the desirable endpoints at the final stages of drug development. In an attempt to develop AI-based models for de novo drug design, numerous methods have been proposed by various study groups by applying machine learning and deep learning algorithms to chemical datasets. However, there are many challenges in obtaining accurate predictions, and real breakthroughs in de novo drug design are still scarce.
METHODS: In this review, we explore the recent trends in developing AI-based models for de novo drug design to assess the current status, challenges, and opportunities in the field.
CONCLUSION: The consistently improved AI algorithms and the abundance of curated training chemical data indicate that AI-based de novo drug design should perform better than the current models. Improvements in performance are warranted to obtain better outcomes in the form of potential drug candidates that can perform well under in vivo conditions, especially in the case of more complex diseases.
Affiliation(s)
- Rajnish Kumar
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Uttar Pradesh, India
- Anju Sharma
  - Department of Applied Science, Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
- Athanasios Alexiou
  - Novel Global Community Educational Foundation, Hebersham, 2770 NSW, Australia
  - AFNP Med Austria, 1010 Wien, Austria
- Ghulam Md Ashraf
  - Pre-Clinical Research Unit (PCRU), King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  - Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
10
Wang J, Wang X, Sun H, Wang M, Zeng Y, Jiang D, Wu Z, Liu Z, Liao B, Yao X, Hsieh CY, Cao D, Chen X, Hou T. ChemistGA: A Chemical Synthesizable Accessible Molecular Generation Algorithm for Real-World Drug Discovery. J Med Chem 2022; 65:12482-12496. [PMID: 36065998 DOI: 10.1021/acs.jmedchem.2c01179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Many deep learning (DL)-based molecular generative models have been proposed to design novel molecules. These models may perform well on benchmarks, but they usually do not take real-world constraints into account, such as available training data set, synthetic accessibility, and scaffold diversity in drug discovery. In this study, a new algorithm, ChemistGA, was proposed by combining the traditional heuristic algorithm with DL, in which the crossover of the traditional genetic algorithm (GA) was redefined by DL in conjunction with GA, and an innovative backcrossing operation was implemented to generate desired molecules. Our results clearly show that ChemistGA not only retains the strength of the traditional GA but also greatly enhances the synthetic accessibility and success rate of the generated molecules with desired properties. Calculations on the two benchmarks illustrate that ChemistGA achieves impressive performance among the state-of-the-art baselines, and it opens a new avenue for the application of generative models to real-world drug discovery scenarios.
Affiliation(s)
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
  - School of Computer Science, Wuhan University, Wuhan 430072, Hubei, P. R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
- Xiaorui Wang
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa 999078, Macau(SAR), P. R. China
- Huiyong Sun
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, P. R. China
- Mingyang Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
- Yundian Zeng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Zeyi Liu
  - DAMTP, Centre for Mathematical Sciences, University of Cambridge, Cambridge CB3 0WA, U.K.
- Ben Liao
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, P. R. China
- Xiaojun Yao
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa 999078, Macau(SAR), P. R. China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, P. R. China
- Xi Chen
  - School of Computer Science, Wuhan University, Wuhan 430072, Hubei, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
11
Ruchawapol C, Fu WW, Xu HX. A review on computational approaches that support the researches on traditional Chinese medicines (TCM) against COVID-19. Phytomedicine: International Journal of Phytotherapy and Phytopharmacology 2022; 104:154324. [PMID: 35841663 PMCID: PMC9259013 DOI: 10.1016/j.phymed.2022.154324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/21/2022] [Revised: 06/23/2022] [Accepted: 07/05/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND: COVID-19 has caused highly contagious infections and massive numbers of deaths worldwide and has disrupted global economies and societies to an unprecedented degree, so the urgent development of new antiviral medications is required. Medicinal herbs are promising resources for the discovery of prophylactic candidates against COVID-19. Considerable experimental effort has been devoted to vaccines and direct-acting antiviral agents (DAAs), but neither was developed quickly or fully.
PURPOSE: This study examined the computational approaches that have played a significant role in drug discovery and development against COVID-19. These computational methods and tools will be helpful for the discovery of lead compounds from phytochemicals and for understanding the molecular mechanism of action of TCM in the prevention and control of other diseases.
METHODS: A search conducted in scientific databases (PubMed, Science Direct, ResearchGate, Google Scholar, and Web of Science) found a total of 2172 articles, which were retrieved via the web interfaces of these sources. After applying inclusion and exclusion criteria and full-text screening, 292 articles were retained as eligible.
RESULTS: In this review, we highlight three main categories of computational approaches: structure-based, knowledge-mining (artificial intelligence), and network-based approaches. The most commonly used database, molecular docking tool, and MD simulation software are TCMSP, AutoDock Vina, and GROMACS, respectively. Network-based approaches were mainly presented to help readers understand the complex mechanisms of multiple TCM ingredients, targets, diseases, and networks.
CONCLUSION: Computational approaches have been broadly applied to the research of phytochemicals and TCM against COVID-19, and have played a significant role in drug discovery and development in terms of financial and time savings.
Affiliation(s)
- Chattarin Ruchawapol
  - School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Cai Lun Lu 1200, Shanghai 201203, China
  - Engineering Research Centre of Shanghai Colleges for TCM New Drug Discovery, Cai Lun Lu 1200, Shanghai 201203, China
- Wen-Wei Fu
  - School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Cai Lun Lu 1200, Shanghai 201203, China
  - Engineering Research Centre of Shanghai Colleges for TCM New Drug Discovery, Cai Lun Lu 1200, Shanghai 201203, China
- Hong-Xi Xu
  - School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Cai Lun Lu 1200, Shanghai 201203, China
  - Engineering Research Centre of Shanghai Colleges for TCM New Drug Discovery, Cai Lun Lu 1200, Shanghai 201203, China
12
Spenke F, Hartke B. Graph-based Automated Macro-Molecule Assembly. J Chem Inf Model 2022; 62:3714-3723. [PMID: 35938711 DOI: 10.1021/acs.jcim.2c00609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
We present a general molecular framework assembly algorithm that takes a largely arbitrary molecular fragment database and a user-supplied target template graph as input. Automatic assembly of molecular fragments from the database, following a prescribed, user-supplied set of connection rules, then turns the template graph into an actual, chemically reasonable molecular framework. Assembly capabilities of our algorithm are tested by producing several abstract, closed-loop shapes. To indicate a few of many possible application areas we demonstrate a host-guest complex and a road toward catalysis. Postassembly substituent exchange can be used to produce electric fields of desired values at desired points inside the framework or at its surface as a stepping stone toward rationally designed, artificial heterogeneous catalysts.
Affiliation(s)
- Florian Spenke
  - Institute for Physical Chemistry, Christian-Albrechts-University, Olshausenstrasse 40, Kiel 24098, Germany
- Bernd Hartke
  - Institute for Physical Chemistry, Christian-Albrechts-University, Olshausenstrasse 40, Kiel 24098, Germany
13
Kong Y, Zhao X, Liu R, Yang Z, Yin H, Zhao B, Wang J, Qin B, Yan A. Integrating concept of pharmacophore with graph neural networks for chemical property prediction and interpretation. J Cheminform 2022; 14:52. [PMID: 35927691 PMCID: PMC9351086 DOI: 10.1186/s13321-022-00634-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Accepted: 07/16/2022] [Indexed: 11/10/2022] Open
Abstract
Recently, graph neural networks (GNNs) have revolutionized the field of chemical property prediction and achieved state-of-the-art results on benchmark data sets. Compared with the traditional descriptor- and fingerprint-based QSAR models, GNNs can learn task related representations, which completely gets rid of the rules defined by experts. However, due to the lack of useful prior knowledge, the prediction performance and interpretability of the GNNs may be affected. In this study, we introduced a new GNN model called RG-MPNN for chemical property prediction that integrated pharmacophore information hierarchically into message-passing neural network (MPNN) architecture, specifically, in the way of pharmacophore-based reduced-graph (RG) pooling. RG-MPNN absorbed not only the information of atoms and bonds from the atom-level message-passing phase, but also the information of pharmacophores from the RG-level message-passing phase. Our experimental results on eleven benchmark and ten kinase data sets showed that our model consistently matched or outperformed other existing GNN models. Furthermore, we demonstrated that applying pharmacophore-based RG pooling to MPNN architecture can generally help GNN models improve the predictive power. The cluster analysis of RG-MPNN representations and the importance analysis of pharmacophore nodes will help chemists gain insights for hit discovery and lead optimization.
Affiliation(s)
- Yue Kong
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
  - Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
- Xiaoman Zhao
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
- Ruizi Liu
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
- Zhenwu Yang
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
- Hongyan Yin
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
  - Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
- Bowen Zhao
  - Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
- Jinling Wang
  - Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
- Bingjie Qin
  - Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
- Aixia Yan
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
14
Mukaidaisi M, Vu A, Grantham K, Tchagang A, Li Y. Multi-Objective Drug Design Based on Graph-Fragment Molecular Representation and Deep Evolutionary Learning. Front Pharmacol 2022; 13:920747. [PMID: 35860028 PMCID: PMC9291509 DOI: 10.3389/fphar.2022.920747] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/15/2022] [Accepted: 05/26/2022] [Indexed: 11/19/2022] Open
Abstract
Drug discovery is a challenging process with a huge molecular space to be explored and numerous pharmacological properties to be appropriately considered. Among various drug design protocols, fragment-based drug design is an effective way of constraining the search space and better utilizing biologically active compounds. Motivated by fragment-based drug search for a given protein target and the emergence of artificial intelligence (AI) approaches in this field, this work advances the field of in silico drug design by (1) integrating a graph fragmentation-based deep generative model with a deep evolutionary learning process for large-scale multi-objective molecular optimization, and (2) applying protein-ligand binding affinity scores together with other desired physicochemical properties as objectives. Our experiments show that the proposed method can generate novel molecules with improved property values and binding affinities.
Affiliation(s)
- Muhetaer Mukaidaisi
  - Biomedical Data Science Laboratory, Department of Computer Science, Brock University, St. Catharines, ON, Canada
- Andrew Vu
  - Biomedical Data Science Laboratory, Department of Computer Science, Brock University, St. Catharines, ON, Canada
- Karl Grantham
  - Biomedical Data Science Laboratory, Department of Computer Science, Brock University, St. Catharines, ON, Canada
- Alain Tchagang
  - Scientific Data Mining Team, Digital Technologies Research Centre, National Research Council Canada, Ottawa, ON, Canada
- Yifeng Li
  - Biomedical Data Science Laboratory, Department of Computer Science, Brock University, St. Catharines, ON, Canada
  - *Correspondence: Yifeng Li
15
Wigh DS, Goodman JM, Lapkin AA. A review of molecular representation in the age of machine learning. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1603] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
Affiliation(s)
- Daniel S. Wigh
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Alexei A. Lapkin
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
16
Abstract
Matched Molecular Pair Analysis (MMP) is a very important tool during the lead optimization stage in drug discovery. The usefulness of this tool in the lead optimization stage has been discussed in several peer-reviewed articles. The application of MMP in molecule generation is relatively new. This brings several challenges, one of them being the need to encode contextual information into the transforms. In this chapter, we discuss how we use MMPs as a molecule generation method and how it compares with other molecular generators.
Affiliation(s)
- Sandeep Pal
  - GlaxoSmithKline Medicines Research Centre, Stevenage, UK
- Peter Pogány
  - GlaxoSmithKline Medicines Research Centre, Stevenage, UK
17
Abstract
Artificial intelligence (AI) tools find increasing application in drug discovery, supporting every stage of the Design-Make-Test-Analyse (DMTA) cycle. The main focus of this chapter is their application in molecular generation with the aid of deep neural networks (DNN). We present a historical overview of the main advances in the field. We analyze the concepts of distribution and goal-directed learning and then highlight some of the recent applications of generative models in drug design, with a focus on research work from the biopharmaceutical industry. We present in more detail REINVENT, an open-source software package developed within our group at AstraZeneca and the main platform for AI molecular design support for a number of medicinal chemistry projects in the company, and we also demonstrate some of our work in library design. Finally, we present some of the main challenges in the application of AI in drug discovery and different approaches to respond to these challenges, which define areas for current and future work.
18
Wang M, Sun H, Wang J, Pang J, Chai X, Xu L, Li H, Cao D, Hou T. Comprehensive assessment of deep generative architectures for de novo drug design. Brief Bioinform 2021; 23:6470970. [PMID: 34929743 DOI: 10.1093/bib/bbab544] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/27/2021] [Revised: 11/24/2021] [Accepted: 11/25/2021] [Indexed: 01/20/2023] Open
Abstract
Recently, deep learning (DL)-based de novo drug design has emerged as a new trend in pharmaceutical research, and numerous DL-based methods have been developed for the generation of novel compounds with desired properties. However, a comprehensive understanding of the advantages and disadvantages of these methods is still lacking. In this study, the performances of different generative models were evaluated by analyzing the properties of the generated molecules in different scenarios, such as goal-directed (rediscovery, optimization and scaffold hopping of active compounds) and target-specific (generation of novel compounds for a given target) tasks. Overall, the DL-based models have significant advantages over the baseline models built by the traditional methods in learning the physicochemical property distributions of the training sets and may be more suitable for target-specific tasks. However, neither the baselines nor the DL-based generative models can fully exploit the scaffolds of the training sets, and the molecules generated by the DL-based methods even have lower scaffold diversity than those generated by the traditional models. Moreover, our assessment illustrates that the DL-based methods do not exhibit obvious advantages over the genetic algorithm-based baselines in goal-directed tasks. We believe that our study provides valuable guidance for the effective use of generative models in de novo drug design.
Affiliation(s)
- Mingyang Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Huiyong Sun
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, P. R. China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Jinping Pang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Xin Chai
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, Jiangsu, China
- Honglin Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
19
Quevedo-Tumailli V, Ortega-Tenezaca B, González-Díaz H. IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds. Int J Mol Sci 2021; 22:13066. [PMID: 34884870 PMCID: PMC8657696 DOI: 10.3390/ijms222313066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/18/2021] [Revised: 11/23/2021] [Accepted: 11/24/2021] [Indexed: 11/16/2022] Open
Abstract
The parasite species of genus Plasmodium causes Malaria, which remains a major global health problem due to parasite resistance to available Antimalarial drugs and increasing treatment costs. Consequently, computational prediction of new Antimalarial compounds with novel targets in the proteome of Plasmodium sp. is a very important goal for the pharmaceutical industry. We can expect that the success of the pre-clinical assay depends on the conditions of assay per se, the chemical structure of the drug, the structure of the target protein to be targeted, as well as on factors governing the expression of this protein in the proteome such as genes (Deoxyribonucleic acid, DNA) sequence and/or chromosomes structure. However, there are no reports of computational models that consider all these factors simultaneously. Some of the difficulties for this kind of analysis are the dispersion of data in different datasets, the high heterogeneity of data, etc. In this work, we analyzed three databases, ChEMBL (Chemical database of the European Molecular Biology Laboratory), UniProt (Universal Protein Resource), and NCBI-GDV (National Center for Biotechnology Information-Genome Data Viewer), to achieve this goal. The ChEMBL dataset contains outcomes for 17,758 unique assays of potential Antimalarial compounds including numeric descriptors (variables) for the structure of compounds as well as a huge amount of information about the conditions of assays. The NCBI-GDV and UniProt datasets include the sequence of genes, proteins, and their functions. In addition, we also created two partitions (cassayj = caj and cdataj = cdj) of categorical variables from the ChEMBL dataset. These partitions contain variables that encode information about experimental conditions of preclinical assays (caj) or about the nature and quality of data (cdj).
These categorical variables include information about 22 parameters of biological activity (ca0), 28 target proteins (ca1), and 9 organisms of assay (ca2), etc. We also created another partition (cprotj = cpj) including categorical variables with biological information about the target proteins, genes, and chromosomes. These variables cover 32 genes (cp0), 10 chromosomes (cp1), gene orientation (cp2), and 31 protein functions (cp3). We used a Perturbation-Theory Machine Learning Information Fusion (IFPTML) algorithm to map all this information (from three databases) into and train a predictive model. Shannon's entropy measure Shk (numerical variables) was used to quantify the information about the structure of drugs, protein sequences, gene sequences, and chromosomes in the same information scale. Perturbation Theory Operators (PTOs) with the form of Moving Average (MA) operators have been used to quantify perturbations (deviations) in the structural variables with respect to their expected values for different subsets (partitions) of categorical variables. We obtained three IFPTML models using General Discriminant Analysis (GDA), Classification Tree with Univariate Splits (CTUS), and Classification Tree with Linear Combinations (CTLC). The IFPTML-CTLC presented the best performance with Sensitivity Sn(%) = 83.6/85.1, and Specificity Sp(%) = 89.8/89.7 for training/validation sets, respectively. This model could become a useful tool for the optimization of preclinical assays of new Antimalarial compounds vs. different proteins in the proteome of Plasmodium.
Affiliation(s)
- Viviana Quevedo-Tumailli
  - Grupo RNASA-IMEDIR, Department of Computer Science, University of A Coruña, 15071 A Coruña, Spain
  - Research Department, Puyo Campus, Universidad Estatal Amazónica, Puyo 160150, Ecuador
- Bernabe Ortega-Tenezaca
  - Grupo RNASA-IMEDIR, Department of Computer Science, University of A Coruña, 15071 A Coruña, Spain
  - Information and Communications Technology Management Department, Puyo Campus, Universidad Estatal Amazónica, Puyo 160150, Ecuador
- Humberto González-Díaz
  - Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
  - BIOFISIKA, Basque Centre for Biophysics, CSIC-UPV/EHU, 48940 Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
20
Chen G, Song Z, Qi Z. Transformer-convolutional neural network for surface charge density profile prediction: Enabling high-throughput solvent screening with COSMO-SAC. Chem Eng Sci 2021. [DOI: 10.1016/j.ces.2021.117002] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022]
21
Wang M, Wang Z, Sun H, Wang J, Shen C, Weng G, Chai X, Li H, Cao D, Hou T. Deep learning approaches for de novo drug design: An overview. Curr Opin Struct Biol 2021; 72:135-144. [PMID: 34823138 DOI: 10.1016/j.sbi.2021.10.001] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 06/05/2021] [Revised: 08/28/2021] [Accepted: 10/10/2021] [Indexed: 01/01/2023]
Abstract
De novo drug design is the process of generating novel lead compounds with desirable pharmacological and physiochemical properties. The application of deep learning (DL) in de novo drug design has become a hot topic, and many DL-based approaches have been developed for molecular generation tasks. Generally, these approaches were developed as per four frameworks: recurrent neural networks; encoder-decoder; reinforcement learning; and generative adversarial networks. In this review, we first introduced the molecular representation and assessment metrics used in DL-based de novo drug design. Then, we summarized the features of each architecture. Finally, the potential challenges and future directions of DL-based molecular generation were prospected.
Affiliation(s)
- Mingyang Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
- Zhe Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
- Huiyong Sun
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, PR China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
- Gaoqi Weng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
- Xin Chai
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
- Honglin Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, PR China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, PR China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
22
Deep Learning Applied to Ligand-Based De Novo Drug Design. Methods in Molecular Biology (Clifton, N.J.) 2021; 2390:273-299. [PMID: 34731474 DOI: 10.1007/978-1-0716-1787-8_12] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 10/19/2022]
Abstract
In recent years, the application of deep generative models to suggest virtual compounds has become a new and powerful tool in drug discovery projects. The idea behind this review is to offer an updated view on de novo design approaches based on artificial intelligence (AI) algorithms, with a particular focus on ligand-based methods. We start this review with a brief overview of the most relevant de novo design approaches developed before the use of AI techniques. We then describe the neural network architectures most commonly employed today in ligand-based de novo design, together with an up-to-date list of more than 100 deep generative models found in the literature (2017-2020). To show how deep generative approaches are applied in a drug discovery context, we report all currently available studies in which generated compounds have been synthesized and their biological activity tested. Finally, we discuss what we envisage as beneficial future directions for further application of deep generative models in de novo drug design.
23
Li Y, Pei J, Lai L. Structure-based de novo drug design using 3D deep generative models. Chem Sci 2021; 12:13664-13675. [PMID: 34760151 PMCID: PMC8549794 DOI: 10.1039/d1sc04444c] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 08/13/2021] [Accepted: 09/09/2021] [Indexed: 12/14/2022] Open
Abstract
Deep generative models are attracting much attention in the field of de novo molecule design. Compared to traditional methods, deep generative models can be trained in a fully data-driven way with little requirement for expert knowledge. Although many models have been developed to generate 1D and 2D molecular structures, 3D molecule generation is less explored, and the direct design of drug-like molecules inside target binding sites remains challenging. In this work, we introduce DeepLigBuilder, a novel deep learning-based method for de novo drug design that generates 3D molecular structures in the binding sites of target proteins. We first developed Ligand Neural Network (L-Net), a novel graph generative model for the end-to-end design of chemically and conformationally valid 3D molecules with high drug-likeness. Then, we combined L-Net with Monte Carlo tree search to perform structure-based de novo drug design tasks. In the case study of inhibitor design for the main protease of SARS-CoV-2, DeepLigBuilder suggested a list of drug-like compounds with novel chemical structures, high predicted affinity, and similar binding features to those of known inhibitors. The current version of L-Net was trained on drug-like compounds from ChEMBL, which could be easily extended to other molecular datasets with desired properties based on users' demands and applied in functional molecule generation. Merging deep generative models with atomic-level interaction evaluation, DeepLigBuilder provides a state-of-the-art model for structure-based de novo drug design and lead optimization. DeepLigBuilder, a novel deep generative model for structure-based de novo drug design, directly generates 3D structures of drug-like compounds in the target binding site.
Affiliation(s)
- Yibo Li
  - Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Jianfeng Pei
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Luhua Lai
  - Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
24
Selvaraj C, Chandra I, Singh SK. Artificial intelligence and machine learning approaches for drug design: challenges and opportunities for the pharmaceutical industries. Mol Divers 2021; 26:1893-1913. [PMID: 34686947 PMCID: PMC8536481 DOI: 10.1007/s11030-021-10326-z] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 04/05/2021] [Accepted: 09/24/2021] [Indexed: 12/27/2022]
Abstract
The global spread of COVID-19 has raised the importance of pharmaceutical drug development as intractable and hot research. Developing new drug molecules to overcome any disease is a costly and lengthy process, but the process continues uninterrupted. The critical point to consider the drug design is to use the available data resources and to find new and novel leads. Once the drug target is identified, several interdisciplinary areas work together with artificial intelligence (AI) and machine learning (ML) methods to get enriched drugs. These AI and ML methods are applied in every step of the computer-aided drug design, and integrating these AI and ML methods results in a high success rate of hit compounds. In addition, this AI and ML integration with high-dimension data and its powerful capacity have taken a step forward. Clinical trials output prediction through the AI/ML integrated models could further decrease the clinical trials cost by also improving the success rate. Through this review, we discuss the backend of AI and ML methods in supporting the computer-aided drug design, along with its challenge and opportunity for the pharmaceutical industry. From the available information or data, the AI and ML based prediction for the high throughput virtual screening. After this integration of AI and ML, the success rate of hit identification has gained a momentum with huge success by providing novel drugs.
Affiliation(s)
- Chandrabose Selvaraj
  - CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India
- Ishwar Chandra
  - CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India
- Sanjeev Kumar Singh
  - CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India
25
Multi-constraint molecular generation based on conditional transformer, knowledge distillation and reinforcement learning. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00403-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
26
Muratov EN, Amaro R, Andrade CH, Brown N, Ekins S, Fourches D, Isayev O, Kozakov D, Medina-Franco JL, Merz KM, Oprea TI, Poroikov V, Schneider G, Todd MH, Varnek A, Winkler DA, Zakharov AV, Cherkasov A, Tropsha A. A critical overview of computational approaches employed for COVID-19 drug discovery. Chem Soc Rev 2021; 50:9121-9151. [PMID: 34212944 PMCID: PMC8371861 DOI: 10.1039/d0cs01065k] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 01/30/2021] [Indexed: 01/18/2023]
Abstract
COVID-19 has resulted in huge numbers of infections and deaths worldwide and brought the most severe disruptions to societies and economies since the Great Depression. Massive experimental and computational research effort to understand and characterize the disease and rapidly develop diagnostics, vaccines, and drugs has emerged in response to this devastating pandemic, and more than 130 000 COVID-19-related research papers have been published in peer-reviewed journals or deposited in preprint servers. Much of the research effort has focused on the discovery of novel drug candidates or repurposing of existing drugs against COVID-19, and many such projects have been either exclusively computational or computer-aided experimental studies. Herein, we provide an expert overview of the key computational methods and their applications for the discovery of COVID-19 small-molecule therapeutics that have been reported in the research literature. We further outline that, after the first year of the COVID-19 pandemic, it appears that drug repurposing has not produced rapid and global solutions. However, several known drugs have been used in the clinic to cure COVID-19 patients, and a few repurposed drugs continue to be considered in clinical trials, along with several novel clinical candidates. We posit that truly impactful computational tools must deliver actionable, experimentally testable hypotheses enabling the discovery of novel drugs and drug combinations, and that open science and rapid sharing of research results are critical to accelerate the development of novel, much needed therapeutics for COVID-19.
Affiliation(s)
- Eugene N. Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Rommie Amaro
- University of California in San Diego, San Diego, CA, USA
- Sean Ekins
- Collaborations Pharmaceuticals, Raleigh, NC, USA
- Denis Fourches
- Department of Chemistry, North Carolina State University, Raleigh, NC, USA
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
- Kenneth M. Merz
- Department of Chemistry, Michigan State University, East Lansing, MI, USA
- Tudor I. Oprea
- Department of Internal Medicine and UNM Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM, USA
- Department of Rheumatology and Inflammation Research, Gothenburg University, Sweden
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark
- Gisbert Schneider
- Institute of Pharmaceutical Sciences, Swiss Federal Institute of Technology, Zurich, Switzerland
- Alexandre Varnek
- Department of Chemistry, University of Strasbourg, Strasbourg, France
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Japan
- David A. Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, VIC, Australia
- School of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Australia
- School of Pharmacy, University of Nottingham, Nottingham, UK
- Artem Cherkasov
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
- Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
|
27
|
Meyers J, Fabian B, Brown N. De novo molecular design and generative models. Drug Discov Today 2021; 26:2707-2715. [PMID: 34082136 DOI: 10.1016/j.drudis.2021.05.019] [Citation(s) in RCA: 79] [Impact Index Per Article: 26.3] [Received: 01/15/2021] [Revised: 04/21/2021] [Accepted: 05/26/2021] [Indexed: 02/09/2023]
Abstract
Molecular design strategies are integral to therapeutic progress in drug discovery. Computational approaches for de novo molecular design have been developed over the past three decades and, recently, thanks in part to advances in machine learning (ML) and artificial intelligence (AI), the drug discovery field has gained practical experience. Here, we review these learnings and present de novo approaches according to the coarseness of their molecular representation: that is, whether molecular design is modeled on an atom-based, fragment-based, or reaction-based paradigm. Furthermore, we emphasize the value of strong benchmarks, describe the main challenges to using these methods in practice, and provide a viewpoint on further opportunities for exploration and challenges to be tackled in the upcoming years.
Affiliation(s)
- Nathan Brown
- BenevolentAI, 4-8 Maple Street, London W1T 5HD, UK
|
28
|
Schultz KJ, Colby SM, Yesiltepe Y, Nuñez JR, McGrady MY, Renslow RS. Application and assessment of deep learning for the generation of potential NMDA receptor antagonists. Phys Chem Chem Phys 2021; 23:1197-1214. [PMID: 33355332 DOI: 10.1039/d0cp03620j] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Uncompetitive antagonists of the N-methyl d-aspartate receptor (NMDAR) have demonstrated therapeutic benefit in the treatment of neurological diseases such as Parkinson's and Alzheimer's, but some also cause dissociative effects that have led to the synthesis of illicit drugs. The ability to generate NMDAR antagonists in silico is therefore desirable for both new medication development and preempting and identifying new designer drugs. Recently, generative deep learning models have been applied to de novo drug design as a means to expand the amount of chemical space that can be explored for potential drug-like compounds. In this study, we assess the application of a generative model to the NMDAR to achieve two primary objectives: (i) the creation and release of a comprehensive library of experimentally validated NMDAR phencyclidine (PCP) site antagonists to assist the drug discovery community and (ii) an analysis of both the advantages conferred by applying such generative artificial intelligence models to drug design and the current limitations of the approach. We apply, and provide source code for, a variety of ligand- and structure-based assessment techniques used in standard drug discovery analyses to the deep learning-generated compounds. We present twelve candidate antagonists that are not available in existing chemical databases to provide an example of what this type of workflow can achieve, though synthesis and experimental validation of these compounds are still required.
Affiliation(s)
- Sean M Colby
- Pacific Northwest National Laboratory, Richland, WA, USA
- Jamie R Nuñez
- Pacific Northwest National Laboratory, Richland, WA, USA
- Ryan S Renslow
- Pacific Northwest National Laboratory, Richland, WA, USA
|
29
|
Kim H, Kim E, Lee I, Bae B, Park M, Nam H. Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches. Biotechnol Bioproc E 2021; 25:895-930. [PMID: 33437151 PMCID: PMC7790479 DOI: 10.1007/s12257-020-0049-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/13/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023]
Abstract
As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Since artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered a viable solution for unstable drug research and development. Generally, AI is applied to fields with sufficient data such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of the recent research trends in the AI-guided drug discovery process, including target identification, hit identification, ADMET prediction, lead optimization, and drug repositioning. The main data sources in each field are also summarized in this review. In addition, the review provides an in-depth analysis of the remaining challenges and limitations, along with proposals for promising future directions in each of the aforementioned areas.
Affiliation(s)
- Hyunho Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Eunyoung Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Ingoo Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Bongsung Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Minsu Park
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
|
30
|
Gao P, Zhang J, Sun Y, Yu J. Accurate predictions of aqueous solubility of drug molecules via the multilevel graph convolutional network (MGCN) and SchNet architectures. Phys Chem Chem Phys 2020; 22:23766-23772. [PMID: 33063077 DOI: 10.1039/d0cp03596c] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
Deep learning based methods have been widely applied to predict various kinds of molecular properties in the pharmaceutical industry with increasingly more success. In this study, we propose two novel models for aqueous solubility predictions, based on the Multilevel Graph Convolutional Network (MGCN) and SchNet architectures, respectively. The advantage of the MGCN lies in the fact that it could extract the graph features of the target molecules directly from the (3D) structural information; therefore, it doesn't need to rely on a lot of intra-molecular descriptors to learn the features, which are of significance for accurate predictions of the molecular properties. The SchNet performs well in modelling the interatomic interactions inside a molecule, and such a deep learning architecture is also capable of extracting structural information and further predicting the related properties. The actual accuracy of these two novel approaches was systematically benchmarked with four different independent datasets. We found that both the MGCN and SchNet models performed well for aqueous solubility predictions. In the future, we believe such promising predictive models will be applicable to enhancing the efficiency of the screening, crystallization and delivery of drug molecules, essentially as a useful tool to promote the development of molecular pharmaceutics.
Affiliation(s)
- Peng Gao
- School of Chemistry and Molecular Bioscience, University of Wollongong, NSW 2500, Australia
|
31
|
Bush JT, Pogany P, Pickett SD, Barker M, Baxter A, Campos S, Cooper AWJ, Hirst D, Inglis G, Nadin A, Patel VK, Poole D, Pritchard J, Washio Y, White G, Green DVS. A Turing Test for Molecular Generators. J Med Chem 2020; 63:11964-11971. [PMID: 32955254 DOI: 10.1021/acs.jmedchem.0c01148] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/11/2022]
Abstract
Machine learning approaches promise to accelerate and improve success rates in medicinal chemistry programs by more effectively leveraging available data to guide molecular design. A key step of an automated computational design algorithm is molecule generation, where the machine is required to design high-quality, drug-like molecules within the appropriate chemical space. Many algorithms have been proposed for molecular generation; however, a challenge is how to assess the validity of the resulting molecules. Here, we report three Turing-inspired tests designed to evaluate the performance of molecular generators. Profound differences were observed between the performance of molecule generators in these tests, highlighting the importance of selecting the appropriate design algorithms for specific circumstances. One molecule generator, based on matched molecular pairs, performed excellently against all tests and thus provides a valuable component for machine-driven medicinal chemistry design workflows.
Affiliation(s)
- Jacob T Bush
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Peter Pogany
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Stephen D Pickett
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Mike Barker
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Andrew Baxter
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Sebastien Campos
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Anthony W J Cooper
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- David Hirst
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Graham Inglis
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Alan Nadin
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Vipulkumar K Patel
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Darren Poole
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- John Pritchard
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Yoshiaki Washio
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Gemma White
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Darren V S Green
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
|
32
|
Domenico A, Nicola G, Daniela T, Fulvio C, Nicola A, Orazio N. De Novo Drug Design of Targeted Chemical Libraries Based on Artificial Intelligence and Pair-Based Multiobjective Optimization. J Chem Inf Model 2020; 60:4582-4593. [PMID: 32845150 DOI: 10.1021/acs.jcim.0c00517] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Indexed: 12/21/2022]
Abstract
Artificial intelligence and multiobjective optimization represent promising solutions to bridge chemical and biological landscapes by addressing the automated de novo design of compounds as a result of a humanlike creative process. In the present study, we conceived a novel pair-based multiobjective approach implemented in an adapted SMILES generative algorithm based on recurrent neural networks for the automated de novo design of new molecules whose overall features are optimized by finding the best trade-offs among relevant physicochemical properties (MW, logP, HBA, HBD) and additional similarity-based constraints biasing specific biological targets. In this respect, we carried out the de novo design of chemical libraries targeting neuraminidase, acetylcholinesterase, and the main protease of severe acute respiratory syndrome coronavirus 2. Several quality metrics were employed to assess drug-likeness, chemical feasibility, diversity content, and validity. Molecular docking was finally carried out to better evaluate the scoring and posing of the de novo generated molecules with respect to X-ray cognate ligands of the corresponding molecular counterparts. Our results indicate that artificial intelligence and multiobjective optimization allow us to capture the latent links joining chemical and biological aspects, thus providing easy-to-use options for customizable design strategies, which are especially effective for both lead generation and lead optimization. The algorithm is freely downloadable at https://github.com/alberdom88/moo-denovo and all of the data are available as Supporting Information.
Affiliation(s)
- Alberga Domenico
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Gambacorta Nicola
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Trisciuzzi Daniela
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy; Molecular Horizon srl, Via Montelino 32, 06084 Bettona, Italy
- Ciriaco Fulvio
- Dipartimento di Chimica, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Amoroso Nicola
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Nicolotti Orazio
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
|
33
|
Amabilino S, Pogány P, Pickett SD, Green DVS. Guidelines for Recurrent Neural Network Transfer Learning-Based Molecular Generation of Focused Libraries. J Chem Inf Model 2020; 60:5699-5713. [DOI: 10.1021/acs.jcim.0c00343] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/20/2022]
Affiliation(s)
- Silvia Amabilino
- School of Chemistry, University of Bristol, Cantock’s Close, Bristol BS8 1TS, United Kingdom
- Peter Pogány
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom
- Stephen D. Pickett
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom
- Darren V. S. Green
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom
|
34
|
Green DVS, Pickett S, Luscombe C, Senger S, Marcus D, Meslamani J, Brett D, Powell A, Masson J. BRADSHAW: a system for automated molecular design. J Comput Aided Mol Des 2020; 34:747-765. [PMID: 31637565 PMCID: PMC7292824 DOI: 10.1007/s10822-019-00234-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/10/2019] [Accepted: 10/05/2019] [Indexed: 12/18/2022]
Abstract
This paper introduces BRADSHAW (Biological Response Analysis and Design System using an Heterogenous, Automated Workflow), a system for automated molecular design which integrates methods for chemical structure generation, experimental design, active learning and cheminformatics tools. The simple user interface is designed to facilitate access to large scale automated design whilst minimising software development required to introduce new algorithms, a critical requirement in what is a very fast moving field. The system embodies a philosophy of automation, best practice, experimental design and the use of both traditional cheminformatics and modern machine learning algorithms.
Affiliation(s)
- Darren V S Green
- Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- Stephen Pickett
- Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- Chris Luscombe
- Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- Stefan Senger
- Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- David Marcus
- Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- Jamel Meslamani
- Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, 1250 South Collegeville Road, Collegeville, PA, 19426, USA
- David Brett
- Tessella Ltd, Walkern Road, Stevenage, Hertfordshire, SG1 3QP, UK
- Adam Powell
- Tessella Ltd, Walkern Road, Stevenage, Hertfordshire, SG1 3QP, UK
- Jonathan Masson
- Tessella Ltd, Walkern Road, Stevenage, Hertfordshire, SG1 3QP, UK
|
35
|
Morris P, St. Clair R, Hahn WE, Barenholtz E. Predicting Binding from Screening Assays with Transformer Network Embeddings. J Chem Inf Model 2020; 60:4191-4199. [DOI: 10.1021/acs.jcim.9b01212] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/01/2023]
Affiliation(s)
- Paul Morris
- Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, Florida 33431, United States
- Rachel St. Clair
- Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, Florida 33431, United States
- William Edward Hahn
- Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, Florida 33431, United States
- Elan Barenholtz
- Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, Florida 33431, United States
|
36
|
Affiliation(s)
- Günter Klambauer
- Johannes Kepler University, LIT AI Lab & Institute for Machine Learning, 4040 Linz, Austria
- Sepp Hochreiter
- Johannes Kepler University, LIT AI Lab & Institute for Machine Learning, 4040 Linz, Austria
- Matthias Rarey
- Universität Hamburg, ZBH-Center for Bioinformatics, 20146 Hamburg, Germany
|
37
|
Nakajima R, Midorikawa N. Topic extraction to provide an overview of research activities: The case of the high-temperature superconductor and simulation and modelling. J Inf Sci 2020. [DOI: 10.1177/0165551520920794] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/20/2023]
Abstract
For those who are not experts in a particular scientific field, it is difficult to understand scientific research trends. Although studies on the extraction of research trends have been conducted, most focus on extracting global trends from large-scale data, and the methods are often complicated. The purpose of this study is to develop a method of obtaining overviews of a scientific field for non-experts by capturing research trends simply and then to verify the method. To extract research topics which should express research trends, text analysis was performed using abstracts over 12 years of articles on high-temperature superconductors. We characterised three topics for the extracted word groups that frequently occurred. For these topics, we studied their appropriateness using a method that has been little used: examining research articles, review literature and co-citations among research articles used to extract the words, comparisons with controlled index terms assigned to the articles and confirming that there were no contradictions. Based on the established method, we have also applied this method to another research field: ‘simulation and modelling’. Although the method used in this article is simple, important topics were extracted, and the relations with the original articles are clear, which can lead to further investigation of the extracted topics.
Affiliation(s)
- Ritsuko Nakajima
- Graduate School of Library, Information and Media Studies, University of Tsukuba, Japan
- Nobuyuki Midorikawa
- Faculty of Library, Information and Media Science, University of Tsukuba, Japan
|
38
|
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 351] [Impact Index Per Article: 70.2] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state of the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Yifei Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Ryan Byrne
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Shengyong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
|