1
Wang Y, Guo M, Chen X, Ai D. Screening of multi deep learning-based de novo molecular generation models and their application for specific target molecular generation. Sci Rep 2025; 15:4419. [PMID: 39910075 PMCID: PMC11799282 DOI: 10.1038/s41598-025-86840-z] [Received: 09/10/2024] [Accepted: 01/14/2025] [Indexed: 02/07/2025] [Open Access]
Abstract
Traditional virtual screening methods must explore vast chemical spaces and depend on existing chemical libraries. With the development of deep learning techniques for the de novo generation of molecules, also known as inverse molecular design, the increasingly widespread application of deep learning algorithms has led to revolutionary changes in de novo molecular generation research. In particular, the emergence of a natural language processing (NLP) architecture called the transformer has improved the state-of-the-art performance of existing AI technologies. In this study, we modified one top-performing molecular generation model based on the generative pretraining transformer (GPT) architecture in three directions. Moreover, we propose an integrated end-to-end neural network learning framework based on a complete encoder-decoder transformer model, the Text-to-Text Transfer Transformer (T5), which learns the embedding vector representation space of conditional molecular properties to encode and guide the vector representation of SMILES sequences; the final decoder block produces a softmax output trained with a maximum likelihood objective. Finally, we evaluated the performance of these NLP-based generation models and another new model architecture based on a selective state space, and selected the best approach, combined with a transfer learning strategy, for de novo drug discovery targeting L858R/T790M/C797S-mutant EGFR in non-small cell lung cancer.
Affiliation(s)
- Yishu Wang
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, 100083, China
- Mengyao Guo
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, 100083, China
- Xiaomin Chen
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, 100083, China
- Dongmei Ai
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, 100083, China

2
Flores-Hernandez H, Martinez-Ledesma E. A systematic review of deep learning chemical language models in recent era. J Cheminform 2024; 16:129. [PMID: 39558376 PMCID: PMC11571686 DOI: 10.1186/s13321-024-00916-y] [Received: 07/26/2024] [Accepted: 10/17/2024] [Indexed: 11/20/2024] [Open Access]
Abstract
Discovering new chemical compounds with specific properties can provide advantages for fields that rely on materials for their development, although this task comes at a high cost in terms of complexity and resources. Since the beginning of the data age, deep learning techniques have revolutionized the process of designing molecules by analyzing and learning from representations of molecular data, greatly reducing the resources and time involved. Various deep learning approaches have been developed to date, using a variety of architectures and strategies, in order to explore the extensive and discontinuous chemical space, providing benefits for generating compounds with specific properties. In this study, we present a systematic review that offers a statistical description and comparison of the strategies used to generate molecules through deep learning techniques, using the metrics proposed in Molecular Sets (MOSES) or GuacaMol. The study included 48 articles retrieved from a query-based search of Scopus and Web of Science and 25 articles retrieved from citation search, yielding a total of 72 retrieved articles, of which 62 correspond to chemical language model approaches to molecule generation and the other 10 correspond to molecular graph representations. Transformers, recurrent neural networks (RNNs), generative adversarial networks (GANs), Structured State Space Sequence (S4) models, and variational autoencoders (VAEs) are considered the main deep learning architectures used for molecule generation in the set of retrieved articles. In addition, transfer learning, reinforcement learning, and conditional learning are the most employed techniques for biased model generation and exploration of specific chemical space regions. Finally, this analysis focuses on the central themes of molecular representation, databases, training dataset size, validity-novelty trade-off, and performance of unbiased and biased chemical language models. These themes were selected to conduct a statistical analysis using graphical representation and statistical tests. The resulting analysis reveals the main challenges, advantages, and opportunities in the field of chemical language models over the past four years.
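The validity, uniqueness, and novelty figures that benchmarks such as MOSES report can be illustrated with a minimal Python sketch. It assumes molecules are already canonical SMILES strings, and the `balanced_parens` check is a hypothetical toy stand-in for a real chemistry parser; the function names are illustrative and are not taken from the MOSES or GuacaMol packages.

```python
from typing import Callable, Dict, Iterable


def balanced_parens(smiles: str) -> bool:
    """Toy validity check (assumption): only verifies balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def generation_metrics(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Fraction valid, fraction unique among valid, fraction novel among unique."""
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    sample = ["CCO", "CCO", "C(C", "CCN"]
    print(generation_metrics(sample, training=["CCO"], is_valid=balanced_parens))
```

In practice the validity predicate would be replaced by a proper SMILES parser, and the strings would be canonicalized before the set operations so that equivalent molecules compare equal.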
Affiliation(s)
- Hector Flores-Hernandez
  - Tecnológico de Monterrey, School of Engineering and Sciences, Monterrey, 64710, Nuevo León, México
- Emmanuel Martinez-Ledesma
  - Tecnológico de Monterrey, School of Medicine and Health Sciences, Monterrey, 64710, Nuevo León, México
  - Institute for Obesity Research, Tecnológico de Monterrey, Monterrey, 64710, Nuevo León, México

3
Pawar SB, Deshmukh NK, Jadhav SB. Hybrid deep learning technique for COX-2 inhibition bioactivity detection against breast cancer disease. Biomed Eng Lett 2024; 14:631-647. [PMID: 39512384 PMCID: PMC11538098 DOI: 10.1007/s13534-024-00355-6] [Received: 06/16/2023] [Revised: 01/03/2024] [Accepted: 01/24/2024] [Indexed: 11/15/2024] [Open Access]
Abstract
This study addresses detecting COX-2 inhibition in breast cancer, targeting its role in tumor growth. The primary goal is to develop an efficient technique for precise COX-2 inhibition bioactivity detection, with implications for identifying anti-cancer compounds and advancing breast cancer therapies. The proposed methodology uses the UNet architecture for feature extraction, enhancing accuracy. A modified chicken swarm optimization (MCSO) algorithm addresses data dimensionality, optimizing features. An improved Laguerre neural network (ILNN) classifies COX-2 inhibition bioactivity. Validation is performed using the ChEMBL database. The research evaluates the accuracy, precision, recall, F-measure, Matthews' correlation coefficient (MCC), and Dice coefficient of the proposed method. These metrics are compared against those of contemporary methods to assess the efficiency and effectiveness of the developed technique. The study underscores the hybrid deep learning method's significance in accurately detecting COX-2 inhibition bioactivity against breast cancer. Results highlight its potential as a valuable tool in breast cancer drug discovery.
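The evaluation metrics listed in this abstract all follow from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are illustrative assumptions and are not values from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for an active/inactive classifier.
    for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
        print(f"{name}: {value:.3f}")
```

For a binary task the Dice coefficient and the F-measure (F1) coincide, so reporting both mainly serves as a consistency check.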
Affiliation(s)
- Sahebrao B. Pawar
  - School of Computational Sciences, Swami Ramanand Teerth, Marathvada University, Nanded, India
- N. K. Deshmukh
  - School of Computational Sciences, Swami Ramanand Teerth, Marathvada University, Nanded, India
- Sharad B. Jadhav
  - School of Computational Sciences, Swami Ramanand Teerth, Marathvada University, Nanded, India

4
Hazemann J, Kimmerlin T, Lange R, Mac Sweeney A, Bourquin G, Ritz D, Czodrowski P. Identification of SARS-CoV-2 Mpro inhibitors through deep reinforcement learning for de novo drug design and computational chemistry approaches. RSC Med Chem 2024; 15:2146-2159. [PMID: 38911172 PMCID: PMC11187573 DOI: 10.1039/d4md00106k] [Received: 02/12/2024] [Accepted: 04/20/2024] [Indexed: 06/25/2024] [Open Access]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a global pandemic of coronavirus disease (COVID-19) since its emergence in December 2019. As of January 2024, there have been over 774 million reported cases and 7 million deaths worldwide. While vaccination efforts have been successful in reducing the severity of the disease and decreasing the transmission rate, the development of effective therapeutics against SARS-CoV-2 remains a critical need. The main protease (Mpro) of SARS-CoV-2 is an essential enzyme required for viral replication and has been identified as a promising target for drug development. In this study, we report the identification of novel Mpro inhibitors using a combination of deep reinforcement learning for de novo drug design, with 3D pharmacophore/shape-based alignment and privileged fragment match count scoring components, followed by hit expansion and molecular docking. Our experimentally validated results show that 3 novel series exhibit potent inhibitory activity against SARS-CoV-2 Mpro, with IC50 values ranging from 1.3 μM to 2.3 μM and a high degree of selectivity. These findings represent promising starting points for the development of new antiviral therapies against COVID-19.
Affiliation(s)
- Julien Hazemann
  - Physical Chemistry, Chemistry Department, Johannes Gutenberg University, Duesbergweg 10-14, 55128 Mainz, Germany
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd., Hegenheimermattweg 91, 4123 Allschwil, Switzerland
- Thierry Kimmerlin
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd., Hegenheimermattweg 91, 4123 Allschwil, Switzerland
- Roland Lange
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd., Hegenheimermattweg 91, 4123 Allschwil, Switzerland
- Aengus Mac Sweeney
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd., Hegenheimermattweg 91, 4123 Allschwil, Switzerland
- Geoffroy Bourquin
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd., Hegenheimermattweg 91, 4123 Allschwil, Switzerland
- Daniel Ritz
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd., Hegenheimermattweg 91, 4123 Allschwil, Switzerland
- Paul Czodrowski
  - Physical Chemistry, Chemistry Department, Johannes Gutenberg University, Duesbergweg 10-14, 55128 Mainz, Germany

5
Matúška J, Bucinsky L, Gall M, Pitoňák M, Štekláč M. SchNetPack Hyperparameter Optimization for a More Reliable Top Docking Scores Prediction. J Phys Chem B 2024; 128:4943-4951. [PMID: 38733335 DOI: 10.1021/acs.jpcb.4c00296] [Indexed: 05/13/2024]
Abstract
Options to improve the extrapolation power of a neural network built with the SchNetPack package for predicting top docking scores are presented. It is shown that hyperparameter tuning of the atomistic model representation (in schnetpack.representation) improves the prediction of the top-scoring compounds, which characteristically have a low incidence in randomized data sets used for training machine learning models. The prediction robustness is evaluated according to the mean square error (MSE) and the entropy of the average loss landscape decrease. Admittedly, the improvement in the prediction accuracy for the top-scoring compounds comes with the penalty of worsening the overall prediction power. It is revealed that the most impactful hyperparameter is the cutoff (5 Å is reported as the optimal choice). Other parameters (e.g., number of radial basis functions, number of interaction layers of the neural network, feature vector size, or batch size) are found not to affect the prediction robustness of the top-scoring compounds in any way comparable to the cutoff. The MSE of the best docking score prediction (below -13 kcal/mol) improves from ca. 3.5 to 0.9 kcal/mol, while the prediction of less potent compounds (-13 to -11 kcal/mol) shows a lesser improvement, i.e., a decrease of MSE from 1.6 to 1.3 kcal/mol. Additionally, oversampling and undersampling of the training set with respect to the abundance of top-scoring compounds are presented. The results indicate that the cutoff choice performs better than over- or undersampling of the training set, with undersampling performing better than oversampling.
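Reporting the error separately for each docking-score range, as this abstract does, can be sketched in a few lines of Python. The range boundaries mirror the ones quoted in the abstract; the function name, the dictionary keys, and the example scores are hypothetical, and binning by the true score (rather than the predicted one) is an assumption.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def mse_by_range(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    ranges: Dict[str, Tuple[float, float]],
) -> Dict[str, Optional[float]]:
    """MSE over samples whose true score lies in [low, high); None if empty."""
    result: Dict[str, Optional[float]] = {}
    for name, (low, high) in ranges.items():
        errors = [
            (t - p) ** 2 for t, p in zip(y_true, y_pred) if low <= t < high
        ]
        result[name] = sum(errors) / len(errors) if errors else None
    return result


# Score ranges in kcal/mol, following the abstract's two groups.
RANGES = {"top": (-math.inf, -13.0), "mid": (-13.0, -11.0)}

if __name__ == "__main__":
    true_scores = [-14.0, -13.5, -12.0, -11.5, -9.0]  # made-up values
    predicted = [-13.0, -13.5, -12.5, -11.0, -9.5]    # made-up values
    print(mse_by_range(true_scores, predicted, RANGES))
```

Splitting the error this way makes the trade-off described in the abstract visible: a model can improve on the rare top-scoring compounds while its error over the whole data set gets worse.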
Affiliation(s)
- Ján Matúška
  - Institute of Physical Chemistry and Chemical Physics, Faculty of Chemical and Food Technology, Slovak University of Technology in Bratislava, Radlinského 9, SK-81237 Bratislava, Slovak Republic
- Lukas Bucinsky
  - Institute of Physical Chemistry and Chemical Physics, Faculty of Chemical and Food Technology, Slovak University of Technology in Bratislava, Radlinského 9, SK-81237 Bratislava, Slovak Republic
- Marián Gall
  - Institute of Information Engineering, Automation and Mathematics, Faculty of Chemical and Food Technology, Slovak University of Technology in Bratislava, Radlinského 9, SK-81237 Bratislava, Slovak Republic
  - National SuperComputing Center, Dúbravská cesta č. 9, SK-84104 Bratislava, Slovak Republic
- Michal Pitoňák
  - National SuperComputing Center, Dúbravská cesta č. 9, SK-84104 Bratislava, Slovak Republic
  - Department of Physical and Theoretical Chemistry, Faculty of Natural Sciences, Comenius University in Bratislava, Mlynská dolina Ilkovičova 6, SK-84215 Bratislava, Slovak Republic
- Marek Štekláč
  - Institute of Physical Chemistry and Chemical Physics, Faculty of Chemical and Food Technology, Slovak University of Technology in Bratislava, Radlinského 9, SK-81237 Bratislava, Slovak Republic
  - Computing Centre, Centre of Operations of the Slovak Academy of Sciences, Dúbravská cesta č. 9, SK-84535 Bratislava, Slovak Republic

6
Humayun F, Khan F, Khan A, Alshammari A, Ji J, Farhan A, Fawad N, Alam W, Ali A, Wei DQ. De novo generation of dual-target ligands for the treatment of SARS-CoV-2 using deep learning, virtual screening, and molecular dynamic simulations. J Biomol Struct Dyn 2024; 42:3019-3029. [PMID: 37449757 DOI: 10.1080/07391102.2023.2234481] [Received: 12/26/2022] [Accepted: 04/30/2023] [Indexed: 07/18/2023]
Abstract
De novo generation of molecules with the necessary features offers a promising opportunity for artificial intelligence methods such as deep generative approaches. However, creating novel compounds with biological activity toward two distinct targets remains a very challenging task. In this study, we develop a computational framework for the de novo generation of bioactive compounds directed at two predetermined therapeutic targets. This framework is referred to as the dual-target ligand generative network. Our approach uses a sequence-based simplified molecular-input line-entry system (SMILES) generator as a stochastic policy to explore chemical space. The high-level workflow is to gather and prepare the training data for both targets' molecules, build a neural network model and train it to generate molecules, create new molecules with the generative model, and then virtually screen the newly validated molecules against the SARS-CoV-2 PLpro and 3CLpro drug targets. The results show that the generated novel molecules have higher binding affinity for both targets than the conventional drug remdesivir, which is used for the treatment of SARS-CoV-2. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Fahad Humayun
  - Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
  - State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- Fatima Khan
  - National Institute of Health, Islamabad, Pakistan
- Abbas Khan
  - Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
  - State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- Abdulrahman Alshammari
  - Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Jun Ji
  - Henan Provincial Engineering and Technology Center of Health Products for Livestock and Poultry, Henan Provincial Engineering and Technology Center of Animal Disease Diagnosis and Integrated Control, Nanyang Normal University, Nanyang, PR China
- Ali Farhan
  - Department of Chemistry, Chung Yuan Christian University, Taoyuan, Taiwan
- Nasim Fawad
  - Poultry Research Institute, Rawalpindi, Pakistan
- Waheed Alam
  - National Institute of Health, Islamabad, Pakistan
- Arif Ali
  - Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
  - State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- Dong-Qing Wei
  - Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
  - State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
  - Centre for Research in Molecular Modeling, Concordia University, Québec, Canada

7
Nguyen TH, Thai QM, Pham MQ, Minh PTH, Phung HTT. Machine learning combines atomistic simulations to predict SARS-CoV-2 Mpro inhibitors from natural compounds. Mol Divers 2024; 28:553-561. [PMID: 36823394 PMCID: PMC9950021 DOI: 10.1007/s11030-023-10601-1] [Received: 11/02/2022] [Accepted: 01/04/2023] [Indexed: 02/25/2023]
Abstract
The COVID-19 pandemic continues to spread around the world, causing ongoing social and economic damage on a global scale. One of the most important therapeutic targets for the treatment of COVID-19 is the main protease (Mpro) of SARS-CoV-2. In this study, we combined a machine-learning (ML) model with atomistic simulations to computationally search for highly promising SARS-CoV-2 Mpro inhibitors among the representative natural compounds of the National Cancer Institute (NCI) Database. First, the trained ML model was used to scan the library quickly and reliably for possible Mpro inhibitors. The ML output was then confirmed using atomistic simulations that integrate molecular docking and molecular dynamics simulations with the linear interaction energy scheme. The results showed good agreement between the ML predictions and the atomistic simulations. Ten substances were proposed to be able to inhibit SARS-CoV-2 Mpro. Seven of them have high-nanomolar affinity and are highly promising inhibitors. The strategy proved reliable and appropriate for fast prediction of SARS-CoV-2 Mpro inhibitors, which should also benefit work on newly emerging SARS-CoV-2 variants.
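For readers unfamiliar with the linear interaction energy (LIE) scheme mentioned above, its commonly cited general form is given below. This is the textbook expression, not necessarily the exact parameterization used in the paper; the coefficients $\alpha$, $\beta$, and $\gamma$ are empirical and their values are not stated in the abstract.

```latex
\Delta G_{\mathrm{bind}} \approx
  \alpha \left( \langle V_{\mathrm{vdW}} \rangle_{\mathrm{bound}}
              - \langle V_{\mathrm{vdW}} \rangle_{\mathrm{free}} \right)
+ \beta  \left( \langle V_{\mathrm{elec}} \rangle_{\mathrm{bound}}
              - \langle V_{\mathrm{elec}} \rangle_{\mathrm{free}} \right)
+ \gamma
```

Here $\langle V_{\mathrm{vdW}} \rangle$ and $\langle V_{\mathrm{elec}} \rangle$ are the simulation-averaged van der Waals and electrostatic interaction energies between the ligand and its surroundings, evaluated once for the ligand bound to the protein and once for the ligand free in solvent.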
Affiliation(s)
- Trung Hai Nguyen
  - Laboratory of Theoretical and Computational Biophysics, Advanced Institute of Materials Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  - Faculty of Pharmacy, Ton Duc Thang University, Ho Chi Minh City, Vietnam
- Quynh Mai Thai
  - Faculty of Pharmacy, Ton Duc Thang University, Ho Chi Minh City, Vietnam
- Minh Quan Pham
  - Institute of Natural Products Chemistry, Vietnam Academy of Science and Technology, Hanoi, Vietnam
  - Graduate University of Science and Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam
- Pham Thi Hong Minh
  - Institute of Natural Products Chemistry, Vietnam Academy of Science and Technology, Hanoi, Vietnam
  - Graduate University of Science and Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam
- Huong Thi Thu Phung
  - NTT Hi-Tech Institute, Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam

8
Wei W, Mengshan L, Yan W, Lixin G. Cluster energy prediction based on multiple strategy fusion whale optimization algorithm and light gradient boosting machine. BMC Chem 2024; 18:24. [PMID: 38291518 PMCID: PMC11367823 DOI: 10.1186/s13065-024-01127-0] [Received: 11/20/2023] [Accepted: 01/15/2024] [Indexed: 02/01/2024] [Open Access]
Abstract
BACKGROUND: Clusters, a novel hierarchical material structure that emerges from atoms or molecules, possess unique reactivity and catalytic properties that are crucial in catalysis, biomedicine, and optoelectronics. Predicting cluster energy provides insights into electronic structure, magnetism, and stability. However, the structure of clusters and their potential energy surface are exceptionally intricate. Searching for the globally optimal (lowest-energy) structure among the many isomers poses a significant challenge. Currently, modelling cluster energy with traditional machine learning methods has several issues, including reliance on manual expertise, slow computation, heavy computational resource demands, and inefficient parameter tuning.
RESULTS: This paper introduces a predictive model for the energy of a gold cluster comprising twenty atoms (referred to as the Au20 cluster). The model integrates the Multiple Strategy Fusion Whale Optimization Algorithm (MSFWOA) with the Light Gradient Boosting Machine (LightGBM), resulting in the MSFWOA-LightGBM model. This model employs the Coulomb matrix representation and eigenvalue solution methods for feature extraction. Additionally, it incorporates Tent chaotic mapping, a cosine convergence factor, and an inertia weight updating strategy to improve the Whale Optimization Algorithm (WOA), leading to MSFWOA. Subsequently, MSFWOA is employed to optimize the parameters of LightGBM for predicting the energy of the Au20 cluster.
CONCLUSIONS: The experimental results show that the most stable Au20 cluster structure is a regular tetrahedron with the lowest energy, displaying a tight and uniform atom distribution and high geometric symmetry. Compared to other models, the MSFWOA-LightGBM model excels in accuracy and correlation, with MSE, RMSE, and R2 values of 0.897, 0.947, and 0.879, respectively. Additionally, the MSFWOA-LightGBM model possesses outstanding scalability, offering valuable insights for material design, energy storage, sensing technology, and biomedical imaging, with the potential to drive research and development in these areas.
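The Coulomb matrix representation named in this abstract has a simple closed form, sketched here in plain Python using its standard definition (diagonal 0.5·Z^2.4, off-diagonal Z_i·Z_j divided by the interatomic distance). The function name and the toy coordinates are illustrative assumptions; the paper's own feature pipeline, including the eigenvalue step, is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Coordinate = Tuple[float, float, float]


def coulomb_matrix(
    charges: Sequence[float], positions: Sequence[Coordinate]
) -> List[List[float]]:
    """Standard Coulomb matrix for nuclear charges Z_i at positions R_i."""
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z_i ** 2.4
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: Z_i * Z_j / |R_i - R_j|
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


if __name__ == "__main__":
    # Three gold atoms (Z = 79) at made-up positions, for illustration only.
    gold = [79, 79, 79]
    coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
    for row in coulomb_matrix(gold, coords):
        print([round(value, 1) for value in row])
```

The sorted eigenvalues of this symmetric matrix are often used as a fixed-length descriptor that does not depend on atom ordering; computing them needs a linear-algebra routine and is left out of this sketch.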
Affiliation(s)
- Wu Wei
  - School of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
- Li Mengshan
  - School of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
- Wu Yan
  - School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
- Guan Lixin
  - School of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China

9
Iwata H, Nakai T, Koyama T, Matsumoto S, Kojima R, Okuno Y. VGAE-MCTS: A New Molecular Generative Model Combining the Variational Graph Auto-Encoder and Monte Carlo Tree Search. J Chem Inf Model 2023; 63:7392-7400. [PMID: 37993764 PMCID: PMC10716893 DOI: 10.1021/acs.jcim.3c01220] [Received: 08/03/2023] [Revised: 11/03/2023] [Accepted: 11/03/2023] [Indexed: 11/24/2023]
Abstract
Molecular generation is crucial for advancing drug discovery, materials science, and chemical exploration. It expedites the search for new drug candidates, facilitates tailored material creation, and enhances our understanding of molecular diversity. By employing artificial intelligence techniques such as molecular generative models based on molecular graphs, researchers have tackled the challenge of identifying efficient molecules with desired properties. Here, we propose a new molecular generative model combining a graph-based deep neural network and a reinforcement learning technique. We evaluated the validity, novelty, and optimized physicochemical properties of the generated molecules. Importantly, the model explored uncharted regions of chemical space, allowing for the efficient discovery and design of new molecules. This innovative approach has considerable potential to revolutionize drug discovery, materials science, and chemical research for accelerating scientific innovation. By leveraging advanced techniques and exploring previously unexplored chemical spaces, this study offers promising prospects for the efficient discovery and design of new molecules in the field of drug development.
Affiliation(s)
- Hiroaki Iwata
  - Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Taichi Nakai
  - Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Takuto Koyama
  - Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Shigeyuki Matsumoto
  - Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Ryosuke Kojima
  - Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Yasushi Okuno
  - Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
  - HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, Kobe-shi, Hyogo 650-0047, Japan

10
Zou J, Zhao L, Shi S. Generation of focused drug molecule library using recurrent neural network. J Mol Model 2023; 29:361. [PMID: 37932607 DOI: 10.1007/s00894-023-05772-5] [Received: 07/30/2023] [Accepted: 10/26/2023] [Indexed: 11/08/2023]
Abstract
CONTEXT: With the wide application of deep learning in drug research and development, de novo molecular design methods based on recurrent neural networks (RNNs) have strong advantages in drug molecule generation. An RNN model can learn the internal chemical structure of molecules in a way that resembles a natural language processing task. Although techniques for generating target-specific molecular libraries based on RNN models are mature, research on drug design and screening continues unabated, and de novo drug design methods that generate larger quantities of valid compounds are still needed.
METHODS: In this study, a molecular generation model based on an RNN was designed; it abandons the traditional stacked-RNN approach and introduces the Nested long short-term memory network structure. To enrich the library of focused molecules for specific targets, we fine-tuned the model using active molecules for novel coronavirus pneumonia and screened the generated molecules using machine learning models. Following rigorous screening, the selected molecules underwent molecular docking with the SARS-CoV-2 M-pro receptor using AutoDock2.4 to identify the top 3 potential inhibitors. Subsequently, 100-ns molecular dynamics simulations were conducted using Amber22. Molecules were parameterized with the GAFF2 force field, while the proteins were modeled using the ff19SB force field, with solvation in a truncated octahedral TIP3P solvent environment. After the molecular dynamics simulations, the stability of the ligand-protein complexes was assessed by analysis of RMSD, H-bonds, and MM-GBSA. The reasonable results indicate that the model can perform de novo drug design and that the generated molecules have the potential to be ideal drug candidates.
Affiliation(s)
- Jinping Zou
  - Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China
  - Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China
- Long Zhao
  - Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China
  - Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China
- Shaoping Shi
  - Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China
  - Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China

11
Ashraf FB, Akter S, Mumu SH, Islam MU, Uddin J. Bio-activity prediction of drug candidate compounds targeting SARS-Cov-2 using machine learning approaches. PLoS One 2023; 18:e0288053. [PMID: 37669264 PMCID: PMC10479925 DOI: 10.1371/journal.pone.0288053] [Received: 02/22/2023] [Accepted: 06/18/2023] [Indexed: 09/07/2023] [Open Access]
Abstract
The SARS-CoV-2 3CLpro protein is one of the key therapeutic targets of interest for COVID-19 because of its critical role in viral replication, the availability of various high-quality protein crystal structures, and its use as a basis for computationally screening for compounds with improved inhibitory activity, bioavailability, and ADMETox properties. The ChEMBL and PubChem databases contain experimental data from screening small molecules against SARS-CoV-2 3CLpro, which expands the opportunity to learn the underlying patterns and design a computational model that can predict the potency of any drug compound against the coronavirus before in-vitro and in-vivo testing. In this study, using several descriptors, we evaluated 27 machine learning classifiers. We also developed a neural network model that can correctly identify bioactive and inactive chemicals with 91% accuracy on ChEMBL data and 93% accuracy on combined ChEMBL and PubChem data. The F1-scores for inactive and active compounds were 93% and 94%, respectively. We applied SHAP (SHapley Additive exPlanations) to an XGB classifier to find the important fingerprints among the PaDEL descriptors for this task. The results indicated that the PaDEL descriptors were effective in predicting bioactivity, the proposed neural network design was efficient, and the explanatory analysis through SHAP correctly identified the important fingerprints. In addition, we validated the effectiveness of our proposed model using a large dataset encompassing over 100,000 molecules. This research employed various molecular descriptors to discover the optimal one for this task. To evaluate the effectiveness of these possible medications against SARS-CoV-2, more in-vitro and in-vivo research is required.
Affiliation(s)
- Faisal Bin Ashraf
  - Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh
  - Department of Computer Science and Engineering, University of California, Riverside, California, United States of America
- Sanjida Akter
  - Department of Cell Molecular and Developmental Biology, University of California, Riverside, California, United States of America
- Sumona Hoque Mumu
  - School of Kinesiology, University of Louisiana at Lafayette, Lafayette, Louisiana, United States of America
- Muhammad Usama Islam
  - School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, Louisiana, United States of America
- Jasim Uddin
  - Department of Applied Computing and Engineering, Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, Wales, United Kingdom
|
12
|
Williams AH, Zhan CG. Staying Ahead of the Game: How SARS-CoV-2 has Accelerated the Application of Machine Learning in Pandemic Management. BioDrugs 2023; 37:649-674. [PMID: 37464099 DOI: 10.1007/s40259-023-00611-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/28/2023] [Indexed: 07/20/2023]
Abstract
In recent years, machine learning (ML) techniques have garnered considerable interest for their potential use in accelerating the rate of drug discovery. With the emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, the utilization of ML has become even more crucial in the search for effective antiviral medications. The pandemic has presented the scientific community with a unique challenge, and the rapid identification of potential treatments has become an urgent priority. Researchers have been able to accelerate the process of identifying drug candidates, repurposing existing drugs, and designing new compounds with desirable properties using machine learning in drug discovery. To train predictive models, ML techniques in drug discovery rely on the analysis of large datasets, including both experimental and clinical data. These models can be used to predict the biological activities, potential side effects, and interactions with specific target proteins of drug candidates. This strategy has proven to be an effective method for identifying potential coronavirus disease 2019 (COVID-19) and other disease treatments. This paper offers a thorough analysis of the various ML techniques implemented to combat COVID-19, including supervised and unsupervised learning, deep learning, and natural language processing. The paper discusses the impact of these techniques on pandemic drug development, including the identification of potential treatments, the understanding of the disease mechanism, and the creation of effective and safe therapeutics. The lessons learned can be applied to future outbreaks and drug discovery initiatives.
Affiliation(s)
- Alexander H Williams
  - Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
  - GSK Upper Providence, 1250 S. Collegeville Road, Collegeville, PA, 19426, USA
- Chang-Guo Zhan
  - Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
|
13
|
Boldini D, Grisoni F, Kuhn D, Friedrich L, Sieber SA. Practical guidelines for the use of gradient boosting for molecular property prediction. J Cheminform 2023; 15:73. [PMID: 37641120 PMCID: PMC10464382 DOI: 10.1186/s13321-023-00743-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 08/09/2023] [Indexed: 08/31/2023] Open
Abstract
Decision tree ensembles are among the most robust, high-performing and computationally efficient machine learning approaches for quantitative structure-activity relationship (QSAR) modeling. Among them, gradient boosting has recently garnered particular attention, for its performance in data science competitions, virtual screening campaigns, and bioactivity prediction. However, different variants of gradient boosting exist, the most popular being XGBoost, LightGBM and CatBoost. Our study provides the first comprehensive comparison of these approaches for QSAR. To this end, we trained 157,590 gradient boosting models, which were evaluated on 16 datasets and 94 endpoints, comprising 1.4 million compounds in total. Our results show that XGBoost generally achieves the best predictive performance, while LightGBM requires the least training time, especially for larger datasets. In terms of feature importance, the models surprisingly rank molecular features differently, reflecting differences in regularization techniques and decision tree structures. Thus, expert knowledge must always be employed when evaluating data-driven explanations of bioactivity. Furthermore, our results show that the relevance of each hyperparameter varies greatly across datasets and that it is crucial to optimize as many hyperparameters as possible to maximize the predictive performance. In conclusion, our study provides the first set of guidelines for cheminformatics practitioners to effectively train, optimize and evaluate gradient boosting models for virtual screening and QSAR applications.
Affiliation(s)
- Davide Boldini
  - Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Garching bei Munich, Germany
- Francesca Grisoni
  - Department of Biomedical Engineering, Institute for Complex Molecular Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/E, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Stephan A Sieber
  - Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Garching bei Munich, Germany
|
14
|
Soulère L, Barbier T, Queneau Y. In Silico Identification of Potential Inhibitors of the SARS-CoV-2 Main Protease among a PubChem Database of Avian Infectious Bronchitis Virus 3CLPro Inhibitors. Biomolecules 2023; 13:956. [PMID: 37371536 DOI: 10.3390/biom13060956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 05/31/2023] [Accepted: 06/02/2023] [Indexed: 06/29/2023] Open
Abstract
Remarkable structural homologies between the main proteases of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the avian infectious bronchitis virus (IBV) were revealed by comparative amino-acid sequence and 3D structural alignment. Assessing whether reported IBV 3CLPro inhibitors could also interact with SARS-CoV-2 has been undertaken in silico using a PubChem BioAssay database of 388 compounds active on the avian infectious bronchitis virus 3C-like protease. Docking studies of this database on the SARS-CoV-2 protease resulted in the identification of four covalent inhibitors targeting the catalytic cysteine residue and five non-covalent inhibitors for which the binding was further investigated by molecular dynamics (MD) simulations. Predictive ADMET calculations on the nine compounds suggest promising pharmacokinetic properties.
Affiliation(s)
- Laurent Soulère
  - Univ Lyon, INSA Lyon, Université Claude Bernard Lyon 1, CNRS, CPE-Lyon, ICBMS, UMR 5246, Institut de Chimie et de Biochimie Moléculaires et Supramoléculaires, Bâtiment Lederer, 1 Rue Victor Grignard, F-69622 Villeurbanne, France
- Thibaut Barbier
  - Univ Lyon, INSA Lyon, Université Claude Bernard Lyon 1, CNRS, CPE-Lyon, ICBMS, UMR 5246, Institut de Chimie et de Biochimie Moléculaires et Supramoléculaires, Bâtiment Lederer, 1 Rue Victor Grignard, F-69622 Villeurbanne, France
- Yves Queneau
  - Univ Lyon, INSA Lyon, Université Claude Bernard Lyon 1, CNRS, CPE-Lyon, ICBMS, UMR 5246, Institut de Chimie et de Biochimie Moléculaires et Supramoléculaires, Bâtiment Lederer, 1 Rue Victor Grignard, F-69622 Villeurbanne, France
|
15
|
Saramago LC, Santana MV, Gomes BF, Dantas RF, Senger MR, Oliveira Borges PH, Ferreira VNDS, dos Santos Rosa A, Tucci AR, Dias Miranda M, Lukacik P, Strain-Damerell C, Owen CD, Walsh MA, Ferreira SB, Silva-Junior FP. AI-Driven Discovery of SARS-CoV-2 Main Protease Fragment-like Inhibitors with Antiviral Activity In Vitro. J Chem Inf Model 2023; 63:2866-2880. [PMID: 37058135 PMCID: PMC10124747 DOI: 10.1021/acs.jcim.3c00409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Indexed: 04/15/2023]
Abstract
SARS-CoV-2 is the causative agent of COVID-19 and is responsible for the current global pandemic. The viral genome contains 5 major open reading frames of which the largest ORF1ab codes for two polyproteins, pp1ab and pp1a, which are subsequently cleaved into 16 nonstructural proteins (nsp) by two viral cysteine proteases encoded within the polyproteins. The main protease (Mpro, nsp5) cleaves the majority of the nsp's, making it essential for viral replication and has been successfully targeted for the development of antivirals. The first oral Mpro inhibitor, nirmatrelvir, was approved for treatment of COVID-19 in late December 2021 in combination with ritonavir as Paxlovid. Increasing the arsenal of antivirals and development of protease inhibitors and other antivirals with a varied mode of action remains a priority to reduce the likelihood for resistance emerging. Here, we report results from an artificial intelligence-driven approach followed by in vitro validation, allowing the identification of five fragment-like Mpro inhibitors with IC50 values ranging from 1.5 to 241 μM. The three most potent molecules (compounds 818, 737, and 183) were tested against SARS-CoV-2 by in vitro replication in Vero E6 and Calu-3 cells. Compound 818 was active in both cell models with an EC50 value comparable to its measured IC50 value. On the other hand, compounds 737 and 183 were only active in Calu-3, a preclinical model of respiratory cells, showing selective indexes twice as high as those for compound 818. We also show that our in silico methodology was successful in identifying both reversible and covalent inhibitors. For instance, compound 818 is a reversible chloromethylamide analogue of 8-methyl-γ-carboline, while compound 737 is an N-pyridyl-isatin that covalently inhibits Mpro. 
Given the small molecular weights of these fragments, their high binding efficiency in vitro and efficacy in blocking viral replication, these compounds represent good starting points for the development of potent lead molecules targeting the Mpro of SARS-CoV-2.
Affiliation(s)
- Luiz Carlos Saramago
  - LaBECFar-Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, 21040-900 Rio de Janeiro, Brazil
- Marcos V. Santana
  - LaBECFar-Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, 21040-900 Rio de Janeiro, Brazil
- Bárbara Figueira Gomes
  - LaBECFar-Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, 21040-900 Rio de Janeiro, Brazil
- Rafael Ferreira Dantas
  - LaBECFar-Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, 21040-900 Rio de Janeiro, Brazil
- Mario R. Senger
  - LaBECFar-Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, 21040-900 Rio de Janeiro, Brazil
- Pedro Henrique Oliveira Borges
  - LaBECFar-Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, 21040-900 Rio de Janeiro, Brazil
  - LaSOPB-Laboratório de Síntese Orgânica e Prospecção Biológica, Instituto de Química, Universidade Federal do Rio de Janeiro, 21040-900 Rio de Janeiro, Brazil
- Vivian Neuza dos Santos Ferreira
  - LMMV-Laboratório de Morfologia e Morfogênese Viral (LMMV), Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, 21040-900 Rio de Janeiro, Brazil
- Alice dos Santos Rosa
  - LMMV-Laboratório de Morfologia e Morfogênese Viral (LMMV), Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, 21040-900 Rio de Janeiro, Brazil
- Amanda Resende Tucci
  - LMMV-Laboratório de Morfologia e Morfogênese Viral (LMMV), Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, 21040-900 Rio de Janeiro, Brazil
- Milene Dias Miranda
  - LMMV-Laboratório de Morfologia e Morfogênese Viral (LMMV), Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, 21040-900 Rio de Janeiro, Brazil
- Petra Lukacik
  - Diamond Light Source, Harwell Science and Innovation Campus, OX11 0DE Didcot, U.K.
  - Research Complex at Harwell, Harwell Science & Innovation Campus, OX11 0FA Didcot, U.K.
- Claire Strain-Damerell
  - Diamond Light Source, Harwell Science and Innovation Campus, OX11 0DE Didcot, U.K.
  - Research Complex at Harwell, Harwell Science & Innovation Campus, OX11 0FA Didcot, U.K.
- C. David Owen
  - Diamond Light Source, Harwell Science and Innovation Campus, OX11 0DE Didcot, U.K.
  - Research Complex at Harwell, Harwell Science & Innovation Campus, OX11 0FA Didcot, U.K.
- Martin Austin Walsh
  - Diamond Light Source, Harwell Science and Innovation Campus, OX11 0DE Didcot, U.K.
  - Research Complex at Harwell, Harwell Science & Innovation Campus, OX11 0FA Didcot, U.K.
- Sabrina Baptista Ferreira
  - LaSOPB-Laboratório de Síntese Orgânica e Prospecção Biológica, Instituto de Química, Universidade Federal do Rio de Janeiro, 21040-900 Rio de Janeiro, Brazil
- Floriano Paes Silva-Junior
  - LaBECFar-Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, 21040-900 Rio de Janeiro, Brazil
|
16
|
Chen L, Shen Q, Lou J. Magicmol: a light-weighted pipeline for drug-like molecule evolution and quick chemical space exploration. BMC Bioinformatics 2023; 24:173. [PMID: 37101113 PMCID: PMC10132416 DOI: 10.1186/s12859-023-05286-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Accepted: 04/13/2023] [Indexed: 04/28/2023] Open
Abstract
The flourishing of machine learning and deep learning methods has boosted the development of cheminformatics, especially in drug discovery and the exploration of new materials. Lower time and space costs make it possible for scientists to search the enormous chemical space. Recently, some work has combined reinforcement learning strategies with recurrent neural network (RNN)-based models to optimize the properties of generated small molecules, which notably improved a batch of critical factors for these candidates. However, a common problem among these RNN-based methods is that several generated molecules are difficult to synthesize despite having better desired properties such as binding affinity. Nevertheless, RNN-based frameworks reproduce the molecule distribution of the training set better than other categories of models during molecule exploration tasks. Thus, to optimize the whole exploration process and make it contribute to the optimization of specified molecules, we devised a lightweight pipeline called Magicmol; this pipeline has a re-mastered RNN network and uses the SELFIES representation instead of SMILES. Our backbone model achieved extraordinary performance while reducing the training cost; moreover, we devised reward truncation strategies to eliminate the model collapse problem. Additionally, adopting the SELFIES representation made it possible to combine STONED-SELFIES as a post-processing procedure for specified molecule optimization and quick chemical space exploration.
Affiliation(s)
- Lin Chen
  - Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
  - Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, China
- Qing Shen
  - Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
  - School of Electronic Information, Huzhou College, Huzhou, China
- Jungang Lou
  - Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
  - Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, China
|
17
|
Mucllari E, Zadorozhnyy V, Ye Q, Nguyen DD. Novel Molecular Representations Using Neumann-Cayley Orthogonal Gated Recurrent Unit. J Chem Inf Model 2023; 63:2656-2666. [PMID: 37075324 DOI: 10.1021/acs.jcim.2c01526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/21/2023]
Abstract
Advances in deep neural networks (DNNs) have made a very powerful machine learning method available to researchers across many fields of study, including the biomedical and cheminformatics communities, where DNNs help to improve tasks such as protein performance, molecular design, drug discovery, etc. Many of those tasks rely on molecular descriptors for representing molecular characteristics in cheminformatics. Despite significant efforts and the introduction of numerous methods that derive molecular descriptors, the quantitative prediction of molecular properties remains challenging. One widely used method of encoding molecule features into bit strings is the molecular fingerprint. In this work, we propose using new Neumann-Cayley Gated Recurrent Units (NC-GRU) inside the Neural Nets encoder (AutoEncoder) to create neural molecular fingerprints (NC-GRU fingerprints). The NC-GRU AutoEncoder introduces orthogonal weights into the widely used GRU architecture, resulting in faster, more stable training and more reliable molecular fingerprints. Integrating novel NC-GRU fingerprints and Multi-Task DNN schematics improves the performance of various molecular-related tasks such as toxicity, partition coefficient, lipophilicity, and solvation free energy, producing state-of-the-art results on several benchmarks.
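The abstract states only that NC-GRU introduces orthogonal weights. As a hedged illustration of what the method's name suggests, the sketch below builds an orthogonal matrix with the Cayley transform W = (I + A)^{-1}(I - A) of a skew-symmetric matrix A, approximating the inverse by a truncated Neumann series. Whether the paper uses exactly this construction is an assumption; the matrix values and helper names are hypothetical.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def transpose(X):
    return [list(col) for col in zip(*X)]


def neumann_inverse_of_i_plus_a(A, terms):
    """Approximate (I + A)^{-1} by sum_{k < terms} (-A)^k.

    The series converges only when the spectral radius of A is below 1.
    """
    neg_a = [[-a for a in row] for row in A]
    total = identity(len(A))
    power = identity(len(A))
    for _ in range(1, terms):
        power = matmul(power, neg_a)
        total = add(total, power)
    return total


def cayley_orthogonal(A, terms=60):
    """Cayley transform of a skew-symmetric A (assumed construction)."""
    return matmul(neumann_inverse_of_i_plus_a(A, terms), sub(identity(len(A)), A))


# Hypothetical small skew-symmetric matrix (A^T = -A), chosen for illustration.
A = [[0.0, 0.3, -0.1],
     [-0.3, 0.0, 0.2],
     [0.1, -0.2, 0.0]]
W = cayley_orthogonal(A)
WtW = matmul(transpose(W), W)  # close to the identity if W is orthogonal
```

Orthogonal recurrent weights preserve vector norms under repeated multiplication, which is the usual argument for why such constraints stabilize the training of recurrent networks.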
Affiliation(s)
- Edison Mucllari
  - Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
- Vasily Zadorozhnyy
  - Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
- Qiang Ye
  - Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
- Duc Duy Nguyen
  - Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
|
18
|
Zhang H, Liang B, Sang X, An J, Huang Z. Discovery of Potential Inhibitors of SARS-CoV-2 Main Protease by a Transfer Learning Method. Viruses 2023; 15:891. [PMID: 37112871 PMCID: PMC10143255 DOI: 10.3390/v15040891] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/13/2023] [Revised: 03/26/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023] Open
Abstract
The COVID-19 pandemic caused by SARS-CoV-2 remains a global public health threat and has prompted the development of antiviral therapies. Artificial intelligence may be one of the strategies to facilitate drug development for emerging and re-emerging diseases. The main protease (Mpro) of SARS-CoV-2 is an attractive drug target due to its essential role in the virus life cycle and high conservation among SARS-CoVs. In this study, we used a data augmentation method to boost transfer learning model performance in screening for potential inhibitors of SARS-CoV-2 Mpro. This method appeared to outperform a graph convolutional neural network, random forest, and Chemprop on an external test set. The fine-tuned model was used to screen a natural compound library and a de novo generated compound library. In combination with other in silico analysis methods, a total of 27 compounds were selected for experimental validation of anti-Mpro activities. Among all the selected hits, two compounds (gossypol acetic acid and hyperoside) displayed inhibitory effects against Mpro with IC50 values of 67.6 μM and 235.8 μM, respectively. The results obtained in this study may suggest an effective strategy for discovering potential therapeutic leads for SARS-CoV-2 and other coronaviruses.
|
19
|
Ngo ST, Nguyen TH, Tung NT, Vu VV, Pham MQ, Mai BK. Characterizing the ligand-binding affinity toward SARS-CoV-2 Mpro via physics- and knowledge-based approaches. Phys Chem Chem Phys 2022; 24:29266-29278. [PMID: 36449268 DOI: 10.1039/d2cp04476e] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
Computational approaches, including physics- and knowledge-based methods, have commonly been used to determine the ligand-binding affinity toward SARS-CoV-2 main protease (Mpro or 3CLpro). Strong binding ligands can thus be suggested as potential inhibitors for blocking the biological activity of the protease. In this context, this paper aims to provide a short review of computational approaches that have recently been applied in the search for inhibitor candidates of Mpro. In particular, molecular docking and molecular dynamics (MD) simulations are usually combined to predict the binding affinity of thousands of compounds. Quantitative structure-activity relationship (QSAR) is the least computationally demanding and therefore can be used for large chemical collections of ligands. However, its accuracy may not be high. Moreover, the quantum mechanics/molecular mechanics (QM/MM) method is most commonly used for covalently binding inhibitors, which also play an important role in inhibiting the activity of SARS-CoV-2. Furthermore, machine learning (ML) models can significantly increase the searching space of ligands with high accuracy for binding affinity prediction. Physical insights into the binding process can then be confirmed via physics-based calculations. Integration of ML models into computational chemistry provides many more benefits and can lead to new therapies sooner.
Affiliation(s)
- Son Tung Ngo
  - Laboratory of Theoretical and Computational Biophysics, Advanced Institute of Materials Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  - Faculty of Pharmacy, Ton Duc Thang University, Ho Chi Minh City, Vietnam
- Trung Hai Nguyen
  - Laboratory of Theoretical and Computational Biophysics, Advanced Institute of Materials Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  - Faculty of Pharmacy, Ton Duc Thang University, Ho Chi Minh City, Vietnam
- Nguyen Thanh Tung
  - Institute of Materials Science, Vietnam Academy of Science and Technology, Hanoi, Vietnam
  - Graduate University of Science and Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam
- Van V Vu
  - NTT Hi-Tech Institute, Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam
- Minh Quan Pham
  - Graduate University of Science and Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam
  - Institute of Natural Products Chemistry, Vietnam Academy of Science and Technology, Hanoi, Vietnam
- Binh Khanh Mai
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, PA, USA
|
20
|
Wang M, Wang J, Weng G, Kang Y, Pan P, Li D, Deng Y, Li H, Hsieh CY, Hou T. ReMODE: a deep learning-based web server for target-specific drug design. J Cheminform 2022; 14:84. [PMID: 36510307 PMCID: PMC9743675 DOI: 10.1186/s13321-022-00665-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 12/01/2022] [Indexed: 12/14/2022] Open
Abstract
Deep learning (DL) and machine learning have contributed significantly to basic biology research and drug discovery in the past few decades. Recent advances in DL-based generative models have led to superior developments in de novo drug design. However, data availability, deep data processing, and the lack of user-friendly DL tools and interfaces make it difficult to apply these DL techniques to drug design. We hereby present ReMODE (Receptor-based MOlecular DEsign), a new web server based on a DL algorithm for target-specific ligand design, which integrates different functional modules to enable users to develop customizable drug design tasks. As designed, the ReMODE server can construct target-specific tasks for the protein targets selected by users. The server also provides some extensions: users can optimize the drug-likeness or synthetic accessibility of the generated molecules and control other physicochemical properties; users can also choose a sub-structure/scaffold as a starting point for fragment-based drug design. The ReMODE server also enables users to optimize the pharmacophore matching and docking conformations of the generated molecules. We believe that the ReMODE server will benefit researchers in drug discovery. ReMODE is publicly available at http://cadd.zju.edu.cn/relation/remode/.
Affiliation(s)
- Mingyang Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, People’s Republic of China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, People’s Republic of China
- Gaoqi Weng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, People’s Republic of China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
- Dan Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, People’s Republic of China
- Honglin Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237 People’s Republic of China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
|
21
|
Noguchi S, Inoue J. Exploration of Chemical Space Guided by PixelCNN for Fragment-Based De Novo Drug Discovery. J Chem Inf Model 2022; 62:5988-6001. [PMID: 36454646 DOI: 10.1021/acs.jcim.2c01345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
We report a novel framework for achieving fragment-based molecular design using pixel convolutional neural network (PixelCNN) combined with the simplified molecular input line entry system (SMILES) as molecular representation. While a widely used recurrent neural network (RNN) assumes monotonically decaying correlations in strings, PixelCNN captures a periodicity among characters of SMILES. Thus, PixelCNN provides us with a novel solution for the analysis of chemical space by extracting the periodicity of molecular structures that will be buried in SMILES. Moreover, this characteristic enables us to generate molecules by combining several simple building blocks, such as a benzene ring and side-chain structures, which contributes to the effective exploration of chemical space by step-by-step searching for molecules from a target fragment. In conclusion, PixelCNN could be a powerful approach focusing on the periodicity of molecules to explore chemical space for the fragment-based molecular design.
Affiliation(s)
- Satoshi Noguchi
  - Department of Advanced Interdisciplinary Studies, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
- Junya Inoue
  - Institute for Industrial Science, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-0082, Japan
  - Department of Materials Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo 113-8656, Japan
  - Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
|
22
|
Akbasheva OE, Spirina LV, Dyakov DA, Masunova NV. Proteolysis and Deficiency of α1-Proteinase Inhibitor in SARS-CoV-2 Infection. BIOCHEMISTRY (MOSCOW) SUPPLEMENT. SERIES B, BIOMEDICAL CHEMISTRY 2022; 16:271-291. [PMID: 36407837 PMCID: PMC9668222 DOI: 10.1134/s1990750822040035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 03/30/2022] [Accepted: 04/11/2022] [Indexed: 11/17/2022]
Abstract
The SARS-CoV-2 pandemic had stimulated the emergence of numerous publications on the α1-proteinase inhibitor (α1-PI, α1-antitrypsin), especially when it was found that the regions of high mortality corresponded to the regions with deficient α1-PI alleles. By analogy with the data obtained in the last century, when the first cause of the genetic deficiency of α1-antitrypsin leading to elastase activation in pulmonary emphysema was proven, it can be supposed that proteolysis hyperactivation in COVID-19 may be associated with the impaired functions of α1-PI. The purpose of this review was to systematize the scientific data and critical directions for translational studies on the role of α1-PI in SARS-CoV-2-induced proteolysis hyperactivation as a diagnostic marker and a therapeutic target. This review describes the proteinase-dependent stages of viral infection: the reception and penetration of the virus into a cell and the imbalance of the plasma aldosterone-angiotensin-renin, kinin, and blood clotting systems. The role of ACE2, TMPRSS, ADAM17, furin, cathepsins, trypsin- and elastase-like serine proteinases in the virus tropism, the activation of proteolytic cascades in blood, and the COVID-19-dependent complications is considered. The scientific reports on α1-PI involvement in the SARS-CoV-2-induced inflammation, the relationship with the severity of infection and comorbidities were analyzed. Particular attention is paid to the acquired α1-PI deficiency in assessing the state of patients with proteolysis overactivation and chronic non-inflammatory diseases, which are accompanied by the risk factors for comorbidity progression and the long-term consequences of COVID-19. Essential data on the search and application of protease inhibitor drugs in the therapy for bronchopulmonary and cardiovascular pathologies were analyzed. The evidence of antiviral, anti-inflammatory, anticoagulant, and anti-apoptotic effects of α1-PI, as well as the prominent data and prospects for its application as a targeted drug in the SARS-CoV-2 acquired pneumonia and related disorders, are presented.
Affiliation(s)
- L. V. Spirina
- Siberian State Medical University, 634050 Tomsk, Russia
- Cancer Research Institute, Tomsk National Research Medical Center, 634009 Tomsk, Russia
- D. A. Dyakov
- Siberian State Medical University, 634050 Tomsk, Russia
23
Chadi MA, Mousannif H, Aamouche A. Conditional reduction of the loss value versus reinforcement learning for biassing a de-novo drug design generator. J Cheminform 2022; 14:65. [PMID: 36167559 PMCID: PMC9516832 DOI: 10.1186/s13321-022-00643-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Accepted: 09/07/2022] [Indexed: 11/10/2022] Open
Abstract
Deep learning has demonstrated promising results in de novo drug design. Often, the general pipeline consists of training a generative model (G) to learn the building rules of valid molecules, then using a biassing technique such as reinforcement learning (RL) to focus G on the desired chemical space. However, this sequential training of the same model for different tasks is known to be prone to a catastrophic forgetting (CF) phenomenon. This work presents a novel yet simple approach to bias G with significantly less CF than RL. The proposed method relies on backpropagating a reduced value of the cross-entropy loss used to train G according to the proportion of desired molecules that the biased-G can generate. We named our approach CRLV, short for conditional reduction of the loss value. We compared the two biased models (RL-biased-G and CRLV-biased-G) for four different objectives related to de novo drug design. CRLV-biased-G outperformed RL-biased-G in all four objectives and manifested appreciably less CF. Besides, an intersection analysis between molecules generated by the RL-biased-G and the CRLV-biased-G revealed that, given the low percentage of overlap between the two, they can be used jointly without losing diversity to further increase the desirability. Finally, we show that the difficulty of an objective is proportional to (i) its frequency in the dataset used to train G and (ii) the associated structural variance (SV), which is a new parameter we introduced in this paper, calling for novel exploration techniques for such difficult objectives.
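The central idea of the abstract above, scaling down the cross-entropy loss according to the share of desired molecules the biased generator produces, can be sketched roughly as follows. This is only an illustration and not the authors' implementation: the abstract does not state the reduction rule, so the linear factor `1 - fraction` and all function names below are assumptions.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target token under the predicted probabilities."""
    return -math.log(probs[target_index])


def desired_fraction(molecules, is_desired):
    """Fraction of sampled molecules that satisfy the objective (0.0 for an empty sample)."""
    if not molecules:
        return 0.0
    return sum(1 for m in molecules if is_desired(m)) / len(molecules)


def reduced_loss(loss, fraction):
    """Hypothetical reduction rule (an assumption): shrink the loss linearly
    as the fraction of desired molecules grows."""
    return loss * (1.0 - fraction)


if __name__ == "__main__":
    # Toy sample of SMILES strings; "contains an oxygen atom" stands in for a real objective.
    sample = ["CCO", "CCN", "CCC", "CO"]
    fraction = desired_fraction(sample, lambda smiles: "O" in smiles)
    loss = cross_entropy([0.5, 0.5], 0)
    print(fraction, reduced_loss(loss, fraction))
```

In a real training loop the reduced value would be the quantity that is backpropagated; here it is just computed and printed.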
Affiliation(s)
- Mohamed-Amine Chadi
- Laboratoire Ingénierie des Systems Informatiques (LISI), Department of Computer Science, Faculty of Sciences Semlalia, Cadi Ayyad University, 40000, Marrakech, Morocco.
- Hajar Mousannif
- Laboratoire Ingénierie des Systems Informatiques (LISI), Department of Computer Science, Faculty of Sciences Semlalia, Cadi Ayyad University, 40000, Marrakech, Morocco
- Ahmed Aamouche
- Laboratoire Ingénierie des Systèmes et Applications (LISA), Ecole Nationale des Sciences Appliquées de Marrakech, Cadi Ayyad University, BP 575, Avenue Abdelkrim Khattabi, 40000, Marrakech, Morocco
24
|
Singh S, Sunoj RB. A Transfer Learning Approach for Reaction Discovery in Small Data Situations Using Generative Model. iScience 2022; 25:104661. [PMID: 35832891 PMCID: PMC9272387 DOI: 10.1016/j.isci.2022.104661] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/03/2022] [Revised: 05/20/2022] [Accepted: 06/16/2022] [Indexed: 11/01/2022] Open
Abstract
Sustainable practices in chemical sciences can be better realized by adopting interdisciplinary approaches that combine the advantages of machine learning (ML) on the initially acquired small data in reaction discovery. Developing new reactions generally remains heuristic and even time and resource intensive. For instance, synthesis of fluorine-containing compounds, which constitute ∼20% of the marketed drugs, relies on deoxyfluorination of abundantly available alcohols. Herein, we demonstrate the use of a recurrent neural network-based deep generative model built on a library of just 37 alcohols for effective learning and exploration of the chemical space. The proof-of-concept ML model is able to generate good quality, synthetically accessible, higher-yielding novel alcohol molecules. This protocol would have superior utility for deployment into a practical reaction discovery pipeline.
- Dual pronged transfer learning, both to generate and predict yields of new molecules
- Demonstrated the utility for an important family of deoxyfluorination of alcohols
- Applicable for practically more likely situations with relatively smaller data
- Extendable to other reaction manifolds to facilitate expedited reaction discovery
25
Akbasheva OE, Spirina LV, Dyakov DA, Masunova NV. [Proteolysis and deficiency of α1-proteinase inhibitor in SARS-CoV-2 infection]. BIOMEDITSINSKAIA KHIMIIA 2022; 68:157-176. [PMID: 35717581 DOI: 10.18097/pbmc20226803157] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
The SARS-CoV-2 pandemic has stimulated numerous publications on the α1-proteinase inhibitor (α1-PI, α1-antitrypsin), primarily since it was found that regions with high mortality corresponded to regions with deficient α1-PI alleles. By analogy with data from the last century, when genetic deficiency of α1-antitrypsin was proven to be the root cause of elastase activation in pulmonary emphysema, it is plausible that proteolysis hyperactivation in COVID-19 is associated with impaired α1-PI function. The purpose of this review is to systematize scientific data and critical directions for translational studies on the role of α1-PI in SARS-CoV-2-induced proteolysis hyperactivation as a diagnostic marker and a therapeutic target. This review describes the proteinase-dependent stages of a viral infection: reception and penetration of the virus into the cell, and the imbalance of the plasma aldosterone-angiotensin-renin, kinin, and blood clotting systems. The role of ACE2, TMPRSS, ADAM17, furin, cathepsins, and trypsin- and elastase-like serine proteinases in virus tropism, activation of proteolytic cascades in blood, and COVID-19-dependent complications is presented. Scientific reports on α1-PI involvement in SARS-CoV-2-induced inflammation and its links with infection severity and comorbidities were analyzed. Particular attention is paid to acquired α1-PI deficiency in assessing patients with proteolysis overactivation and chronic non-inflammatory diseases, which are accompanied by risk factors for comorbidity progression and the long-term consequences of COVID-19. Essential data on the search for and use of protease inhibitor drugs in the therapy of bronchopulmonary and cardiovascular pathologies were analyzed. Evidence of the antiviral, anti-inflammatory, anticoagulant, and anti-apoptotic effects of α1-PI, together with prominent data and prospects for its application as a targeted drug in SARS-CoV-2-acquired pneumonia and related disorders, is presented.
Affiliation(s)
- L V Spirina
- Siberian State Medical University, Tomsk, Russia; Cancer Research Institute, Tomsk National Research Medical Center, Tomsk, Russia
- D A Dyakov
- Siberian State Medical University, Tomsk, Russia
- N V Masunova
- Siberian State Medical University, Tomsk, Russia
26
Saldívar-González FI, Medina-Franco JL. Approaches for enhancing the analysis of chemical space for drug discovery. Expert Opin Drug Discov 2022; 17:789-798. [PMID: 35640229 DOI: 10.1080/17460441.2022.2084608] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/22/2022]
Abstract
INTRODUCTION: Chemical space is a powerful, general, and practical conceptual framework in drug discovery and other areas in chemistry that addresses the diversity of molecules and it has various applications. Moreover, chemical space is a cornerstone of chemoinformatics as a scientific discipline. In response to the increase in the set of chemical compounds in databases, generators of chemical structures, and tools to calculate molecular descriptors, novel approaches to generate visual representations of chemical space in low dimensions are emerging and evolving. Such approaches include a wide range of commercial and free applications, software, and open-source methods.
AREAS COVERED: The current state of chemical space in drug design and discovery is reviewed. The topics discussed herein include advances for efficient navigation in chemical space, the use of this concept in assessing the diversity of different data sets, exploring structure-property/activity relationships for one or multiple endpoints, and compound library design. Recent advances in methodologies for generating visual representations of chemical space have been highlighted, thereby emphasizing open-source methods.
EXPERT OPINION: Quantitative and qualitative generation and analysis of chemical space require novel approaches for handling the increasing number of molecules and their information available in chemical databases (including emerging ultra-large libraries). In addition, it is of utmost importance to note that chemical space is a conceptual framework that goes beyond visual representation in low dimensions. However, the graphical representation of chemical space has several practical applications in drug discovery and beyond.
Affiliation(s)
- Fernanda I Saldívar-González
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- José L Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
27
Simončič M, Lukšič M, Druchok M. Machine learning assessment of the binding region as a tool for more efficient computational receptor-ligand docking. J Mol Liq 2022; 353:118759. [PMID: 35273421 PMCID: PMC8903148 DOI: 10.1016/j.molliq.2022.118759] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
Abstract
We present a combined computational approach to protein-ligand binding, which consists of two steps: (1) a deep neural network is used to locate a binding region on a target protein, and (2) molecular docking of a ligand is performed within the specified region to obtain the best pose using Autodock Vina. Our in-house designed neural network was trained using the PepBDB dataset. Although the training dataset consisted of protein-peptide complexes, we show that the approach is not limited to peptides, but also works remarkably well for a large class of non-peptide ligands. The results are compared with those in which the binding region (first step) was provided by Accluster. In cases where no prior experimental data on the binding region are available, our deep neural network provides a fast and effective alternative to classical software for its localization. Our code is available at https://github.com/mksmd/NNforDocking.
Affiliation(s)
- Matjaž Simončič
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana, Slovenia
- Miha Lukšič
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana, Slovenia
- Maksym Druchok
- Institute for Condensed Matter Physics, 1 Svientsitskii Str., UA-79011 Lviv, Ukraine
- SoftServe Inc., 2d Sadova Str., UA-79021 Lviv, Ukraine
28
Bhatnagar R, Sardar S, Beheshti M, Podichetty JT. How can natural language processing help model informed drug development?: a review. JAMIA Open 2022; 5:ooac043. [PMID: 35702625 PMCID: PMC9188322 DOI: 10.1093/jamiaopen/ooac043] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/23/2022] [Revised: 04/28/2022] [Accepted: 05/26/2022] [Indexed: 01/20/2023] Open
Abstract
Objective: To summarize applications of natural language processing (NLP) in model informed drug development (MIDD) and identify potential areas of improvement.
Materials and Methods: Publications found on PubMed and Google Scholar, websites and GitHub repositories for NLP libraries and models. Publications describing applications of NLP in MIDD were reviewed. The applications were stratified into 3 stages: drug discovery, clinical trials, and pharmacovigilance. Key NLP functionalities used for these applications were assessed. Programming libraries and open-source resources for the implementation of NLP functionalities in MIDD were identified.
Results: NLP has been utilized to aid various processes in drug development lifecycle such as gene-disease mapping, biomarker discovery, patient-trial matching, adverse drug events detection, etc. These applications commonly use NLP functionalities of named entity recognition, word embeddings, entity resolution, assertion status detection, relation extraction, and topic modeling. The current state-of-the-art for implementing these functionalities in MIDD applications are transformer models that utilize transfer learning for enhanced performance. Various libraries in python, R, and Java like huggingface, sparkNLP, and KoRpus as well as open-source platforms such as DisGeNet, DeepEnroll, and Transmol have enabled convenient implementation of NLP models to MIDD applications.
Discussion: Challenges such as reproducibility, explainability, fairness, limited data, limited language-support, and security need to be overcome to ensure wider adoption of NLP in MIDD landscape. There are opportunities to improve the performance of existing models and expand the use of NLP in newer areas of MIDD.
Conclusions: This review provides an overview of the potential and pitfalls of current NLP approaches in MIDD.
Affiliation(s)
- Roopal Bhatnagar
- Data Science, Data Collaboration Center, Critical Path Institute, Tucson, Arizona, USA
- Sakshi Sardar
- Quantitative Medicine, Critical Path Institute, Tucson, Arizona, USA
- Maedeh Beheshti
- Quantitative Medicine, Critical Path Institute, Tucson, Arizona, USA
29
Floresta G, Zagni C, Gentile D, Patamia V, Rescifina A. Artificial Intelligence Technologies for COVID-19 De Novo Drug Design. Int J Mol Sci 2022; 23:ijms23063261. [PMID: 35328682 PMCID: PMC8949797 DOI: 10.3390/ijms23063261] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/28/2022] [Revised: 03/14/2022] [Accepted: 03/15/2022] [Indexed: 12/16/2022] Open
Abstract
The recent covid crisis has provided important lessons for academia and industry regarding digital reorganization. Among the fascinating lessons from these times is the huge potential of data analytics and artificial intelligence. The crisis exponentially accelerated the adoption of analytics and artificial intelligence, and this momentum is predicted to continue into the 2020s and beyond. Drug development is a costly and time-consuming business, and only a minority of approved drugs generate returns exceeding the research and development costs. As a result, there is a huge drive to make drug discovery cheaper and faster. With modern algorithms and hardware, it is not too surprising that the new technologies of artificial intelligence and other computational simulation tools can help drug developers. In only two years of covid research, many novel molecules have been designed/identified using artificial intelligence methods with astonishing results in terms of time and effectiveness. This paper reviews the most significant research on artificial intelligence in de novo drug design for COVID-19 pharmaceutical research.
30
Martinelli DD. Generative machine learning for de novo drug discovery: A systematic review. Comput Biol Med 2022; 145:105403. [PMID: 35339849 DOI: 10.1016/j.compbiomed.2022.105403] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 01/14/2022] [Revised: 03/10/2022] [Accepted: 03/11/2022] [Indexed: 02/08/2023]
Abstract
Recent research on artificial intelligence indicates that machine learning algorithms can auto-generate novel drug-like molecules. Generative models have revolutionized de novo drug discovery, rendering the explorative process more efficient. Several model frameworks and input formats have been proposed to enhance the performance of intelligent algorithms in generative molecular design. In this systematic literature review of experimental articles and reviews over the last five years, machine learning models, challenges associated with computational molecule design along with proposed solutions, and molecular encoding methods are discussed. A query-based search of the PubMed, ScienceDirect, Springer, Wiley Online Library, arXiv, MDPI, bioRxiv, and IEEE Xplore databases yielded 87 studies. Twelve additional studies were identified via citation searching. Of the articles in which machine learning was implemented, six prominent algorithms were identified: long short-term memory recurrent neural networks (LSTM-RNNs), variational autoencoders (VAEs), generative adversarial networks (GANs), adversarial autoencoders (AAEs), evolutionary algorithms, and gated recurrent unit (GRU-RNNs). Furthermore, eight central challenges were designated: homogeneity of generated molecular libraries, deficient synthesizability, limited assay data, model interpretability, incapacity for multi-property optimization, incomparability, restricted molecule size, and uncertainty in model evaluation. Molecules were encoded either as strings, which were occasionally augmented using randomization, as 2D graphs, or as 3D graphs. Statistical analysis and visualization are performed to illustrate how approaches to machine learning in de novo drug design have evolved over the past five years. Finally, future opportunities and reservations are discussed.
31
Nikolaienko T, Gurbych O, Druchok M. Complex machine learning model needs complex testing: Examining predictability of molecular binding affinity by a graph neural network. J Comput Chem 2022; 43:728-739. [PMID: 35201629 DOI: 10.1002/jcc.26831] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/22/2021] [Revised: 01/04/2022] [Accepted: 02/09/2022] [Indexed: 12/12/2022]
Abstract
Drug discovery pipelines typically involve high-throughput screening of large numbers of compounds in a search for potential drug candidates. As the chemical space of small organic molecules is huge, navigating it calls for fast and lightweight computational methods, thus promoting machine-learning approaches for processing huge pools of candidates. In this contribution, we present a graph-based deep neural network for prediction of protein-drug binding affinity and assess its predictive power under thorough testing conditions. Within the suggested approach, both protein and drug molecules are represented as graphs and passed to separate graph sub-networks, whose outputs are then concatenated and regressed towards a binding affinity. The neural network is trained on two binding affinity datasets: PDBbind and data imported from the RCSB Protein Data Bank. In order to explore the generalization capabilities of the model, we go beyond traditional random or leave-cluster-out techniques and demonstrate the need for more elaborate model performance assessment: six different strategies for test/train data partitioning (random, time- and property-arranged, protein- and ligand-clustered) with a k-fold cross-validation are engaged. Finally, we discuss the model performance in terms of a set of metrics for different split strategies and fold arrangements. Our code is available at https://github.com/SoftServeInc/affinity-by-GNN.
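The partitioning families named in the abstract above (random, ordered by time or by a property, and clustered by protein or ligand) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code from the linked repository: the function names, the record fields, and the greedy cluster assignment are all hypothetical.

```python
import random
from collections import defaultdict


def random_folds(items, k, seed=0):
    """Shuffle the items and deal them into k folds."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def ordered_folds(items, k, key):
    """Sort by a key (e.g. release year or a molecular property)
    and cut the sorted list into k contiguous folds."""
    ordered = sorted(items, key=key)
    size, extra = divmod(len(ordered), k)
    folds, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        folds.append(ordered[start:end])
        start = end
    return folds


def clustered_folds(items, k, cluster_of):
    """Keep every cluster (e.g. one protein or one ligand scaffold) inside a single fold."""
    clusters = defaultdict(list)
    for item in items:
        clusters[cluster_of(item)].append(item)
    folds = [[] for _ in range(k)]
    # Greedy balancing (an assumption): largest clusters first, each into the smallest fold.
    for members in sorted(clusters.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds


if __name__ == "__main__":
    # Hypothetical records; real data would carry structures and affinities.
    records = [{"id": i, "year": 2000 + i, "protein": "P%d" % (i % 4)} for i in range(10)]
    for fold in clustered_folds(records, 3, cluster_of=lambda r: r["protein"]):
        print(sorted({r["protein"] for r in fold}))
```

In k-fold cross-validation each fold serves once as the test set while the remaining folds form the training set; the clustered variant ensures that no protein (or ligand cluster) appears on both sides of a split.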
Affiliation(s)
- Tymofii Nikolaienko
- SoftServe, Inc., Lviv, Ukraine; Faculty of Physics, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Oleksandr Gurbych
- Blackthorn AI Ltd., London, UK; Department of Artificial Intelligence Systems, Lviv Polytechnic National University, Lviv, Ukraine
- Maksym Druchok
- SoftServe, Inc., Lviv, Ukraine; Institute for Condensed Matter Physics, NAS of Ukraine, Lviv, Ukraine
32
Wang J, Ishchenko A, Zhang W, Razavi A, Langley D. A highly accurate metadynamics-based Dissociation Free Energy method to calculate protein-protein and protein-ligand binding potencies. Sci Rep 2022; 12:2024. [PMID: 35132139 PMCID: PMC8821539 DOI: 10.1038/s41598-022-05875-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/25/2021] [Accepted: 01/17/2022] [Indexed: 12/13/2022] Open
Abstract
Although seeking to develop a general and accurate binding free energy calculation method for protein-protein and protein-ligand interactions has been a continuous effort for decades, only limited successes have been obtained so far. Here, we report the development of a metadynamics-based procedure that calculates Dissociation Free Energy (DFE) and its application to 19 non-congeneric protein-protein complexes and hundreds of protein-ligand complexes covering eight targets. We achieved very high correlations in comparison to experimental binding free energies for these diverse sets of systems, demonstrating the generality and accuracy of the method. Since structures of most proteins are available owing to the recent success of prediction by artificial intelligence, a general free energy method such as DFE, combined with other methods, can make structure-based drug design a widely viable and reliable solution to develop both traditional small molecule drugs and biologic drugs as well as PROTACS.
Affiliation(s)
- Jing Wang
- Arvinas, Inc., 5 Science Park, New Haven, CT, 06511, USA.
- Wei Zhang
- Arvinas, Inc., 5 Science Park, New Haven, CT, 06511, USA
- Asghar Razavi
- Arvinas, Inc., 5 Science Park, New Haven, CT, 06511, USA
- David Langley
- Arvinas, Inc., 5 Science Park, New Haven, CT, 06511, USA
33
Bucinsky L, Bortňák D, Gall M, Matúška J, Milata V, Pitoňák M, Štekláč M, Végh D, Zajaček D. Machine learning prediction of 3CLpro SARS-CoV-2 docking scores. Comput Biol Chem 2022; 98:107656. [PMID: 35288359 PMCID: PMC8881816 DOI: 10.1016/j.compbiolchem.2022.107656] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/22/2021] [Revised: 02/23/2022] [Accepted: 02/24/2022] [Indexed: 12/14/2022]
Abstract
Molecular docking results of two training sets containing 866 and 8,696 compounds were used to train three different machine learning (ML) approaches. Neural network approaches according to Keras and TensorFlow libraries and the gradient boosted decision trees approach of XGBoost were used with DScribe’s Smooth Overlap of Atomic Positions molecular descriptors. In addition, neural networks using the SchNetPack library and descriptors were used. The ML performance was tested on three different sets, including compounds for future organic synthesis. The final evaluation of the ML predicted docking scores was based on the ZINC in vivo set, from which 1,200 compounds were randomly selected with respect to their size. The results obtained showed a consistent ML prediction capability of docking scores, and even though compounds with more than 60 atoms were found slightly overestimated they remain valid for a subsequent evaluation of their drug repurposing suitability.
34
Liang S, Liu X, Zhang S, Li M, Zhang Q, Chen J. Binding mechanism of inhibitors to SARS-CoV-2 main protease deciphered by multiple replica molecular dynamics simulations. Phys Chem Chem Phys 2022; 24:1743-1759. [PMID: 34985081 DOI: 10.1039/d1cp04361g] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 12/27/2022]
Abstract
The outbreak caused by SARS-CoV-2 has received extensive worldwide attention. As the main protease (Mpro) of SARS-CoV-2 has no human homologues, drugs directed at it are less likely to accidentally target host proteins. Thus, Mpro has been an attractive target of efficient drug design for anti-SARS-CoV-2 treatment. In this work, multiple replica molecular dynamics (MRMD) simulations, principal component analysis (PCA), free energy landscapes (FELs), and the molecular mechanics-generalized Born surface area (MM-GBSA) method were integrated to decipher the binding mechanism of four inhibitors (masitinib, O6K, FJC and GQU) to Mpro. The results indicate that the binding of the four inhibitors clearly affects the structural flexibility and internal dynamics of Mpro along with dihedral angle changes of key residues. The analysis of FELs unveils that the stability in the relative orientation and geometric position of inhibitors to Mpro is favorable for inhibitor binding. Residue-based free energy decomposition reveals that the inhibitor-Mpro interaction networks involving hydrogen bonding interactions and hydrophobic interactions provide significant information for the design of potent inhibitors against Mpro. The hot spot residues including H41, M49, F140, N142, G143, C145, H163, H164, M165, E166 and Q189 identified by computational alanine scanning are considered as reliable targets of clinically available inhibitors inhibiting the activities of Mpro.
Affiliation(s)
- Shanshan Liang
- School of Physics and Electronics, Shandong Normal University, Jinan, 250358, China.
- Xinguo Liu
- School of Physics and Electronics, Shandong Normal University, Jinan, 250358, China.
- Shaolong Zhang
- School of Physics and Electronics, Shandong Normal University, Jinan, 250358, China.
- Meng Li
- School of Physics and Electronics, Shandong Normal University, Jinan, 250358, China.
- Qinggang Zhang
- School of Physics and Electronics, Shandong Normal University, Jinan, 250358, China.
- Jianzhong Chen
- School of Science, Shandong Jiaotong University, Jinan, 250357, China.
35
Santana MV, Silva-Jr FP. Artificial intelligence methods to repurpose and discover new drugs to fight the Coronavirus disease-2019 pandemic. COMPUTATIONAL APPROACHES FOR NOVEL THERAPEUTIC AND DIAGNOSTIC DESIGNING TO MITIGATE SARS-COV-2 INFECTION 2022. [PMCID: PMC9300478 DOI: 10.1016/b978-0-323-91172-6.00016-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
The Coronavirus disease 2019 pandemic struck the world at the end of 2019 and, as of 2021, there are no specific drugs available against the causative agent, the severe acute respiratory syndrome-Coronavirus-2 (SARS-CoV-2). From the onset of the pandemic, researchers have been trying to find drugs among the current therapeutic arsenal that could target crucial viral functions, and many of these efforts resulted in clinical trials to repurpose a drug for this new indication. In this scenario, artificial intelligence (AI) is of fundamental importance, allowing academia and pharmaceutical companies to accelerate the discovery of biochemical insights from the chemical and biological information available in literature databases. This chapter covers some AI methods that are being explored to repurpose drugs against SARS-CoV-2. It outlines how these methods work and then discusses selected examples that apply them to identify promising drugs.
36
Zia SR. Identification of Potential Ligands of the Main Protease of Coronavirus SARS-CoV-2 (2019-nCoV) Using Multimodal Generative Neural-Networks. FRENCH-UKRAINIAN JOURNAL OF CHEMISTRY 2022. [DOI: 10.17721/fujcv10i1p30-47] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
The recent outbreak of coronavirus disease 2019 (COVID-19) poses a global threat to the human population. The pandemic caused by the novel coronavirus (2019-nCoV), also called severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), first emerged in Wuhan city, Hubei province, China, in December 2019. Rapid human-to-human transmission has spread the contagion worldwide, affecting 244,385,444 (244.4 million) people and causing 4,961,489 (5 million) fatalities as of 27 October 2021. As of the same date, 6,697,607,393 (6.7 billion) vaccine doses had been administered for the prevention of COVID-19 infections. Even so, the emergence of various variants has made pandemic control challenging, which calls for major efforts to find new potent drug candidates and effective therapeutic approaches against the virulent respiratory disease COVID-19. In the respiratory morbidities of COVID-19, the functionally crucial drug target for antiviral treatment could be the main protease/3-chymotrypsin-like protease (Mpro/3CLpro) enzyme, which is primarily involved in viral maturation and replication. In view of this, in the current study I have designed a library of small molecules against the main protease (Mpro) of coronavirus SARS-CoV-2 (2019-nCoV) by using multimodal generative neural networks. Scaffold-based molecular docking of the series of compounds at the active site of the protein was performed; binding poses of the molecules were evaluated, and protein-ligand interaction studies followed by binding affinity calculations validated the findings. I have identified a number of small promising lead compounds that could serve as potential inhibitors of the main protease (Mpro) enzyme of coronavirus SARS-CoV-2 (2019-nCoV). This study would serve as a step forward in the development of effective antiviral therapeutic agents against COVID-19.
37
Wang M, Sun H, Wang J, Pang J, Chai X, Xu L, Li H, Cao D, Hou T. Comprehensive assessment of deep generative architectures for de novo drug design. Brief Bioinform 2021; 23:6470970. [PMID: 34929743 DOI: 10.1093/bib/bbab544] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/27/2021] [Revised: 11/24/2021] [Accepted: 11/25/2021] [Indexed: 01/20/2023] Open
Abstract
Deep learning (DL)-based de novo drug design has recently emerged as a new trend in pharmaceutical research, and numerous DL-based methods have been developed for the generation of novel compounds with desired properties. However, a comprehensive understanding of the advantages and disadvantages of these methods is still lacking. In this study, the performance of different generative models was evaluated by analyzing the properties of the generated molecules in different scenarios, such as goal-directed (rediscovery, optimization and scaffold hopping of active compounds) and target-specific (generation of novel compounds for a given target) tasks. Overall, the DL-based models have significant advantages over the baseline models built by traditional methods in learning the physicochemical property distributions of the training sets and may be more suitable for target-specific tasks. However, neither the baselines nor the DL-based generative models can fully exploit the scaffolds of the training sets, and the molecules generated by the DL-based methods have even lower scaffold diversity than those generated by the traditional models. Moreover, our assessment illustrates that the DL-based methods do not exhibit obvious advantages over the genetic algorithm-based baselines in goal-directed tasks. We believe that our study provides valuable guidance for the effective use of generative models in de novo drug design.
Affiliation(s)
- Mingyang Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Huiyong Sun
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, P. R. China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Jinping Pang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Xin Chai
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, Jiangsu, China
- Honglin Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
|
38
|
Plavec Z, Pöhner I, Poso A, Butcher SJ. Virus structure and structure-based antivirals. Curr Opin Virol 2021; 51:16-24. [PMID: 34564030 PMCID: PMC8460353 DOI: 10.1016/j.coviro.2021.09.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/04/2021] [Revised: 08/13/2021] [Accepted: 09/12/2021] [Indexed: 01/18/2023]
Abstract
Structure-based antiviral developments in the past two years have been dominated by the structure determination and inhibition of SARS-CoV-2 proteins and new lead molecules for picornaviruses. The SARS-CoV-2 spike protein has been targeted successfully with antibodies, nanobodies, and receptor protein mimics effectively blocking receptor binding or fusion. The two most promising non-structural proteins sharing strong structural and functional conservation across virus families are the main protease and the RNA-dependent RNA polymerase, for which design and reuse of broad range inhibitors already approved for use has been an attractive avenue. For picornaviruses, the increasing recognition of the transient expansion of the capsid as a critical transition towards RNA release has been targeted through a newly identified, apparently widely conserved, druggable, interprotomer pocket preventing viral entry. We summarize some of the key papers in these areas and ponder the practical uses and contributions of molecular modeling alongside empirical structure determination.
Affiliation(s)
- Zlatka Plavec
  - Faculty of Biological and Environmental Sciences, Molecular and Integrative Bioscience Research Programme, University of Helsinki, Helsinki, Finland; Helsinki Institute of Life Sciences-Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Ina Pöhner
  - School of Pharmacy, University of Eastern Finland, Kuopio, Finland
- Antti Poso
  - School of Pharmacy, University of Eastern Finland, Kuopio, Finland; University Hospital Tübingen, Department of Internal Medicine VII, Tübingen, Germany
- Sarah J Butcher
  - Faculty of Biological and Environmental Sciences, Molecular and Integrative Bioscience Research Programme, University of Helsinki, Helsinki, Finland; Helsinki Institute of Life Sciences-Institute of Biotechnology, University of Helsinki, Helsinki, Finland
|
39
|
Arshia AH, Shadravan S, Solhjoo A, Sakhteman A, Sami A. De novo design of novel protease inhibitor candidates in the treatment of SARS-CoV-2 using deep learning, docking, and molecular dynamic simulations. Comput Biol Med 2021; 139:104967. [PMID: 34739968 PMCID: PMC8545757 DOI: 10.1016/j.compbiomed.2021.104967] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 06/08/2021] [Revised: 10/15/2021] [Accepted: 10/19/2021] [Indexed: 12/15/2022]
Abstract
The main protease of SARS-CoV-2 is a critical target for the design and development of antiviral drugs. In this study, 2.5 million compounds were used to train an LSTM generative network via transfer learning in order to identify the four best candidates capable of inhibiting the main protease of SARS-CoV-2. The network was fine-tuned over ten generations, with each generation resulting in higher binding affinity scores. The binding affinities and interactions between the selected candidates and the SARS-CoV-2 main protease were predicted by molecular docking with AutoDock Vina. The selected compounds interact strongly with the key Met165 and Cys145 residues. Molecular dynamics (MD) simulations were run for 150 ns to validate the docking results for the top four ligands. Additionally, root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), and hydrogen bond analyses strongly support these findings. Furthermore, MM-PBSA free energy calculations revealed that these molecules have stable and favorable energies, resulting in strong binding to Mpro's binding site. This study's extensive computational and statistical analyses indicate that the selected candidates may serve as potential inhibitors of SARS-CoV-2 in an in silico setting. However, additional in vitro and in vivo studies and clinical trials are required to demonstrate their true efficacy.
Affiliation(s)
- Amir Hossein Arshia
  - CSE and IT Department; School of Electrical Engineering and Computer; Shiraz University, Shiraz, Iran
- Shayan Shadravan
  - CSE and IT Department; School of Electrical Engineering and Computer; Shiraz University, Shiraz, Iran
- Aida Solhjoo
  - Department of Medicinal Chemistry, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
- Amirhossein Sakhteman
  - Department of Medicinal Chemistry, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran; Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
- Ashkan Sami
  - CSE and IT Department; School of Electrical Engineering and Computer; Shiraz University, Shiraz, Iran
|
40
|
Deshmukh MG, Ippolito JA, Zhang CH, Stone EA, Reilly RA, Miller SJ, Jorgensen WL, Anderson KS. Structure-guided design of a perampanel-derived pharmacophore targeting the SARS-CoV-2 main protease. Structure 2021; 29:823-833.e5. [PMID: 34161756 PMCID: PMC8218531 DOI: 10.1016/j.str.2021.06.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 04/05/2021] [Revised: 05/17/2021] [Accepted: 06/02/2021] [Indexed: 12/12/2022]
Abstract
There is a clinical need for direct-acting antivirals targeting SARS-CoV-2, the coronavirus responsible for the COVID-19 pandemic, to complement current therapeutic strategies. The main protease (Mpro) is an attractive target for antiviral therapy. However, the vast majority of protease inhibitors described thus far are peptidomimetic and bind to the active-site cysteine via a covalent adduct, which is generally pharmacokinetically unfavorable. We have reported the optimization of an existing FDA-approved chemical scaffold, perampanel, to bind to and inhibit Mpro noncovalently with IC50s in the low-nanomolar range and EC50s in the low-micromolar range. Here, we present nine crystal structures of Mpro bound to a series of perampanel analogs, providing detailed structural insights into their mechanism of action and structure-activity relationship. These insights further reveal strategies for pursuing rational inhibitor design efforts in the context of considerable active-site flexibility and potential resistance mechanisms.
Affiliation(s)
- Maya G Deshmukh
  - Medical Scientist Training Program (MD-PhD), Yale School of Medicine, New Haven, CT, USA; Department of Pharmacology, Yale University School of Medicine, New Haven, CT 06520-8066, USA
- Joseph A Ippolito
  - Department of Chemistry, Yale University, New Haven, CT 06520-8107, USA
- Chun-Hui Zhang
  - Department of Chemistry, Yale University, New Haven, CT 06520-8107, USA
- Elizabeth A Stone
  - Department of Chemistry, Yale University, New Haven, CT 06520-8107, USA
- Raquel A Reilly
  - Department of Pharmacology, Yale University School of Medicine, New Haven, CT 06520-8066, USA
- Scott J Miller
  - Department of Chemistry, Yale University, New Haven, CT 06520-8107, USA
- Karen S Anderson
  - Department of Pharmacology, Yale University School of Medicine, New Haven, CT 06520-8066, USA; Department of Molecular Biophysics and Biochemistry, Yale University School of Medicine, New Haven, CT 06520-8066, USA
|
41
|
Wang SH, Zhou Q, Yang M, Zhang YD. ADVIAN: Alzheimer's Disease VGG-Inspired Attention Network Based on Convolutional Block Attention Module and Multiple Way Data Augmentation. Front Aging Neurosci 2021; 13:687456. [PMID: 34220487 PMCID: PMC8250430 DOI: 10.3389/fnagi.2021.687456] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 03/29/2021] [Accepted: 05/18/2021] [Indexed: 11/23/2022] Open
Abstract
Aim: Alzheimer's disease (AD) is a neurodegenerative disease that causes 60-70% of all cases of dementia. This study aims to provide a novel method that can identify AD more accurately. Methods: We first propose a VGG-inspired network (VIN) as the backbone network and investigate the use of attention mechanisms. We then propose an Alzheimer's Disease VGG-Inspired Attention Network (ADVIAN), in which we integrate convolutional block attention modules on a VIN backbone. In addition, 18-way data augmentation is proposed to avoid overfitting. Ten runs of 10-fold cross-validation are carried out to report the unbiased performance. Results: The sensitivity and specificity reach 97.65 ± 1.36 and 97.86 ± 1.55, respectively. The precision and accuracy are 97.87 ± 1.53 and 97.76 ± 1.13, respectively. The F1 score, MCC, and FMI are 97.75 ± 1.13, 95.53 ± 2.27, and 97.76 ± 1.13, respectively. The AUC is 0.9852. Conclusion: The proposed ADVIAN gives better results than 11 state-of-the-art methods. In addition, the experimental results demonstrate the effectiveness of 18-way data augmentation.
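The metrics listed in this abstract (sensitivity, specificity, precision, accuracy, F1 score, MCC, and FMI) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' code; the function name `binary_metrics` and the example counts are illustrative assumptions, and the values are returned as fractions rather than the percentages reported above.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes every denominator except
    the MCC one is non-zero.
    """
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    # Fowlkes-Mallows index: geometric mean of precision and sensitivity.
    fmi = math.sqrt(precision * sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "fmi": fmi,
    }


if __name__ == "__main__":
    # Made-up counts, chosen only to exercise the formulas.
    for name, value in binary_metrics(tp=8, fp=2, tn=8, fn=2).items():
        print(f"{name}: {value:.4f}")
```

In a repeated cross-validation setting such as the one described, these quantities would typically be computed per run and then summarized as mean ± standard deviation.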
Affiliation(s)
- Shui-Hua Wang
  - Key Laboratory of Child Development and Learning Science (Southeast University), Ministry of Education, Nanjing, China
  - School of Mathematics and Actuarial Science, University of Leicester, Leicester, United Kingdom
- Qinghua Zhou
  - School of Informatics, University of Leicester, Leicester, United Kingdom
- Ming Yang
  - Department of Radiology, Children's Hospital of Nanjing Medical University, Nanjing, China
- Yu-Dong Zhang
  - Key Laboratory of Child Development and Learning Science (Southeast University), Ministry of Education, Nanjing, China
  - School of Informatics, University of Leicester, Leicester, United Kingdom
|
42
|
Druchok M, Yarish D, Garkot S, Nikolaienko T, Gurbych O. Ensembling machine learning models to boost molecular affinity prediction. Comput Biol Chem 2021; 93:107529. [PMID: 34192653 DOI: 10.1016/j.compbiolchem.2021.107529] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/20/2021] [Revised: 06/01/2021] [Accepted: 06/08/2021] [Indexed: 02/01/2023]
Abstract
This study unites six popular machine learning approaches to enhance the prediction of molecular binding affinity between receptors (large protein molecules) and ligands (small organic molecules). Here we examine a scheme in which the affinity of ligands is predicted against a single receptor, human thrombin; thus, the models consider ligand features only. However, the suggested approach can be repurposed for other receptors. The methods include Support Vector Machine, Random Forest, CatBoost, feed-forward neural network, graph neural network, and Bidirectional Encoder Representations from Transformers. The first five methods use input features based on physicochemical properties of molecules, while the last one is based on textual molecular representations. None of the approaches relies on atomic spatial coordinates, which avoids a potential bias from known structures, and all are capable of generalizing to compounds with unknown conformations. Within each of the methods, we have trained two models that solve classification and regression tasks. All models are then grouped into a pipeline of two subsequent ensembles. The first ensemble aggregates six classification models, which vote on whether a ligand binds to a receptor or not. If a ligand is classified as active (i.e., binds), the second ensemble predicts its binding affinity in terms of the inhibition constant Ki.
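The two-stage pipeline described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract says only that the classifiers "vote", so the strict-majority rule, the averaging of regressor outputs, the function name `two_stage_predict`, and the toy lambda models are all hypothetical.

```python
from statistics import mean
from typing import Callable, Optional, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], bool]   # True means "predicted active (binds)"
Regressor = Callable[[Features], float]   # predicted affinity value (e.g. Ki)


def two_stage_predict(
    features: Features,
    classifiers: Sequence[Classifier],
    regressors: Sequence[Regressor],
    vote_threshold: float = 0.5,
) -> Optional[float]:
    """Return an affinity estimate for ligands voted active, else None.

    Stage 1 (assumption): strict majority vote over the classifiers.
    Stage 2 (assumption): plain average of the regressors' predictions.
    """
    votes = [clf(features) for clf in classifiers]
    active_fraction = sum(votes) / len(votes)
    if active_fraction <= vote_threshold:
        return None  # classified as inactive; no affinity is predicted
    return mean(reg(features) for reg in regressors)


if __name__ == "__main__":
    # Toy stand-ins for six trained classifiers and two trained regressors.
    toy_classifiers = [lambda x: x[0] > 0.5] * 4 + [lambda x: False] * 2
    toy_regressors = [lambda x: 1.0, lambda x: 3.0]
    print(two_stage_predict([1.0], toy_classifiers, toy_regressors))
    print(two_stage_predict([0.0], toy_classifiers, toy_regressors))
```

Gating the regressors behind the classification vote means that an affinity value is produced only for ligands the first ensemble considers binders, which matches the order of the two ensembles in the abstract.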
Affiliation(s)
- Maksym Druchok
  - SoftServe, Inc., 2d Sadova Str., 79021 Lviv, Ukraine; Institute for Condensed Matter Physics, NAS of Ukraine, 1 Svientsitskii Str., 79011 Lviv, Ukraine
- Sofiya Garkot
  - SoftServe, Inc., 2d Sadova Str., 79021 Lviv, Ukraine; Ukrainian Catholic University, 17 Svientsitskii Str., 79011 Lviv, Ukraine
- Tymofii Nikolaienko
  - SoftServe, Inc., 2d Sadova Str., 79021 Lviv, Ukraine; Taras Shevchenko National University of Kyiv, 64/13, Volodymyrska Str., 01601 Kyiv, Ukraine
- Oleksandr Gurbych
  - SoftServe, Inc., 2d Sadova Str., 79021 Lviv, Ukraine; Lviv Polytechnic National University, 5 Kniazia Romana Str., 79005 Lviv, Ukraine
|