1
Zhang D, Meng Q, Guo F. Incorporating Water Molecules into Highly Accurate Binding Affinity Prediction for Proteins and Ligands. Int J Mol Sci 2024; 25:12676. [PMID: 39684398 DOI: 10.3390/ijms252312676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Revised: 11/16/2024] [Accepted: 11/24/2024] [Indexed: 12/18/2024] Open
Abstract
In the binding process between proteins and ligand molecules, water molecules play a pivotal role by forming hydrogen bonds that enable proteins and ligand molecules to bind more strongly. However, current methodologies for predicting binding affinity overlook the importance of water molecules. Therefore, we developed a model called GraphWater-Net, specifically designed for predicting protein-ligand binding affinity, by incorporating water molecules. GraphWater-Net employs topological structures to represent protein atoms, ligand atoms and water molecules, and their interactions. Leveraging the Graphormer network, the model extracts interaction features between nodes within the topology, alongside the interaction features of edges and nodes. Subsequently, it generates embeddings with attention weights, inputs them into a Softmax function for regression prediction, and ultimately outputs the predicted binding affinity value. Experimental results on the Comparative Assessment of Scoring Functions (CASF) 2016 test set show that the introduction of water molecules into the complex significantly improves the prediction performance of the proposed model for protein and ligand binding affinity. Specifically, the Pearson correlation coefficient (Rp) exceeds that of current state-of-the-art methods by a margin of 0.022 to 0.129. By integrating water molecules, GraphWater-Net has the potential to facilitate the rational design of protein-ligand interactions and aid in drug discovery.
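The Pearson correlation coefficient (Rp) quoted in the abstract measures how linearly the predicted affinities track the experimental ones. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `pearson_r` and the sample affinity values are hypothetical.

```python
import math


def pearson_r(predicted, experimental):
    """Pearson correlation between predicted and experimental affinities."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    # Covariance and variances are left unnormalised; the 1/n factors cancel.
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_e)


# Hypothetical affinity values (e.g. pKd), for illustration only.
predicted = [6.1, 7.4, 5.2, 8.0]
experimental = [6.0, 7.9, 5.0, 8.3]
rp = pearson_r(predicted, experimental)
```

A value of Rp near 1 indicates that predictions rise and fall with the measurements; it does not by itself say how large the absolute errors are.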
Affiliation(s)
- Diya Zhang
  - School of Computer Science and Engineering, Central South University, Changsha 410000, China
- Qiaozhen Meng
  - School of Computer Science, Xiangtan University, Xiangtan 411105, China
- Fei Guo
  - School of Computer Science and Engineering, Central South University, Changsha 410000, China
2
Li X, Liu S, Liu D, Yu M, Wu X, Wang H. Application of Virtual Drug Study to New Drug Research and Development: Challenges and Opportunity. Clin Pharmacokinet 2024; 63:1239-1249. [PMID: 39225885 DOI: 10.1007/s40262-024-01416-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/15/2024] [Indexed: 09/04/2024]
Abstract
In recent years, virtual drug study, as an emerging research strategy, has become increasingly important in guiding and promoting new drug research and development. Researchers can integrate a variety of technical methods to improve the efficiency of all phases of new drug research and development, including the use of artificial intelligence, modeling and simulation for target identification, compound screening and pharmacokinetic characteristics evaluation, and the application of clinical trial simulation to carry out clinical research. This paper aims to elaborate on the application of virtual drug study in the key stages of new drug research and development and discuss the opportunities and challenges it faces in supporting new drug research and development.
Affiliation(s)
- Xiuqi Li
  - Clinical Pharmacology Research Center, Peking Union Medical College Hospital, State Key Laboratory of Complex Severe and Rare Diseases, NMPA Key Laboratory for Clinical Research and Evaluation of Drug, Beijing Key Laboratory of Clinical PK & PD Investigation for Innovative Drugs, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Shupeng Liu
  - Clinical Pharmacology Research Center, Peking Union Medical College Hospital, State Key Laboratory of Complex Severe and Rare Diseases, NMPA Key Laboratory for Clinical Research and Evaluation of Drug, Beijing Key Laboratory of Clinical PK & PD Investigation for Innovative Drugs, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Dan Liu
  - College of Pharmacy, Shenyang Pharmaceutical University, Shenyang, 110016, Liaoning, China
- Mengyang Yu
  - Clinical Pharmacology Research Center, Peking Union Medical College Hospital, State Key Laboratory of Complex Severe and Rare Diseases, NMPA Key Laboratory for Clinical Research and Evaluation of Drug, Beijing Key Laboratory of Clinical PK & PD Investigation for Innovative Drugs, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Xiaofei Wu
  - Clinical Pharmacology Research Center, Peking Union Medical College Hospital, State Key Laboratory of Complex Severe and Rare Diseases, NMPA Key Laboratory for Clinical Research and Evaluation of Drug, Beijing Key Laboratory of Clinical PK & PD Investigation for Innovative Drugs, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Hongyun Wang
  - Clinical Pharmacology Research Center, Peking Union Medical College Hospital, State Key Laboratory of Complex Severe and Rare Diseases, NMPA Key Laboratory for Clinical Research and Evaluation of Drug, Beijing Key Laboratory of Clinical PK & PD Investigation for Innovative Drugs, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
3
Marciniec K, Nowakowska J, Chrobak E, Bębenek E, Latocha M. Synthesis, Docking, and Machine Learning Studies of Some Novel Quinolinesulfonamides-Triazole Hybrids with Anticancer Activity. Molecules 2024; 29:3158. [PMID: 38999109 PMCID: PMC11243625 DOI: 10.3390/molecules29133158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 06/18/2024] [Accepted: 06/28/2024] [Indexed: 07/14/2024] Open
Abstract
In the presented work, a series of 22 hybrids of 8-quinolinesulfonamide and 1,4-disubstituted triazole with antiproliferative activity were designed and synthesised. The title compounds were designed using molecular modelling techniques. For this purpose, machine-learning, molecular docking, and molecular dynamics methods were used. Calculations of the pharmacokinetic parameters (connected with absorption, distribution, metabolism, excretion, and toxicity) of the hybrids were also performed. The new compounds were synthesised via a copper-catalysed azide-alkyne cycloaddition reaction (CuAAC). 8-N-Methyl-N-{[1-(7-chloroquinolin-4-yl)-1H-1,2,3-triazol-4-yl]methyl}quinolinesulfonamide was identified in in silico studies as a potential strong inhibitor of Rho-associated protein kinase and as a compound that has an appropriate pharmacokinetic profile. The results obtained from in vitro experiments confirm the cytotoxicity of derivative 9b in four selected cancer cell lines and the lack of cytotoxicity of this derivative towards normal cells. The results obtained from in silico and in vitro experiments indicate that the introduction of another quinolinyl fragment into the inhibitor molecule may significantly increase the level of cytotoxicity toward cancer cells, and they indicate a direction for future research aimed at finding new substances suitable for clinical applications in cancer treatment.
Affiliation(s)
- Krzysztof Marciniec
  - Department of Organic Chemistry, Medical University of Silesia, Jagiellońska 4, 41-200 Sosnowiec, Poland
- Justyna Nowakowska
  - Department of Organic Chemistry, Medical University of Silesia, Jagiellońska 4, 41-200 Sosnowiec, Poland
- Elwira Chrobak
  - Department of Organic Chemistry, Medical University of Silesia, Jagiellońska 4, 41-200 Sosnowiec, Poland
- Ewa Bębenek
  - Department of Organic Chemistry, Medical University of Silesia, Jagiellońska 4, 41-200 Sosnowiec, Poland
- Małgorzata Latocha
  - Department of Molecular Biology, Jagiellońska 4, 41-200 Sosnowiec, Poland
4
Koroleva EV, Ermolinskaya AL, Ignatovich ZV, Kornoushenko YV, Panibrat AV, Potkin VI, Andrianov AM. Design, in silico Evaluation, and Determination of Antitumor Activity of Potential Inhibitors Against Protein Kinases: Application to BCR-ABL Tyrosine Kinase. BIOCHEMISTRY. BIOKHIMIIA 2024; 89:1094-1108. [PMID: 38981703 DOI: 10.1134/s0006297924060099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Revised: 05/07/2024] [Accepted: 05/11/2024] [Indexed: 07/11/2024]
Abstract
Despite significant progress made over the past two decades in the treatment of chronic myeloid leukemia (CML), there is still an unmet need for effective and safe agents to treat patients with resistance and intolerance to the drugs used in the clinic. In this work, we designed 2-arylaminopyrimidine amides of isoxazole-3-carboxylic acid, assessed their inhibitory potential against Bcr-Abl tyrosine kinase in silico, and determined their antitumor activity in K562 (CML), HL-60 (acute promyelocytic leukemia), and HeLa (cervical cancer) cells. Based on the analysis of computational and experimental data, three compounds with antitumor activity against K562 and HL-60 cells were identified. The lead compound efficiently suppressed the growth of these cells, as evidenced by the low IC50 values of 2.8 ± 0.8 μM (K562) and 3.5 ± 0.2 μM (HL-60). The obtained compounds represent promising basic structures for the design of novel, effective, and safe anticancer drugs able to inhibit the catalytic activity of Bcr-Abl kinase by blocking the ATP-binding site of the enzyme.
Affiliation(s)
- Elena V Koroleva
  - Institute of Chemistry of New Materials, National Academy of Sciences of Belarus, Minsk, 220141, Republic of Belarus
- Anastasiya L Ermolinskaya
  - Institute of Chemistry of New Materials, National Academy of Sciences of Belarus, Minsk, 220141, Republic of Belarus
- Zhanna V Ignatovich
  - Institute of Chemistry of New Materials, National Academy of Sciences of Belarus, Minsk, 220141, Republic of Belarus
- Yury V Kornoushenko
  - Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, Minsk, 220141, Republic of Belarus
- Alesia V Panibrat
  - Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, Minsk, 220141, Republic of Belarus
- Vladimir I Potkin
  - Institute of Physical Organic Chemistry, National Academy of Sciences of Belarus, Minsk, 220072, Republic of Belarus
- Alexander M Andrianov
  - Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, Minsk, 220141, Republic of Belarus
5
Siddiqui AJ, Jamal A, Zafar M, Jahan S. Identification of TBK1 inhibitors against breast cancer using a computational approach supported by machine learning. Front Pharmacol 2024; 15:1342392. [PMID: 38567349 PMCID: PMC10985244 DOI: 10.3389/fphar.2024.1342392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 03/07/2024] [Indexed: 04/04/2024] Open
Abstract
Introduction: The cytosolic Ser/Thr kinase TBK1 is of utmost importance in facilitating signals that promote tumor migration and growth. TBK1-related signaling plays an important role in tumor progression, and there is a need for new methods and workflows to identify new molecules as potential treatments for TBK1-affecting oncologies such as breast cancer. Methods: Here, we propose a machine learning-assisted computational drug discovery approach to identify TBK1 inhibitors. Through our computational ML-integrated approach, we identified four novel inhibitors that could be used as new hit molecules for TBK1 inhibition. Results and Discussion: These four molecules displayed solvent-based free energy values of -48.78, -47.56, -46.78 and -45.47 kcal/mol and Glide docking scores of -10.4, -9.84, -10.03 and -10.06 kcal/mol, respectively. The molecules displayed highly stable RMSD plots, hydrogen bond patterns and MMPBSA scores close to or higher than those of the BX795 molecule. In the future, all these compounds can be further refined or validated by in vitro as well as in vivo activity studies. We have also found two novel groups that have the potential to be utilized in a fragment-based design strategy for the discovery and development of novel inhibitors targeting TBK1. Our method for identifying small molecule inhibitors can be used to make fundamental advances in drug design methods for the TBK1 protein, which will further help to reduce breast cancer incidence.
Affiliation(s)
- Arif Jamal Siddiqui
  - Department of Biology, College of Science, University of Ha’il, Ha’il, Saudi Arabia
- Arshad Jamal
  - Department of Biology, College of Science, University of Ha’il, Ha’il, Saudi Arabia
- Mubashir Zafar
  - Department of Family and Community Medicine, College of Medicine, University of Ha’il, Ha’il, Saudi Arabia
- Sadaf Jahan
  - Department of Medical Laboratory Sciences, College of Applied Medical Sciences, Majmaah University, Al Majmaah, Saudi Arabia
6
Cai H, Shen C, Jian T, Zhang X, Chen T, Han X, Yang Z, Dang W, Hsieh CY, Kang Y, Pan P, Ji X, Song J, Hou T, Deng Y. CarsiDock: a deep learning paradigm for accurate protein-ligand docking and screening based on large-scale pre-training. Chem Sci 2024; 15:1449-1471. [PMID: 38274053 PMCID: PMC10806797 DOI: 10.1039/d3sc05552c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 12/18/2023] [Indexed: 01/27/2024] Open
Abstract
The expertise accumulated in deep neural network-based structure prediction has been widely transferred to the field of protein-ligand binding pose prediction, thus leading to the emergence of a variety of deep learning-guided docking models for predicting protein-ligand binding poses without relying on heavy sampling. However, their prediction accuracy and applicability are still far from satisfactory, partially due to the lack of protein-ligand binding complex data. To this end, we create a large-scale complex dataset containing ∼9 M protein-ligand docking complexes for pre-training, and propose CarsiDock, the first deep learning-guided docking approach that leverages pre-training of millions of predicted protein-ligand complexes. CarsiDock contains two main stages, i.e., a deep learning model for the prediction of protein-ligand atomic distance matrices, and a translation, rotation and torsion-guided geometry optimization procedure to reconstruct the matrices into a credible binding pose. The pre-training and multiple innovative architectural designs facilitate the dramatically improved docking accuracy of our approach over the baselines in terms of multiple docking scenarios, thereby contributing to its outstanding early recognition performance in several retrospective virtual screening campaigns. Further explorations demonstrate that CarsiDock can not only guarantee the topological reliability of the binding poses but also successfully reproduce the crucial interactions in crystalized structures, highlighting its superior applicability.
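To make the term "protein-ligand atomic distance matrix" concrete, the sketch below computes one directly from known 3D coordinates. It only illustrates the data structure; CarsiDock predicts such matrices with a deep learning model and then reconstructs a pose from them, which is not reproduced here. The function name and the coordinates are hypothetical.

```python
import math


def distance_matrix(protein_coords, ligand_coords):
    """Return a matrix whose entry [i][j] is the Euclidean distance
    between protein atom i and ligand atom j."""
    return [[math.dist(p, l) for l in ligand_coords] for p in protein_coords]


# Hypothetical coordinates in angstroms, for illustration only.
protein_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ligand_atoms = [(3.0, 4.0, 0.0), (1.0, 2.0, 2.0), (0.0, 0.0, 1.5)]

matrix = distance_matrix(protein_atoms, ligand_atoms)
# The matrix has one row per protein atom and one column per ligand atom.
```

Because inter-atomic distances do not change under translation or rotation of the whole complex, a matrix like this describes the relative geometry of a pose independently of the coordinate frame.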
Affiliation(s)
- Heng Cai
  - Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Chao Shen
  - Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Tianye Jian
  - Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Tong Chen
  - Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Xiaoqi Han
  - Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Zhuo Yang
  - Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Wei Dang
  - Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Chang-Yu Hsieh
  - Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Xiangyang Ji
  - Department of Automation, Tsinghua University Beijing 100084 China
- Jianfei Song
  - Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Tingjun Hou
  - Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Yafeng Deng
  - Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
7
Gómez-García A, Jiménez DAA, Zamora WJ, Barazorda-Ccahuana HL, Chávez-Fumagalli MÁ, Valli M, Andricopulo AD, Bolzani VDS, Olmedo DA, Solís PN, Núñez MJ, Rodríguez Pérez JR, Valencia Sánchez HA, Cortés Hernández HF, Medina-Franco JL. Navigating the Chemical Space and Chemical Multiverse of a Unified Latin American Natural Product Database: LANaPDB. Pharmaceuticals (Basel) 2023; 16:1388. [PMID: 37895859 PMCID: PMC10609821 DOI: 10.3390/ph16101388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 09/22/2023] [Accepted: 09/26/2023] [Indexed: 10/29/2023] Open
Abstract
The number of databases of natural products (NPs) has increased substantially. Latin America is extraordinarily rich in biodiversity, enabling the identification of novel NPs, which has encouraged both the development of databases and the implementation of those that are being created or are under development. In a collective effort from several Latin American countries, herein we introduce the first version of the Latin American Natural Products Database (LANaPDB), a public compound collection that gathers the chemical information of NPs contained in diverse databases from this geographical region. The current version of LANaPDB unifies the information from six countries and contains 12,959 chemical structures. The structural classification showed that the most abundant compounds are the terpenoids (63.2%), phenylpropanoids (18%) and alkaloids (11.8%). From the analysis of the distribution of properties of pharmaceutical interest, it was observed that many LANaPDB compounds satisfy some drug-like rules of thumb for physicochemical properties. The concept of the chemical multiverse was employed to generate multiple chemical spaces from two different fingerprints and two dimensionality reduction techniques. Comparing LANaPDB with FDA-approved drugs and the major open-access repository of NPs, COCONUT, it was concluded that the chemical space covered by LANaPDB completely overlaps with COCONUT and, in some regions, with FDA-approved drugs. LANaPDB will be updated, adding more compounds from each database, plus the addition of databases from other Latin American countries.
Affiliation(s)
- Alejandro Gómez-García
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- Daniel A. Acuña Jiménez
  - CBio3 Laboratory, School of Chemistry, University of Costa Rica, San Pedro, San José 11501-2060, Costa Rica
- William J. Zamora
  - CBio3 Laboratory, School of Chemistry, University of Costa Rica, San Pedro, San José 11501-2060, Costa Rica
  - Laboratory of Computational Toxicology and Artificial Intelligence (LaToxCIA), Biological Testing Laboratory (LEBi), University of Costa Rica, San Pedro, San José 11501-2060, Costa Rica
  - Advanced Computing Lab (CNCA), National High Technology Center (CeNAT), Pavas, San José 1174-1200, Costa Rica
- Haruna L. Barazorda-Ccahuana
  - Computational Biology and Chemistry Research Group, Vicerrectorado de Investigación, Universidad Católica de Santa Maria, Arequipa 04000, Peru
- Miguel Á. Chávez-Fumagalli
  - Computational Biology and Chemistry Research Group, Vicerrectorado de Investigación, Universidad Católica de Santa Maria, Arequipa 04000, Peru
- Marilia Valli
  - Laboratory of Medicinal and Computational Chemistry (LQMC), Centre for Research and Innovation in Biodiversity and Drug Discovery (CIBFar), São Carlos Institute of Physics (IFSC), University of São Paulo (USP), Av. João Dagnone, 1100, São Carlos 13563-120, SP, Brazil
- Adriano D. Andricopulo
  - Laboratory of Medicinal and Computational Chemistry (LQMC), Centre for Research and Innovation in Biodiversity and Drug Discovery (CIBFar), São Carlos Institute of Physics (IFSC), University of São Paulo (USP), Av. João Dagnone, 1100, São Carlos 13563-120, SP, Brazil
- Vanderlan da S. Bolzani
  - Nuclei of Bioassays, Biosynthesis and Ecophysiology of Natural Products (NuBBE), Department of Organic Chemistry, Institute of Chemistry, São Paulo State University (UNESP), Av. Prof. Francisco Degni, 55, Araraquara 14800-900, SP, Brazil
- Dionisio A. Olmedo
  - Center for Pharmacognostic Research on Panamanian Flora (CIFLORPAN), College of Pharmacy, University of Panama, Av. Manuel E. Batista and Jose De Fabrega, Panama City 3366, Panama
- Pablo N. Solís
  - Center for Pharmacognostic Research on Panamanian Flora (CIFLORPAN), College of Pharmacy, University of Panama, Av. Manuel E. Batista and Jose De Fabrega, Panama City 3366, Panama
- Marvin J. Núñez
  - Natural Product Research Laboratory, School of Chemistry and Pharmacy, University of El Salvador, Final Ave. Mártires Estudiantes del 30 de Julio, San Salvador 01101, El Salvador
- Johny R. Rodríguez Pérez
  - GIFES Research Group, School of Chemistry Technology, Universidad Tecnológica de Pereira, Pereira 660003, Colombia
  - GIEPRONAL Research Group, School of Basic Sciences, Technology and Engineering, Universidad Nacional Abierta y a Distancia, Dosquebradas 661001, Colombia
- Hoover A. Valencia Sánchez
  - GIFES Research Group, School of Chemistry Technology, Universidad Tecnológica de Pereira, Pereira 660003, Colombia
- Héctor F. Cortés Hernández
  - GIFES Research Group, School of Chemistry Technology, Universidad Tecnológica de Pereira, Pereira 660003, Colombia
- José L. Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
8
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
9
Shen C, Zhang X, Hsieh CY, Deng Y, Wang D, Xu L, Wu J, Li D, Kang Y, Hou T, Pan P. A generalized protein-ligand scoring framework with balanced scoring, docking, ranking and screening powers. Chem Sci 2023; 14:8129-8146. [PMID: 37538816 PMCID: PMC10395315 DOI: 10.1039/d3sc02044d] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/20/2023] [Accepted: 07/03/2023] [Indexed: 08/05/2023] Open
Abstract
Applying machine learning algorithms to protein-ligand scoring functions has attracted widespread attention in recent years due to their high predictive accuracy and affordable computational cost. Nevertheless, most machine learning-based scoring functions are only applicable to a specific task, e.g., binding affinity prediction, binding pose prediction or virtual screening, suggesting that the development of a scoring function with balanced performance in all critical tasks remains a grand challenge. To this end, we propose a novel parameterization strategy by introducing an adjustable binding affinity term that represents the correlation between the predicted outcomes and experimental data into the training of a mixture density network. The resulting residue-atom distance likelihood potential not only retains superior docking and screening power over all the other state-of-the-art approaches, but also achieves a remarkable improvement in scoring and ranking performance. We explore in depth the impacts of several key elements on prediction accuracy as well as the task preference, and demonstrate that the performance of a given model on scoring/ranking and docking/screening tasks can be well balanced in an appropriate manner. Overall, our study highlights the potential utility of our innovative parameterization strategy as well as the resulting scoring framework in future structure-based drug design.
Affiliation(s)
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
  - State Key Lab of CAD&CG, Zhejiang University Hangzhou 310058 Zhejiang China
  - School of Public Health, Zhejiang University Hangzhou 310058 Zhejiang China
  - CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Dong Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology Changzhou 213001 China
- Jian Wu
  - School of Public Health, Zhejiang University Hangzhou 310058 Zhejiang China
- Dan Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
  - State Key Lab of CAD&CG, Zhejiang University Hangzhou 310058 Zhejiang China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
10
Koroleva EV, Kornoushenko YV, Karpenko AD, Bosko IP, Siniutsich JV, Ignatovich ZV, Andrianov AM. In silico design and computational evaluation of novel 2-arylaminopyrimidine-based compounds as potential multi-targeted protein kinase inhibitors: application for the native and mutant (T315I) Bcr-Abl tyrosine kinase. J Biomol Struct Dyn 2023; 41:4065-4080. [PMID: 35470777 DOI: 10.1080/07391102.2022.2062784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Accepted: 04/02/2022] [Indexed: 10/18/2022]
Abstract
An integrated computational approach to drug discovery was used to identify novel potential inhibitors of the native and mutant (T315I) Bcr-Abl tyrosine kinase, the enzyme playing a key role in the pathogenesis of chronic myeloid leukemia (CML). This approach included i) design of chimeric molecules based on the 2-arylaminopyrimidine fragment, the main pharmacophore of the Abl kinase inhibitors imatinib and nilotinib used in the clinic for the CML treatment, ii) molecular docking of these compounds with the ATP-binding site of the native and mutant Abl kinase, iii) refinement of the ligand-binding poses by the quantum chemical method PM7, iv) molecular dynamics simulations of the ligand/Abl complexes, and v) prediction of the ligand/Abl binding affinity in terms of scoring functions of molecular docking, machine learning, quantum chemistry, and molecular dynamics. As a result, five top-ranking compounds able to effectively block the enzyme catalytic site were identified. According to the data obtained, these compounds exhibit close modes of binding to the Abl kinase active site that are mainly provided by hydrogen bonds and multiple van der Waals contacts. The identified compounds show high binding affinity to the native and mutant Abl kinase comparable with the one calculated for the FDA-approved kinase-targeted inhibitors imatinib, nilotinib, and ponatinib used in the calculations as a positive control. The results obtained testify to the predicted drug candidates against CML may serve as good scaffolds for the design of novel anticancer agents able to target the ATP-binding pocket of the native and mutant Abl kinase.Communicated by Ramaswamy H. Sarma.
MESH Headings
- Humans
- Adenosine Triphosphate/metabolism
- Antineoplastic Agents/chemistry
- Antineoplastic Agents/pharmacology
- Catalytic Domain
- Computer Simulation
- Drug Design
- Fusion Proteins, bcr-abl/antagonists & inhibitors
- Fusion Proteins, bcr-abl/genetics
- Hydrogen Bonding
- Imatinib Mesylate/pharmacology
- Leukemia, Myelogenous, Chronic, BCR-ABL Positive/enzymology
- Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics
- Leukemia, Myelogenous, Chronic, BCR-ABL Positive/pathology
- Ligands
- Machine Learning
- Molecular Docking Simulation
- Molecular Dynamics Simulation
- Mutant Proteins/antagonists & inhibitors
- Mutant Proteins/genetics
- Mutation
- Protein Kinase Inhibitors/chemistry
- Protein Kinase Inhibitors/pharmacology
- Pyrimidines/chemistry
- Pyrimidines/pharmacology
Affiliation(s)
- Elena V Koroleva
  - Institute of Chemistry of New Materials, National Academy of Sciences of Belarus, Minsk, Republic of Belarus
- Yuri V Kornoushenko
  - Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, Minsk, Republic of Belarus
- Anna D Karpenko
  - United Institute of Informatics Problems, National Academy of Sciences of Belarus, Minsk, Republic of Belarus
- Ivan P Bosko
  - United Institute of Informatics Problems, National Academy of Sciences of Belarus, Minsk, Republic of Belarus
- Julia V Siniutsich
  - Institute of Chemistry of New Materials, National Academy of Sciences of Belarus, Minsk, Republic of Belarus
- Zhanna V Ignatovich
  - Institute of Chemistry of New Materials, National Academy of Sciences of Belarus, Minsk, Republic of Belarus
- Alexander M Andrianov
  - Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, Minsk, Republic of Belarus
|
11
|
Andrianov AM, Shuldau MA, Furs KV, Yushkevich AM, Tuzikov AV. AI-Driven De Novo Design and Molecular Modeling for Discovery of Small-Molecule Compounds as Potential Drug Candidates Targeting SARS-CoV-2 Main Protease. Int J Mol Sci 2023; 24:ijms24098083. [PMID: 37175788 PMCID: PMC10178971 DOI: 10.3390/ijms24098083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 04/21/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023] Open
Abstract
Over the past three years, significant progress has been made in the development of novel promising drug candidates against COVID-19. However, SARS-CoV-2 mutations resulting in the emergence of new viral strains that can be resistant to the drugs used currently in the clinic necessitate the development of novel potent and broad therapeutic agents targeting different vulnerable spots of the viral proteins. In this study, two deep learning generative models were developed and used in combination with molecular modeling tools for de novo design of small molecule compounds that can inhibit the catalytic activity of SARS-CoV-2 main protease (Mpro), an enzyme critically important for mediating viral replication and transcription. As a result, the seven best scoring compounds that exhibited low values of binding free energy comparable with those calculated for two potent inhibitors of Mpro, via the same computational protocol, were selected as the most probable inhibitors of the enzyme catalytic site. In light of the data obtained, the identified compounds are assumed to present promising scaffolds for the development of new potent and broad-spectrum drugs inhibiting SARS-CoV-2 Mpro, an attractive therapeutic target for anti-COVID-19 agents.
Affiliation(s)
- Alexander M Andrianov
  - Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, 220141 Minsk, Belarus
- Mikita A Shuldau
  - United Institute of Informatics Problems, National Academy of Sciences of Belarus, 220012 Minsk, Belarus
- Konstantin V Furs
  - United Institute of Informatics Problems, National Academy of Sciences of Belarus, 220012 Minsk, Belarus
- Artsemi M Yushkevich
  - United Institute of Informatics Problems, National Academy of Sciences of Belarus, 220012 Minsk, Belarus
- Alexander V Tuzikov
  - United Institute of Informatics Problems, National Academy of Sciences of Belarus, 220012 Minsk, Belarus
|
12
|
Gu S, Shen C, Yu J, Zhao H, Liu H, Liu L, Sheng R, Xu L, Wang Z, Hou T, Kang Y. Can molecular dynamics simulations improve predictions of protein-ligand binding affinity with machine learning? Brief Bioinform 2023; 24:6995375. [PMID: 36681903 DOI: 10.1093/bib/bbad008] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/08/2022] [Revised: 12/04/2022] [Accepted: 12/30/2022] [Indexed: 01/23/2023] Open
Abstract
Binding affinity prediction largely determines the discovery efficiency of lead compounds in drug discovery. Recently, machine learning (ML)-based approaches have attracted much attention in hopes of enhancing the predictive performance of traditional physics-based approaches. In this study, we evaluated the impact of structural dynamic information on the binding affinity prediction by comparing the models trained on different dimensional descriptors, using three targets (i.e. JAK1, TAF1-BD2 and DDR1) and their corresponding ligands as the examples. Here, 2D descriptors are traditional ECFP4 fingerprints, 3D descriptors are the energy terms of the Smina and NNscore scoring functions and 4D descriptors contain the structural dynamic information derived from the trajectories based on molecular dynamics (MD) simulations. We systematically investigate the MD-refined binding affinity prediction performance of three classical ML algorithms (i.e. RF, SVR and XGB) as well as two common virtual screening methods, namely Glide docking and MM/PBSA. The outcomes of the ML models built using various dimensional descriptors and their combinations reveal that the MD refinement with the optimized protocol can improve the predictive performance on the TAF1-BD2 target with considerable structural flexibility, but not for the less flexible JAK1 and DDR1 targets, when taking docking poses as the initial structure instead of the crystal structures. The results highlight the importance of the initial structures to the final performance of the model through conformational analysis on the three targets with different flexibility.
Affiliation(s)
- Shukai Gu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jiahui Yu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hong Zhao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Huanxiang Liu
  - Faculty of Applied Science, Macao Polytechnic University, Macao SAR, China
- Liwei Liu
  - Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Shenzhen 518129, Guangdong, China
- Rong Sheng
  - Health Technology Development Dept, Huawei Device Co., Ltd., Dongguan 523808, Guangdong, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Zhe Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
13
|
Jiang D, Ye Z, Hsieh CY, Yang Z, Zhang X, Kang Y, Du H, Wu Z, Wang J, Zeng Y, Zhang H, Wang X, Wang M, Yao X, Zhang S, Wu J, Hou T. MetalProGNet: a structure-based deep graph model for metalloprotein-ligand interaction predictions. Chem Sci 2023; 14:2054-2069. [PMID: 36845922 PMCID: PMC9945430 DOI: 10.1039/d2sc06576b] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/30/2022] [Accepted: 01/11/2023] [Indexed: 01/21/2023] Open
Abstract
Metalloproteins play indispensable roles in various biological processes ranging from reaction catalysis to free radical scavenging, and they are also pertinent to numerous pathologies including cancer, HIV infection, neurodegeneration, and inflammation. Discovery of high-affinity ligands for metalloproteins powers the treatment of these pathologies. Extensive efforts have been made to develop in silico approaches, such as molecular docking and machine learning (ML)-based models, for fast identification of ligands binding to heterogeneous proteins, but few of them have exclusively concentrated on metalloproteins. In this study, we first compiled the largest metalloprotein-ligand complex dataset containing 3079 high-quality structures, and systematically evaluated the scoring and docking powers of three competitive docking tools (i.e., PLANTS, AutoDock Vina and Glide SP) for metalloproteins. Then, a structure-based deep graph model called MetalProGNet was developed to predict metalloprotein-ligand interactions. In the model, the coordination interactions between metal ions and protein atoms and the interactions between metal ions and ligand atoms were explicitly modelled through graph convolution. The binding features were then predicted by the informative molecular binding vector learned from a noncovalent atom-atom interaction network. The evaluation on the internal metalloprotein test set, the independent ChEMBL dataset towards 22 different metalloproteins and the virtual screening dataset indicated that MetalProGNet outperformed various baselines. Finally, a noncovalent atom-atom interaction masking technique was employed to interpret MetalProGNet, and the learned knowledge accords with our understanding of physics.
Affiliation(s)
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China; Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China; College of Computer Science and Technology, Zhejiang University, Hangzhou 310006, Zhejiang, China
- Zhaofeng Ye
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Ziyi Yang
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hongyan Du
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yundian Zeng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Haotian Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xiaorui Wang
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macao
- Mingyang Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xiaojun Yao
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macao
- Shengyu Zhang
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
- Jian Wu
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310006, Zhejiang, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
14
|
Yu Y, Xu S, He R, Liang G. Application of Molecular Simulation Methods in Food Science: Status and Prospects. Journal of Agricultural and Food Chemistry 2023; 71:2684-2703. [PMID: 36719790 DOI: 10.1021/acs.jafc.2c06789] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 06/18/2023]
Abstract
Molecular simulation methods, such as molecular docking, molecular dynamics (MD) simulation, and quantum chemical (QC) calculation, have become popular as characterization and/or virtual screening tools because they can visually display interaction details that in vitro experiments cannot capture and can quickly screen bioactive compounds from large databases with millions of molecules. Currently, interdisciplinary research has expanded molecular simulation technology from computer-aided drug design (CADD) to food science. More food scientists are supporting their hypotheses/results with this technology. To better understand the use of molecular simulation methods, it is necessary to systematically summarize their latest applications and usage trends in food science research. However, this type of review article is rare. To bridge this gap, we have comprehensively summarized the principle, combination usage, and application of molecular simulation methods in food science. We also analyzed the limitations and future trends and offered valuable strategies with the latest technologies to help food scientists use molecular simulation methods.
Affiliation(s)
- Yuandong Yu
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Shiqi Xu
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Ran He
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Guizhao Liang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
|
15
|
New avenues in artificial-intelligence-assisted drug discovery. Drug Discov Today 2023; 28:103516. [PMID: 36736583 DOI: 10.1016/j.drudis.2023.103516] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/07/2022] [Revised: 12/08/2022] [Accepted: 01/26/2023] [Indexed: 02/05/2023]
Abstract
Over the past decade, the amount of biomedical data available has grown at unprecedented rates. Increased automation technology and larger data volumes have encouraged the use of machine learning (ML) or artificial intelligence (AI) techniques for mining such data and extracting useful patterns. Because the identification of chemical entities with desired biological activity is a crucial task in drug discovery, AI technologies have the potential to accelerate this process and support decision making. In addition, the advent of deep learning (DL) has shown great promise in addressing diverse problems in drug discovery, such as de novo molecular design. Herein, we will appraise the current state-of-the-art in AI-assisted drug discovery, discussing the recent applications covering generative models for chemical structure generation, scoring functions to improve binding affinity and pose prediction, and molecular dynamics to assist in the parametrization, featurization and generalization tasks. Finally, we will discuss current hurdles and the strategies to overcome them, as well as potential future directions.
|
16
|
Singh N, Villoutreix BO. A Hybrid Docking and Machine Learning Approach to Enhance the Performance of Virtual Screening Carried out on Protein-Protein Interfaces. Int J Mol Sci 2022; 23:ijms232214364. [PMID: 36430841 PMCID: PMC9694378 DOI: 10.3390/ijms232214364] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/28/2022] [Revised: 11/11/2022] [Accepted: 11/16/2022] [Indexed: 11/22/2022] Open
Abstract
The modulation of protein-protein interactions (PPIs) by small chemical compounds is challenging. PPIs play a critical role in most cellular processes and are involved in numerous disease pathways. As such, novel strategies that assist the design of PPI inhibitors are of major importance. We previously reported that the knowledge-based DLIGAND2 scoring tool was the best rescoring function for improving receptor-based virtual screening (VS) performed with the Surflex docking engine applied to several PPI targets with experimentally known active and inactive compounds. Here, we extend our investigation by assessing the VS potential of other types of scoring functions with an emphasis on docking-pose derived solvent accessible surface area (SASA) descriptors, with or without the use of machine learning (ML) classifiers. First, we explored rescoring strategies of Surflex-generated docking poses with five GOLD scoring functions (GoldScore, ChemScore, ASP, ChemPLP, ChemScore with Receptor Depth Scaling) and with consensus scoring. The top-ranked poses were post-processed to derive a set of protein and ligand SASA descriptors in the bound and unbound states, which were combined to derive descriptors of the docked protein-ligand complexes. Further, eight ML models (tree, bagged forest, random forest, Bayesian, support vector machine, logistic regression, neural network, and neural network with bagging) were trained using the derivatized SASA descriptors and validated on test sets. The results show that many SASA descriptors are better than Surflex and GOLD scoring functions in terms of overall performance and early recovery success on the used dataset. The ML models were superior to all scoring functions and rescoring approaches for most targets, yielding up to a seven-fold increase in enrichment factors at 1% of the screened collections. In particular, the neural networks and random forest-based ML emerged as the best techniques for this PPI dataset, making them robust and attractive VS tools for hit-finding efforts. The presented results suggest that exploring further docking-pose derived SASA descriptors could be valuable for structure-based virtual screening projects, and in the present case, to assist the rational design of small-molecule PPI inhibitors.
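The enrichment factor at 1% quoted in this abstract can be illustrated with a short calculation. The following is a minimal sketch, assuming the common definition (hit rate among the top-scoring fraction divided by the hit rate of the whole collection); the function name, the rounding of the top-slice size, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor of the top-scoring `fraction` of a screened collection.

    scores: higher means "more likely active"; labels: 1 = active, 0 = inactive.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")
    # Size of the top slice (assumed rounding rule); at least one compound is inspected.
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])
    # Hit rate in the top slice divided by the hit rate of the whole collection.
    return (hits_in_top / n_top) / (n_actives / n_total)


# Hypothetical toy collection: 200 compounds, 10 actives, 2 of them ranked first.
toy_scores = [200 - i for i in range(200)]
toy_labels = [1, 1] + [0] * 190 + [1] * 8
ef_1_percent = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
```

In this toy case the top 1% holds two compounds, both active, against a 5% base rate, so the sketch yields an EF of 20.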
|
17
|
Progress and Impact of Latin American Natural Product Databases. Biomolecules 2022; 12:biom12091202. [PMID: 36139041 PMCID: PMC9496143 DOI: 10.3390/biom12091202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 08/27/2022] [Accepted: 08/29/2022] [Indexed: 11/17/2022] Open
Abstract
Natural products (NPs) are a rich source of structurally novel molecules, and the chemical space they encompass is far from being fully explored. Over history, NPs have represented a significant source of bioactive molecules and have served as a source of inspiration for developing many drugs on the market. On the other hand, computer-aided drug design (CADD) has contributed to drug discovery research, mitigating costs and time. In this sense, compound databases represent a fundamental element of CADD. This work reviews the progress toward developing compound databases of natural origin, and it surveys computational methods, emphasizing chemoinformatic approaches to profile natural product databases. Furthermore, it reviews the present state of the art in developing Latin American NP databases and their practical applications to the drug discovery area.
|
18
|
Shen C, Zhang X, Deng Y, Gao J, Wang D, Xu L, Pan P, Hou T, Kang Y. Boosting Protein-Ligand Binding Pose Prediction and Virtual Screening Based on Residue-Atom Distance Likelihood Potential and Graph Transformer. J Med Chem 2022; 65:10691-10706. [PMID: 35917397 DOI: 10.1021/acs.jmedchem.2c00991] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Indexed: 02/08/2023]
Abstract
The past few years have witnessed enormous progress toward applying machine learning approaches to the development of protein-ligand scoring functions. However, the robust performance and wide applicability of scoring functions remain a big challenge for increasing the success rate of docking-based virtual screening. Herein, a novel scoring function named RTMScore was developed by introducing a tailored residue-based graph representation strategy and several graph transformer layers for the learning of protein and ligand representations, followed by a mixture density network to obtain residue-atom distance likelihood potential. Our approach was resolutely validated on the CASF-2016 benchmark, and the results indicate that RTMScore can outperform almost all of the other state-of-the-art methods in terms of both the docking and screening powers. Further evaluation confirms the robustness of our approach that can not only retain its docking power on cross-docked poses but also achieve improved performance as a rescoring tool in larger-scale virtual screening.
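To make the idea of a distance likelihood potential concrete, the sketch below shows how mixture-density parameters for each residue-atom pair could be turned into a pose score by summing log-likelihoods of the observed distances. It is a generic illustration under stated assumptions (Gaussian mixture components, a plain sum over pairs, hypothetical function names and parameter values), not the RTMScore implementation.

```python
import math


def mixture_log_likelihood(distance, weights, means, sigmas):
    """Log-likelihood of one distance under a 1D Gaussian mixture."""
    density = 0.0
    for weight, mean, sigma in zip(weights, means, sigmas):
        z = (distance - mean) / sigma
        density += weight * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    # Floor the density so that very unlikely distances do not produce log(0).
    return math.log(max(density, 1e-300))


def pose_score(distances, mixtures):
    """Sum of per-pair log-likelihoods; higher means a more plausible pose."""
    return sum(
        mixture_log_likelihood(d, weights, means, sigmas)
        for d, (weights, means, sigmas) in zip(distances, mixtures)
    )


# Hypothetical mixture parameters for two residue-atom pairs (distances in angstroms).
toy_mixtures = [
    ([0.7, 0.3], [4.0, 6.0], [0.5, 1.0]),
    ([1.0], [5.0], [0.8]),
]
near_native_score = pose_score([4.0, 5.0], toy_mixtures)
distorted_score = pose_score([9.0, 11.0], toy_mixtures)
```

A pose whose pair distances sit near the mixture modes receives a higher score than one whose distances fall in the tails, which is the property a likelihood-based scoring function relies on for ranking poses.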
Affiliation(s)
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Junbo Gao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Dong Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
|
19
|
McGibbon M, Money-Kyrle S, Blay V, Houston DR. SCORCH: Improving structure-based virtual screening with machine learning classifiers, data augmentation, and uncertainty estimation. J Adv Res 2022; 46:135-147. [PMID: 35901959 PMCID: PMC10105235 DOI: 10.1016/j.jare.2022.07.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/05/2022] [Revised: 07/08/2022] [Accepted: 07/09/2022] [Indexed: 11/17/2022] Open
Abstract
INTRODUCTION: The discovery of a new drug is a costly and lengthy endeavour. The computational prediction of which small molecules can bind to a protein target can accelerate this process if the predictions are fast and accurate enough. Recent machine-learning scoring functions re-evaluate the output of molecular docking to achieve more accurate predictions. However, previous scoring functions were trained on crystalised protein-ligand complexes and datasets of decoys. The limited availability of crystal structures and biases in the decoy datasets can lower the performance of scoring functions. OBJECTIVES: To address key limitations of previous scoring functions and thus improve the predictive performance of structure-based virtual screening. METHODS: A novel machine-learning scoring function was created, named SCORCH (Scoring COnsensus for RMSD-based Classification of Hits). To develop SCORCH, training data is augmented by considering multiple ligand poses and labelling poses based on their RMSD from the native pose. Decoy bias is addressed by generating property-matched decoys for each ligand and using the same methodology for preparing and docking decoys and ligands. A consensus of 3 different machine learning approaches is also used to improve performance. RESULTS: We find that multi-pose augmentation in SCORCH improves its docking power and screening power on independent benchmark datasets. SCORCH outperforms an equivalent scoring function trained on single poses, with a 1% enrichment factor (EF) of 13.78 vs. 10.86 on 18 DEKOIS 2.0 targets and a mean native pose rank of 5.9 vs. 30.4 on CSAR 2014. Additionally, SCORCH outperforms widely used scoring functions in virtual screening and pose prediction on independent benchmark datasets. CONCLUSION: By rationally addressing key limitations of previous scoring functions, SCORCH improves the performance of virtual screening. SCORCH also provides an estimate of its uncertainty, which can help reduce the cost and time required for drug discovery.
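The two method steps named in this abstract, labelling poses by their RMSD from the native pose and combining three models into a consensus, can be sketched as follows. This is a minimal illustration rather than SCORCH's code: the 2.0 Å cutoff, the unweighted mean as the consensus rule, the absence of symmetry correction in the RMSD, and all function names are assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) atom positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def label_pose(pose, native_pose, cutoff=2.0):
    """Label a docked pose 1 (near-native) or 0; the cutoff in angstroms is an assumption."""
    return 1 if rmsd(pose, native_pose) <= cutoff else 0


def consensus_score(model_scores):
    """Unweighted mean of per-model scores (assumed consensus rule)."""
    return sum(model_scores) / len(model_scores)


# Hypothetical two-atom ligand: one pose shifted by 0.5 angstroms, one by 3 angstroms.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
close_pose = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]
far_pose = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
pose_labels = [label_pose(p, native) for p in (close_pose, far_pose)]
combined = consensus_score([0.9, 0.6, 0.3])
```

Labelling several poses per ligand in this way turns each crystal structure into multiple training examples, which is the augmentation idea the abstract describes.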
Affiliation(s)
- Miles McGibbon
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
- Sam Money-Kyrle
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
- Vincent Blay
  - Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA; Institute for Integrative Systems Biology (I2SysBio), Universitat de València and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Douglas R Houston
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
|
20
|
Dong L, Qu X, Wang B. XLPFE: A Simple and Effective Machine Learning Scoring Function for Protein-Ligand Scoring and Ranking. ACS Omega 2022; 7:21727-21735. [PMID: 35785279 PMCID: PMC9245135 DOI: 10.1021/acsomega.2c01723] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/22/2022] [Accepted: 05/30/2022] [Indexed: 06/15/2023]
Abstract
Prediction of protein-ligand binding affinities is a central issue in structure-based computer-aided drug design. In recent years, much effort has been devoted to the prediction of the binding affinity in protein-ligand complexes using machine learning (ML). Due to the remarkable ability of ML methods in nonlinear fitting, ML-based scoring functions (SFs) can deliver much improved performance on a selected test set, such as the comparative assessment of scoring functions (CASF), when compared to the classical SFs. However, the performance of ML-based SFs heavily relies on the overall similarity of the training set and the test set. To improve the performance and transferability of an SF, we have tried to combine various features including energy terms from X-score and AutoDock Vina, the properties of ligands, and the statistical sequence-related information from either the binding site or the full protein. In conjunction with extreme trees (ET), an ML model, we have developed XLPFE, a new SF. Compared with other tested methods such as X-score, AutoDock Vina, ΔvinaXGB, PSH-ML, or CNN-score, XLPFE achieves consistently better scoring and ranking power for various types of protein-ligand complex structures beyond the CASF, suggesting that XLPFE has superior transferability. In particular, XLPFE performs better with metalloenzymes. With its faster speed, improved accuracy, and better transferability, XLPFE could be usefully applied to a diverse range of protein-ligand complexes.
Affiliation(s)
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
|
21
|
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
| | - Garrett M. Morris
- Department of Statistics, University of Oxford, Oxford, United Kingdom
| | - Philip C. Biggin
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
| |
|
22
|
Zhang X, Shen C, Liao B, Jiang D, Wang J, Wu Z, Du H, Wang T, Huo W, Xu L, Cao D, Hsieh CY, Hou T. TocoDecoy: A New Approach to Design Unbiased Datasets for Training and Benchmarking Machine-Learning Scoring Functions. J Med Chem 2022; 65:7918-7932. [PMID: 35642777 DOI: 10.1021/acs.jmedchem.2c00460] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/28/2022]
Abstract
Development of accurate machine-learning-based scoring functions (MLSFs) for structure-based virtual screening against a given target requires a large unbiased dataset with structurally diverse actives and decoys. However, most datasets for the development of MLSFs were designed for traditional SFs and may suffer from hidden biases and data insufficiency. Here, we developed a new approach named Topology-based and Conformation-based decoys generation (TocoDecoy), which integrates two strategies that generate decoys by tweaking the actives for a specific target, in order to produce unbiased and expandable datasets for training and benchmarking MLSFs. For hidden bias evaluation, the performance of InteractionGraphNet (IGN) trained on the TocoDecoy, LIT-PCBA, and DUD-E-like datasets was assessed. The results illustrate that the IGN model trained on the TocoDecoy dataset is competitive with that trained on the LIT-PCBA dataset but remarkably outperforms that trained on the DUD-E dataset, suggesting that the decoys in TocoDecoy are unbiased for training and benchmarking MLSFs.
Affiliation(s)
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China.,Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Ben Liao
- Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China.,National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tianyue Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Wenbo Huo
- Tsinghua AI Drug Discovery group, Research Institute of Tsinghua University in Shenzhen, Shenzhen 518057, Guangdong, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
|
23
|
Pan X, Wang H, Zhang Y, Wang X, Li C, Ji C, Zhang JZH. AA-Score: a New Scoring Function Based on Amino Acid-Specific Interaction for Molecular Docking. J Chem Inf Model 2022; 62:2499-2509. [PMID: 35452230 DOI: 10.1021/acs.jcim.1c01537] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/18/2022]
Abstract
The protein-ligand scoring function plays an important role in computer-aided drug discovery and is heavily used in virtual screening and lead optimization. In this study, we developed a new empirical protein-ligand scoring function with amino acid-specific interaction components for hydrogen bond, van der Waals, and electrostatic interactions. In addition, hydrophobic, π-stacking, π-cation, and metal-ligand interactions are also included in the new scoring function. To better evaluate the performance of the AA-Score, we generated several new test sets for evaluation of scoring, ranking, and docking performances, respectively. Extensive tests show that AA-Score performs well on scoring, docking, and ranking as compared to other widely used traditional scoring functions. The performance improvement of AA-Score benefits from the decomposition of individual interaction into amino acid-specific types. To facilitate applications, we developed an easy-to-use tool to analyze protein-ligand interaction fingerprint and predict binding affinity using the AA-Score. The source code and associated running examples can be found at https://github.com/xundrug/AA-Score-Tool.
Affiliation(s)
- Xiaolin Pan
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China.,NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - Hao Wang
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China.,NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - Yueqing Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China.,NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - Xingyu Wang
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - Cuiyu Li
- Advanced Computing East China Sub-center, Suma Technology Co., Ltd., Kunshan 215300, China
| | - Changge Ji
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China.,NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - John Z H Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China.,CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China.,Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.,NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China.,Department of Chemistry, New York University, New York 10003, United States.,Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan Shanxi 030006, China
| |
|
24
|
Choudhury C, Arul Murugan N, Deva Priyakumar U. Structure-based drug repurposing: traditional and advanced AI/ML-aided methods. Drug Discov Today 2022; 27:1847-1861. [PMID: 35301148 PMCID: PMC8920090 DOI: 10.1016/j.drudis.2022.03.006] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 10/04/2021] [Revised: 02/16/2022] [Accepted: 03/10/2022] [Indexed: 02/08/2023]
Abstract
The current global health emergency in the form of the Coronavirus 2019 (COVID-19) pandemic has highlighted the need for fast, accurate, and efficient drug discovery pipelines. Traditional drug discovery projects relying on in vitro high-throughput screening (HTS) involve large investments and sophisticated experimental set-ups, affordable only to big biopharmaceutical companies. In this scenario, application of efficient state-of-the-art computational methods and modern artificial intelligence (AI)-based algorithms for rapid screening of repurposable chemical space [approved drugs and natural products (NPs) with proven pharmacokinetic profiles] to identify the initial leads is a powerful option to save resources and time. Structure-based drug repurposing is a popular in silico repurposing approach. In this review, we discuss traditional and modern AI-based computational methods and tools applied at various stages for structure-based drug discovery (SBDD) pipelines. Additionally, we highlight the role of generative models in generating molecules with scaffolds from repurposable chemical space. Teaser: This review highlights the importance of repurposable chemical space, and the contributions of conventional in silico approaches and modern machine-learning algorithms for rapid structure-based drug repurposing.
Affiliation(s)
- Chinmayee Choudhury
- Department of Experimental Medicine and Biotechnology, Postgraduate Institute of Medical Education and Research, Sector-12, Chandigarh 160012, India
| | - N Arul Murugan
- Department of Computer Science, School of Electrical Engineering and Computer Sciences, KTH Royal Institute of Technology, S-100 44, Stockholm, Sweden; Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi 110020, India.
| | - U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| |
|
25
|
Yang ZY, Ye ZF, Xiao YJ, Hsieh CY, Zhang SY. SPLDExtraTrees: robust machine learning approach for predicting kinase inhibitor resistance. Brief Bioinform 2022; 23:6543900. [PMID: 35262669 DOI: 10.1093/bib/bbac050] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/18/2021] [Revised: 01/17/2022] [Accepted: 01/31/2022] [Indexed: 12/25/2022] Open
Abstract
Drug resistance is a major threat to global health and a significant concern throughout the clinical treatment of diseases and drug development. Mutations in proteins that are involved in drug binding are a common cause of adaptive drug resistance. Therefore, quantitative estimates of how mutations affect the interaction between a drug and its target protein would be of vital significance for drug development and clinical practice. Computational methods that rely on molecular dynamics simulations and Rosetta protocols, as well as machine learning methods, have been proven capable of predicting ligand affinity changes upon protein mutation. However, overfitting and generalization issues induced by severely limited sample sizes and heavy noise have impeded wide adoption of machine learning for studying drug resistance. In this paper, we propose a robust machine learning method, termed SPLDExtraTrees, which can accurately predict ligand binding affinity changes upon protein mutation and identify resistance-causing mutations. Specifically, the proposed method ranks training data following a scheme that starts with easy-to-learn samples and gradually incorporates harder and more diverse samples into the training, and then iterates between sample weight recalculations and model updates. In addition, we calculate additional physics-based structural features to provide the machine learning model with valuable domain knowledge on proteins for these data-limited predictive tasks. The experiments substantiate the capability of the proposed method for predicting kinase inhibitor resistance under three scenarios, achieving predictive accuracy comparable with that of molecular dynamics and Rosetta methods at much lower computational cost.
Affiliation(s)
- Zi-Yi Yang
- Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
| | - Zhao-Feng Ye
- Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
| | - Yi-Jia Xiao
- Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China.,Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
| | - Sheng-Yu Zhang
- Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
| |
|
26
|
Can docking scoring functions guarantee success in virtual screening? Virtual Screening and Drug Docking 2022. [DOI: 10.1016/bs.armc.2022.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
27
|
Dong L, Qu X, Zhao Y, Wang B. Prediction of Binding Free Energy of Protein-Ligand Complexes with a Hybrid Molecular Mechanics/Generalized Born Surface Area and Machine Learning Method. ACS Omega 2021; 6:32938-32947. [PMID: 34901645 PMCID: PMC8655939 DOI: 10.1021/acsomega.1c04996] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/09/2021] [Accepted: 11/10/2021] [Indexed: 06/14/2023]
Abstract
Accurate prediction of protein-ligand binding free energies is important in enzyme engineering and drug discovery. The molecular mechanics/generalized Born surface area (MM/GBSA) approach is widely used to estimate ligand-binding affinities, but its performance heavily relies on the accuracy of its energy components. A hybrid strategy combining MM/GBSA and machine learning (ML) has been developed to predict the binding free energies of protein-ligand systems. Based on the MM/GBSA energy terms and several features associated with protein-ligand interactions, our ML-based scoring function, GXLE, shows much better performance than MM/GBSA without entropy. In particular, the good transferability of the GXLE model is highlighted by its good performance in ranking power for prediction of the binding affinity of different ligands for either the docked structures or crystal structures. The GXLE scoring function and its code are freely available and can be used to correct the binding free energies computed by MM/GBSA.
Affiliation(s)
- Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
| | - Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
| | - Yuan Zhao
- The Key Laboratory of Natural Medicine and Immuno-Engineering, Henan University, Kaifeng 475004, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
| |
|
28
|
Vijayan RSK, Kihlberg J, Cross JB, Poongavanam V. Enhancing preclinical drug discovery with artificial intelligence. Drug Discov Today 2021; 27:967-984. [PMID: 34838731 DOI: 10.1016/j.drudis.2021.11.023] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 07/14/2021] [Revised: 10/15/2021] [Accepted: 11/19/2021] [Indexed: 12/14/2022]
Abstract
Artificial intelligence (AI) is becoming an integral part of drug discovery. It has the potential to deliver across the drug discovery and development value chain, starting from target identification and reaching through clinical development. In this review, we provide an overview of current AI technologies and a glimpse of how AI is reimagining preclinical drug discovery by highlighting examples where AI has made a real impact. Considering the excitement and hyperbole surrounding AI in drug discovery, we aim to present a realistic view by discussing both opportunities and challenges in adopting AI in drug discovery.
Affiliation(s)
- R S K Vijayan
- Institute for Applied Cancer Science, MD Anderson Cancer Center, Houston, TX, USA
| | - Jan Kihlberg
- Department of Chemistry-BMC, Uppsala University, Uppsala, Sweden
| | - Jason B Cross
- Institute for Applied Cancer Science, MD Anderson Cancer Center, Houston, TX, USA.
| | | |
|
29
|
Ricci-Lopez J, Aguila SA, Gilson MK, Brizuela CA. Improving Structure-Based Virtual Screening with Ensemble Docking and Machine Learning. J Chem Inf Model 2021; 61:5362-5376. [PMID: 34652141 DOI: 10.1021/acs.jcim.1c00511] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 11/29/2022]
Abstract
One of the main challenges of structure-based virtual screening (SBVS) is the incorporation of the receptor's flexibility, as its explicit representation in every docking run implies a high computational cost. Therefore, a common alternative for including the receptor's flexibility is the approach known as ensemble docking. Ensemble docking consists of using a set of receptor conformations and performing the docking assays over each of them. However, there is still no agreement on how to combine the ensemble docking results to obtain the final ligand ranking. A common choice is to use consensus strategies to aggregate the ensemble docking scores, but these strategies yield only slight improvement over the single-structure approach. Here, we hypothesize that using machine learning (ML) methodologies over the ensemble docking results could improve the predictive power of SBVS. To test this hypothesis, four proteins were selected as study cases: CDK2, FXa, EGFR, and HSP90. Protein conformational ensembles were built from crystallographic structures, whereas the evaluated compound library comprised up to three benchmarking data sets (DUD, DEKOIS 2.0, and CSAR-2012) and cocrystallized molecules. Ensemble docking results were processed through 30 repetitions of 4-fold cross-validation to train and validate two ML classifiers: logistic regression and gradient boosting trees. Our results indicate that the ML classifiers significantly outperform traditional consensus strategies and even the best performance case achieved with single-structure docking. We provide statistical evidence that supports the effectiveness of ML to improve the ensemble docking performance.
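The validation protocol named in this abstract (30 repetitions of 4-fold cross-validation) can be illustrated with a minimal sketch. It only generates the train/test index splits; the function name `repeated_kfold`, the seed, and the sample count of 10 are hypothetical, and the shuffling here is plain rather than stratified, which may differ from what the authors used.

```python
import random

def repeated_kfold(n_samples, n_splits=4, n_repeats=30, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        # Reshuffle the sample order for every repetition.
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into n_splits folds of near-equal size.
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for held_out in range(n_splits):
            test = sorted(folds[held_out])
            train = sorted(
                i for f, fold in enumerate(folds) if f != held_out for i in fold
            )
            yield train, test

# 30 repetitions x 4 folds = 120 train/test splits over 10 toy samples.
splits = list(repeated_kfold(10))
```

Each split would then be used to fit a classifier on the training indices and score it on the held-out fold, with the 120 scores summarized afterwards.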
Affiliation(s)
- Joel Ricci-Lopez
- Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California C.P. 22860, Mexico.,Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México (UNAM), Ensenada, Baja California C.P. 22860, Mexico
| | - Sergio A Aguila
- Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México (UNAM), Ensenada, Baja California C.P. 22860, Mexico
| | - Michael K Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, La Jolla, San Diego, California 92093, United States
| | - Carlos A Brizuela
- Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California C.P. 22860, Mexico
| |
|
30
|
Meng J, Zhang L, Wang L, Li S, Xie D, Zhang Y, Liu H. TSSF-hERG: A machine-learning-based hERG potassium channel-specific scoring function for chemical cardiotoxicity prediction. Toxicology 2021; 464:153018. [PMID: 34757159 DOI: 10.1016/j.tox.2021.153018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/08/2021] [Revised: 10/15/2021] [Accepted: 10/26/2021] [Indexed: 11/27/2022]
Abstract
The human ether-à-go-go-related gene (hERG) encodes the Kv11.1 voltage-gated potassium ion (K+) channel that conducts the rapidly activating delayed rectifier current (IKr) in cardiomyocytes to regulate the repolarization process. Some drugs that block hERG potassium channels cannot be marketed because they prolong the QT interval, an effect known as cardiotoxicity. Predetermining the binding affinity values between drugs and hERG through in silico methods can greatly reduce the time and cost required for experimental verification. In this study, we collected 9,215 compounds with AutoDock Vina docking structures as the training set and collected compounds from four references as test sets. A series of models for predicting the binding affinities of hERG blockers were built based on five machine learning algorithms and combinations of interaction features and ligand features. The model built by support vector regression (SVR) using the combination of all features achieved the best performance in both tenfold cross-validation and external verification, and was selected and named TSSF-hERG (target-specific scoring function for hERG). TSSF-hERG is more accurate than the classic scoring function of AutoDock Vina and the machine-learning-based generic scoring function RF-Score, with a Pearson's correlation coefficient (Rp) of 0.765, a Spearman's rank correlation coefficient (Rs) of 0.757, and a root-mean-square error (RMSE) of 0.585 in a tenfold cross-validation study. All results demonstrate that TSSF-hERG would be useful for improving binding affinity prediction between hERG and compounds, which can be further used for prediction or virtual screening of the hERG-related cardiotoxicity of drug candidates.
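The three metrics reported in this abstract (Rp, Rs, and RMSE) have standard definitions, sketched below in plain Python. The helper names and the `experimental` and `predicted` lists are hypothetical, and the affinity values are made up for illustration, not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (Rp) between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result

def spearman(x, y):
    """Spearman rank correlation (Rs): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rmse(x, y):
    """Root-mean-square error between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Made-up affinities, for illustration only.
experimental = [5.0, 6.2, 7.1, 8.3]
predicted = [5.4, 6.0, 7.5, 8.0]
rp = pearson(experimental, predicted)
rs = spearman(experimental, predicted)
error = rmse(experimental, predicted)
```

Rp measures linear agreement, Rs measures agreement in ordering only, and RMSE is in the same units as the affinities, so the three are usually reported together.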
Affiliation(s)
- Jinhui Meng
- School of Life Science, Liaoning University, Shenyang, 110036, China
| | - Li Zhang
- School of Life Science, Liaoning University, Shenyang, 110036, China; Technology Innovation Center for Computer Simulating and Information Processing of Bio-macromolecules of Liaoning Province, Shenyang, 110036, China; Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China
| | - Lianxin Wang
- School of Life Science, Liaoning University, Shenyang, 110036, China
| | - Shimeng Li
- School of Life Science, Liaoning University, Shenyang, 110036, China
| | - Di Xie
- School of Life Science, Liaoning University, Shenyang, 110036, China
| | - Yuxi Zhang
- School of Life Science, Liaoning University, Shenyang, 110036, China
| | - Hongsheng Liu
- Technology Innovation Center for Computer Simulating and Information Processing of Bio-macromolecules of Liaoning Province, Shenyang, 110036, China; Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China; School of Pharmacy, Liaoning University, Shenyang, 110036, China.
| |
|
31
|
Li H, Lu G, Sze KH, Su X, Chan WY, Leung KS. Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark. Brief Bioinform 2021; 22:bbab225. [PMID: 34169324 PMCID: PMC8575004 DOI: 10.1093/bib/bbab225] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/03/2021] [Revised: 04/27/2021] [Accepted: 05/23/2021] [Indexed: 11/12/2022] Open
Abstract
The superior performance of machine-learning scoring functions for docking has caused a series of debates on whether it is due to learning knowledge from training data that are similar in some sense to the test data. With a systematically revised methodology and a blind benchmark realistically mimicking the process of prospective prediction of binding affinity, we have evaluated three broadly used classical scoring functions and five machine-learning counterparts calibrated with both random forest and extreme gradient boosting using both solo and hybrid features. We show for the first time that machine-learning scoring functions trained exclusively on a proportion as low as 8% of complexes dissimilar to the test set already outperform classical scoring functions, a percentage that is far lower than what has been recently reported on all three CASF benchmarks. The performance of machine-learning scoring functions is underestimated due to the absence of similar samples in some artificially created training sets that discard the full spectrum of complexes to be found in a prospective environment. Given that some degree of similarity is inevitable in a large dataset, the criterion for scoring function selection is which one makes the best use of all available materials. Software code and data are provided at https://github.com/cusdulab/MLSF for interested readers to rapidly rebuild the scoring functions and reproduce our results, or to make extended analyses on their own benchmarks.
Affiliation(s)
| | - Gang Lu
- School of Biomedical Sciences, Chinese University of Hong Kong, Hong Kong
| | - Kam-Heung Sze
- Bioinformatics Unit, Hong Kong Medical Technology Institute, Hong Kong
| | - Xianwei Su
- Chinese University of Hong Kong, Hong Kong
| | - Wai-Yee Chan
- CUHK-SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences, Chinese University of Hong Kong, Hong Kong
| | - Kwong-Sak Leung
- Computer Science and Engineering in the Chinese University of Hong Kong, Hong Kong
| |
|
32
|
Shen C, Hu X, Gao J, Zhang X, Zhong H, Wang Z, Xu L, Kang Y, Cao D, Hou T. The impact of cross-docked poses on performance of machine learning classifier for protein-ligand binding pose prediction. J Cheminform 2021; 13:81. [PMID: 34656169 PMCID: PMC8520186 DOI: 10.1186/s13321-021-00560-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/29/2021] [Accepted: 10/05/2021] [Indexed: 02/06/2023] Open
Abstract
Structure-based drug design depends on detailed knowledge of the three-dimensional (3D) structures of protein-ligand binding complexes, but accurate prediction of ligand-binding poses is still a major challenge for molecular docking due to deficiencies in scoring functions (SFs) and neglect of protein flexibility upon ligand binding. In this study, based on a cross-docking dataset purpose-built from the PDBbind database, we developed several XGBoost-trained classifiers to discriminate near-native binding poses from decoys, and systematically assessed their performance with and without the involvement of cross-docked poses in the training/test sets. The calculation results illustrate that using Extended Connectivity Interaction Features (ECIF), Vina energy terms and docking pose ranks as the features achieves the best performance, according to validation through random splitting or refined-core splitting and testing on the re-docked or cross-docked poses. In addition, despite the significant decrease in performance for the threefold clustered cross-validation, the inclusion of the Vina energy terms effectively secures the lower limit of the models' performance and thus improves their generalization capability. Furthermore, our calculation results highlight the importance of incorporating cross-docked poses into the training of SFs with a wide application domain and high robustness for binding pose prediction. The source code and the newly developed cross-docking datasets are freely available at https://github.com/sc8668/ml_pose_prediction and https://zenodo.org/record/5525936, respectively, under an open-source license. We believe that our study may provide valuable guidance for the development and assessment of new machine learning-based SFs (MLSFs) for the prediction of protein-ligand binding poses.
Affiliation(s)
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Xueping Hu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Junbo Gao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Haiyang Zhong
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China.
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan, 410013, People's Republic of China.
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
|
33
|
Abstract
Molecular docking is one of the most widely used computational tools in structure-based drug design and is critically dependent on accuracy and robustness of the scoring function. In this work, we introduce a new scoring function Lin_F9, which is a linear combination of nine empirical terms, including a unified metal bond term to specifically describe metal-ligand interactions. Parameters in Lin_F9 are obtained with a multistage fitting protocol using explicit water-included structures. For the CASF-2016 benchmark test set, Lin_F9 achieves the top scoring power among all 34 classical scoring functions for both original crystal poses and locally optimized poses with Pearson correlation coefficients (R) of 0.680 and 0.687, respectively. Meanwhile, in comparison with Vina, Lin_F9 achieves consistently better scoring power and ranking power with various types of protein-ligand complex structures that mimic real docking applications, including end-to-end flexible docking for the CASF-2016 benchmark test set using a single or an ensemble of protein receptor structures, as well as for D3R Grand Challenge (GC4) test sets. Lin_F9 has been implemented in a fork of Smina as an optional built-in scoring function that can be used for docking applications as well as for further improvement of scoring functions and docking protocols. Lin_F9 is accessible through https://yzhang.hpc.nyu.edu/Lin_F9/.
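The two quantities this abstract relies on, a score formed as a linear combination of empirical terms and the Pearson correlation coefficient used to report scoring power, can be illustrated with a minimal sketch. The function names, term values and weights below are hypothetical placeholders chosen for illustration; they are not the Lin_F9 terms or fitted parameters.

```python
import math


def linear_score(terms, weights):
    """Weighted sum of empirical energy terms (placeholder terms and weights)."""
    if len(terms) != len(weights):
        raise ValueError("terms and weights must have the same length")
    return sum(w * t for w, t in zip(weights, terms))


def pearson_r(predicted, experimental):
    """Pearson correlation between predicted scores and measured affinities."""
    n = len(predicted)
    if n != len(experimental) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_p * var_e)


if __name__ == "__main__":
    # Two made-up terms with made-up weights.
    print(linear_score([1.0, 2.0], [0.5, -1.0]))
    # Perfectly linear toy data.
    print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

In a benchmark such as CASF-2016, `predicted` would hold one score per complex and `experimental` the corresponding measured affinities; the sketch only shows the arithmetic.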
Affiliation(s)
- Chao Yang
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
|
34
|
Xiong G, Shen C, Yang Z, Jiang D, Liu S, Lu A, Chen X, Hou T, Cao D. Featurization strategies for protein–ligand interactions and their applications in scoring function development. WIREs Computational Molecular Science 2021. [DOI: 10.1002/wcms.1567] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]
Affiliation(s)
- Guoli Xiong
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
| | - Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
| | - Ziyi Yang
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
| | - Dejun Jiang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
- College of Computer Science and Technology Zhejiang University Hangzhou China
| | - Shao Liu
- Department of Pharmacy Xiangya Hospital, Central South University Changsha China
| | - Aiping Lu
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
| | - Xiang Chen
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis Xiangya Hospital, Central South University Changsha China
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
|
35
|
Stanzione F, Giangreco I, Cole JC. Use of molecular docking computational tools in drug discovery. Progress in Medicinal Chemistry 2021; 60:273-343. [PMID: 34147204 DOI: 10.1016/bs.pmch.2021.01.004] [Citation(s) in RCA: 153] [Impact Index Per Article: 38.3] [Indexed: 10/21/2022]
Abstract
Molecular docking has become an important component of the drug discovery process. Since first being developed in the 1980s, advancements in the power of computer hardware and the increasing number of and ease of access to small molecule and protein structures have contributed to the development of improved methods, making docking more popular in both industrial and academic settings. Over the years, the modalities by which docking is used to assist the different tasks of drug discovery have changed. Although initially developed and used as a standalone method, docking is now mostly employed in combination with other computational approaches within integrated workflows. Despite its invaluable contribution to the drug discovery process, molecular docking is still far from perfect. In this chapter we will provide an introduction to molecular docking and to the different docking procedures with a focus on several considerations and protocols, including protonation states, active site waters and consensus, that can greatly improve the docking results.
Affiliation(s)
| | - Ilenia Giangreco
- Cambridge Crystallographic Data Centre, Cambridge, United Kingdom
| | - Jason C Cole
- Cambridge Crystallographic Data Centre, Cambridge, United Kingdom
|
36
|
Bao J, He X, Zhang JZH. DeepBSP-a Machine Learning Method for Accurate Prediction of Protein-Ligand Docking Structures. J Chem Inf Model 2021; 61:2231-2240. [PMID: 33979150 DOI: 10.1021/acs.jcim.1c00334] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 12/16/2022]
Abstract
In recent years, machine-learning-based scoring functions have significantly improved the scoring power. However, many of these methods do not perform well in distinguishing the native structure from docked decoy poses due to the lack of decoy structural information in their training data. Here, we developed a machine-learning model, named DeepBSP, that can directly predict the root mean square deviation (rmsd) of a ligand docking pose with reference to its native binding pose. Unlike the binding affinity, the rmsd between the docking poses with reference to their native structures can be straightforwardly determined. By training on a generated data set with 11,925 native complexes and more than 165,000 docked poses, our model shows excellent docking power on our test set and also on the CASF-2016 docking decoy set compared to other major scoring functions. Thus, by combining molecular dockings that generate many poses with the application of DeepBSP, one can more accurately predict the best binding pose that is closest to the native complex structure. This DeepBSP model shall be very useful in picking out poses close to their natives from many poses generated from a dock application.
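The abstract notes that the rmsd of a docked pose relative to its native pose can be determined directly. A minimal sketch of that calculation follows, assuming both poses list the same atoms in the same order and share one coordinate frame, so no superposition is applied. The function names and the 2.0 Å near-native cutoff are assumptions made for illustration (the cutoff is a common convention, not a value stated in the abstract), and symmetry-equivalent atoms are not handled.

```python
import math


def pose_rmsd(pose, reference):
    """In-place RMSD (no alignment) between two poses given as lists of (x, y, z)."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pose, reference):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(pose))


def is_near_native(pose, reference, cutoff=2.0):
    """Label a pose as near-native if its RMSD is within the assumed cutoff (angstroms)."""
    return pose_rmsd(pose, reference) <= cutoff


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    shifted = [(0.0, 0.0, 3.0), (1.5, 0.0, 3.0)]  # rigid 3 angstrom translation
    print(pose_rmsd(shifted, native), is_near_native(shifted, native))
```

Values computed this way could serve as regression targets for a pose-quality model or as labels for separating near-native poses from decoys.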
Affiliation(s)
- Jingxiao Bao
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
| | - Xiao He
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
| | - John Z H Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
- Department of Chemistry, New York University, New York, New York 10003, United States
- Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
|
37
|
Wee J, Xia K. Forman persistent Ricci curvature (FPRC)-based machine learning models for protein-ligand binding affinity prediction. Brief Bioinform 2021; 22:6262241. [PMID: 33940588 DOI: 10.1093/bib/bbab136] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 02/17/2021] [Revised: 03/14/2021] [Accepted: 03/23/2021] [Indexed: 01/01/2023] Open
Abstract
Artificial intelligence (AI) techniques have already been gradually applied to the entire drug design process, from target discovery, lead discovery, lead optimization and preclinical development to the final three phases of clinical trials. Currently, one of the central challenges for AI-based drug design is molecular featurization, which is to identify or design appropriate molecular descriptors or fingerprints. Efficient and transferable molecular descriptors are key to the success of all AI-based drug design models. Here we propose Forman persistent Ricci curvature (FPRC)-based molecular featurization and feature engineering, for the first time. Molecular structures and interactions are modeled as simplicial complexes, which are generalization of graphs to their higher dimensional counterparts. Further, a multiscale representation is achieved through a filtration process, during which a series of nested simplicial complexes at different scales are generated. Forman Ricci curvatures (FRCs) are calculated on the series of simplicial complexes, and the persistence and variation of FRCs during the filtration process is defined as FPRC. Moreover, persistent attributes, which are FPRC-based functions and properties, are employed as molecular descriptors, and combined with machine learning models, in particular, gradient boosting tree (GBT). Our FPRC-GBT models are extensively trained and tested on three most commonly-used datasets, including PDBbind-2007, PDBbind-2013 and PDBbind-2016. It has been found that our results are better than the ones from machine learning models with traditional molecular descriptors.
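The idea of tracking Forman Ricci curvature across a filtration can be sketched in a deliberately simplified setting. The example below assumes an unweighted graph (a 1-dimensional complex with no triangles or higher simplices), for which the Forman curvature of an edge (u, v) reduces to 4 - deg(u) - deg(v). The distance-cutoff filtration, the function names and the toy coordinates are assumptions for illustration and do not reproduce the paper's simplicial-complex construction or its persistent attributes.

```python
import math
from itertools import combinations


def forman_edge_curvatures(points, cutoff):
    """Edge curvatures of the graph linking all point pairs within `cutoff`.

    Uses the unweighted-graph formula F(u, v) = 4 - deg(u) - deg(v).
    """
    edges = [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) <= cutoff
    ]
    degree = [0] * len(points)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return {(i, j): 4 - degree[i] - degree[j] for i, j in edges}


def curvature_profile(points, cutoffs):
    """Mean edge curvature at each filtration value (None if no edges exist yet)."""
    profile = []
    for cutoff in cutoffs:
        values = list(forman_edge_curvatures(points, cutoff).values())
        profile.append(sum(values) / len(values) if values else None)
    return profile


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # toy coordinates
    print(curvature_profile(atoms, [0.5, 1.0, 2.0]))
```

Summary statistics of the curvature values at each filtration step, like the mean used here, are one simple way to turn such a profile into a fixed-length feature vector for a gradient boosting model.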
Affiliation(s)
- JunJie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
|
38
|
Andrianov AM, Nikolaev GI, Shuldov NA, Bosko IP, Anischenko AI, Tuzikov AV. Application of deep learning and molecular modeling to identify small drug-like compounds as potential HIV-1 entry inhibitors. J Biomol Struct Dyn 2021; 40:7555-7573. [PMID: 33855929 DOI: 10.1080/07391102.2021.1905559] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]
Abstract
A generative adversarial autoencoder for the rational design of potential HIV-1 entry inhibitors able to block the CD4-binding site of the viral envelope protein gp120 was developed. To do this, the following studies were carried out: (i) an autoencoder architecture was constructed; (ii) a virtual compound library of potential anti-HIV-1 agents for training the neural network was built using the click chemistry concept, which allows a large number of drug candidates to be generated by assembly from small modular units; (iii) all compounds from this library were docked to gp120 and binding free energies were calculated; (iv) molecular fingerprints of the chemical compounds in the training dataset were generated; (v) the developed autoencoder was trained and then validated using more than 21 million molecules from the ZINC15 database. As a result, three small drug-like compounds that exhibited high-affinity binding to gp120 were identified. According to data from molecular docking, machine learning, quantum chemical calculations, and molecular dynamics simulations, these compounds show low binding free energies in complexes with gp120, similar to those calculated using the same computational protocols for the HIV-1 entry inhibitors NBD-11021 and NBD-14010, highly potent and broad anti-HIV-1 agents representing a new generation of viral CD4 antagonists. The identified CD4-mimetic candidates are suggested to be good scaffolds for the design of novel antiviral drugs inhibiting the early stages of HIV-1 infection. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Alexander M Andrianov
- Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, Minsk, Republic of Belarus
| | - Grigory I Nikolaev
- United Institute of Informatics Problems, National Academy of Sciences of Belarus, Minsk, Republic of Belarus
| | - Nikita A Shuldov
- Faculty of Applied Mathematics & Computer Science, Belarusian State University, Minsk, Republic of Belarus
| | - Ivan P Bosko
- United Institute of Informatics Problems, National Academy of Sciences of Belarus, Minsk, Republic of Belarus
| | - Arseny I Anischenko
- Faculty of Applied Mathematics & Computer Science, Belarusian State University, Minsk, Republic of Belarus
| | - Alexander V Tuzikov
- United Institute of Informatics Problems, National Academy of Sciences of Belarus, Minsk, Republic of Belarus
|
39
|
Shen C, Weng G, Zhang X, Leung ELH, Yao X, Pang J, Chai X, Li D, Wang E, Cao D, Hou T. Accuracy or novelty: what can we gain from target-specific machine-learning-based scoring functions in virtual screening? Brief Bioinform 2021; 22:6070382. [PMID: 33418562 DOI: 10.1093/bib/bbaa410] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/17/2020] [Revised: 11/26/2020] [Accepted: 12/12/2020] [Indexed: 12/13/2022] Open
Abstract
Machine-learning (ML)-based scoring functions (MLSFs) have gradually emerged as a promising alternative for protein-ligand binding affinity prediction and structure-based virtual screening. However, clouds of doubts have still been raised against the benefits of this novel type of scoring functions (SFs). In this study, to benchmark the performance of target-specific MLSFs on a relatively unbiased dataset, the MLSFs trained from three representative protein-ligand interaction representations were assessed on the LIT-PCBA dataset, and the classical Glide SP SF and three types of ligand-based quantitative structure-activity relationship (QSAR) models were also utilized for comparison. Two major aspects in virtual screening campaigns, including prediction accuracy and hit novelty, were systematically explored. The calculation results illustrate that the tested target-specific MLSFs yielded generally superior performance over the classical Glide SP SF, but they could hardly outperform the 2D fingerprint-based QSAR models. Although substantial improvements could be achieved by integrating multiple types of protein-ligand interaction features, the MLSFs were still not sufficient to exceed MACCS-based QSAR models. In terms of the correlations between the hit ranks or the structures of the top-ranked hits, the MLSFs developed by different featurization strategies would have the ability to identify quite different hits. Nevertheless, it seems that target-specific MLSFs do not have the intrinsic attributes of a traditional SF and may not be a substitute for classical SFs. In contrast, MLSFs can be regarded as a new derivative of ligand-based QSAR models. It is expected that our study may provide valuable guidance for the assessment and further development of target-specific MLSFs.
Affiliation(s)
- Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Gaoqi Weng
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Xujun Zhang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Elaine Lai-Han Leung
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, SAR, China
| | - Xiaojun Yao
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, SAR, China
| | - Jinping Pang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Xin Chai
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dan Li
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Ercheng Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
|
40
|
Bao J, He X, Zhang JZ. Development of a New Scoring Function for Virtual Screening: APBScore. J Chem Inf Model 2020; 60:6355-6365. [DOI: 10.1021/acs.jcim.0c00474] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/23/2022]
Affiliation(s)
- Jingxiao Bao
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
| | - Xiao He
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
| | - John Z.H. Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
- Department of Chemistry, New York University, New York, New York 10003, United States
- Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
|
41
|
Singh N, Chaput L, Villoutreix BO. Fast Rescoring Protocols to Improve the Performance of Structure-Based Virtual Screening Performed on Protein-Protein Interfaces. J Chem Inf Model 2020; 60:3910-3934. [PMID: 32786511 DOI: 10.1021/acs.jcim.0c00545] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/06/2023]
Abstract
Protein-protein interactions (PPIs) are attractive targets for drug design because of their essential role in numerous cellular processes and disease pathways. However, in general, PPIs display exposed binding pockets at the interface, and as such, have been largely unexploited for therapeutic interventions with low-molecular weight compounds. Here, we used docking and various rescoring strategies in an attempt to recover PPI inhibitors from a set of active and inactive molecules for 11 targets collected in ChEMBL and PubChem. Our focus is on the screening power of the various developed protocols and on using fast approaches so as to be able to apply such a strategy to the screening of ultralarge libraries in the future. First, we docked compounds into each target using the fast "pscreen" mode of the structure-based virtual screening (VS) package Surflex. Subsequently, the docking poses were postprocessed to derive a set of 3D topological descriptors: (i) shape similarity and (ii) interaction fingerprint similarity with a co-crystallized inhibitor, (iii) solvent-accessible surface area, and (iv) extent of deviation from the geometric center of a reference inhibitor. The derivatized descriptors, together with descriptor-scaled scoring functions, were utilized to investigate possible impacts on VS performance metrics. Moreover, four standalone scoring functions, RF-Score-VS (machine-learning), DLIGAND2 (knowledge-based), Vinardo (empirical), and X-SCORE (empirical), were employed to rescore the PPI compounds. Collectively, the results indicate that the topological scoring algorithms could be valuable both at a global level, with up to 79% increase in areas under the receiver operating characteristic curve for some targets, and in early stages, with up to a 4-fold increase in enrichment factors at 1% of the screened collections. Outstandingly, DLIGAND2 emerged as the best scoring function on this data set, outperforming all rescoring techniques in terms of VS metrics. 
The described methodology could help in the rational design of small-molecule PPI inhibitors and has direct applications in many therapeutic areas, including cancer, CNS, and infectious diseases such as COVID-19.
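The enrichment factor at 1% cited in this abstract follows a standard definition: the fraction of actives among the top-ranked x% of compounds divided by the fraction of actives in the whole library. A minimal sketch follows; the function name, argument names and toy data are assumptions for illustration, and ties are broken by input order.

```python
def enrichment_factor(scores, labels, fraction=0.01, higher_is_better=True):
    """EF at `fraction`: hit rate in the top-ranked subset over the overall hit rate.

    `labels` holds 1 for actives and 0 for inactives, aligned with `scores`.
    """
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and equally long")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("enrichment is undefined without any actives")
    n_top = max(1, int(round(n * fraction)))
    # Rank compound indices by score, best first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=higher_is_better)
    top_actives = sum(labels[i] for i in order[:n_top])
    return (top_actives / n_top) / (total_actives / n)


if __name__ == "__main__":
    # Toy library: 100 compounds, 10 actives, 5 of them ranked in the top 10.
    scores = list(range(100, 0, -1))
    active_idx = {0, 1, 2, 3, 4, 50, 51, 52, 53, 54}
    labels = [1 if i in active_idx else 0 for i in range(100)]
    print(enrichment_factor(scores, labels, fraction=0.10))
```

A value of 1.0 corresponds to random ranking, so a 4-fold increase in EF at 1% means four times as many actives in the top 1% as the baseline protocol recovered.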
Affiliation(s)
- Natesh Singh
- Université de Lille, Inserm, Institut Pasteur de Lille, U1177-Drugs and Molecules for Living Systems, F-59000 Lille, France
| | - Ludovic Chaput
- Université de Lille, Inserm, Institut Pasteur de Lille, U1177-Drugs and Molecules for Living Systems, F-59000 Lille, France
| | - Bruno O Villoutreix
- Université de Lille, Inserm, Institut Pasteur de Lille, U1177-Drugs and Molecules for Living Systems, F-59000 Lille, France
|
42
|
Ye WL, Shen C, Xiong GL, Ding JJ, Lu AP, Hou TJ, Cao DS. Improving Docking-Based Virtual Screening Ability by Integrating Multiple Energy Auxiliary Terms from Molecular Docking Scoring. J Chem Inf Model 2020; 60:4216-4230. [PMID: 32352294 DOI: 10.1021/acs.jcim.9b00977] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Indexed: 12/18/2022]
Abstract
Virtual screening (VS) based on molecular docking is an efficient method for retrieving novel hit compounds in drug discovery. However, the accuracy of current docking scoring functions (SFs) is usually insufficient. In this study, to improve the screening power of SFs, a novel approach named EAT-Score was proposed that directly uses the energy auxiliary terms (EAT) provided by molecular docking scoring as input to eXtreme Gradient Boosting (XGBoost). Here, EAT specifically refers to the output of Molecular Operating Environment (MOE) scoring, including the energy scores of five different classical SFs and the Protein-Ligand Interaction Fingerprint (PLIF) terms. The ability of EAT-Score to discriminate actives from decoys was rigorously validated on the DUD-E diverse subset using different performance metrics. The results showed that EAT-Score performed much better than classical SFs in VS, with its AUC values improving by around 0.3. Meanwhile, EAT-Score achieved comparable or even better prediction performance than other state-of-the-art VS methods, such as some machine learning (ML)-based SFs and classical SFs implemented in docking programs, in terms of AUC, LogAUC, or BEDROC. Furthermore, Shapley additive explanations (SHAP) analysis shows that the EAT-Score model can capture important binding pattern information from protein-ligand complexes, which may be very helpful in interpreting the ligand binding mechanism for a given target and thereby guiding drug design.
Affiliation(s)
- Wen-Ling Ye
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China
| | - Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Guo-Li Xiong
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China
| | - Jun-Jie Ding
- Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, P. R. China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
| | - Ting-Jun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
|