1
|
Lam HYI, Guan JS, Ong XE, Pincket R, Mu Y. Protein language models are performant in structure-free virtual screening. Brief Bioinform 2024; 25:bbae480. [PMID: 39327890 PMCID: PMC11427677 DOI: 10.1093/bib/bbae480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2024] [Revised: 08/17/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024] Open
Abstract
Hitherto virtual screening (VS) has been typically performed using a structure-based drug design paradigm. Such methods typically require the use of molecular docking on high-resolution three-dimensional structures of a target protein-a computationally-intensive and time-consuming exercise. This work demonstrates that by employing protein language models and molecular graphs as inputs to a novel graph-to-transformer cross-attention mechanism, a screening power comparable to state-of-the-art structure-based models can be achieved. The implications thereof include highly expedited VS due to the greatly reduced compute required to run this model, and the ability to perform early stages of computer-aided drug design in the complete absence of 3D protein structures.
Collapse
Affiliation(s)
- Hilbert Yuen In Lam
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
- MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
| | - Jia Sheng Guan
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
| | - Xing Er Ong
- MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
| | - Robbe Pincket
- Heliovision, Asstraat 5, 3000 Leuven, Leuven, Kingdom of Belgium
| | - Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
- MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
| |
Collapse
|
2
|
Musa MS, Miah MS, Munni YA, Patwary MAM, Kazi M, Matin MM. Synthesis and elucidation of strained galactopyronose esters as selective cyclooxygenase-2 inhibitor: a comprehensive computational approach. RSC Adv 2024; 14:30469-30481. [PMID: 39318455 PMCID: PMC11421415 DOI: 10.1039/d4ra03520h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2024] [Accepted: 09/17/2024] [Indexed: 09/26/2024] Open
Abstract
Cyclooxygenase-2 (COX-2) is critically implicated in various pathologies, including inflammation, cancer, disorders involving the nervous system, and multidrug resistance. In both academic and pharmaceutical research, the development of COX-2 selective drugs as anti-inflammatory and anti-tumor therapeutics is a key focus. Traditional nonsteroidal anti-inflammatory drugs (NSAIDs) have ulcerogenic, gastrointestinal adverse effects, and myocardial infarction risk, which resulted in their limited applications. In response to this challenge, we synthesized a series of glycoconjugates featuring six-membered sugar rings and acyl chains of varying lengths attached at the C-6 position. Using molecular docking techniques, we identified galactose esters with optimal acyl chain lengths that selectively and effectively bind to the active site of COX-2 over COX-1. These compounds exhibited enhanced binding affinity and superior inhibition constants (pK i) for COX-2, thereby offering selective inhibition with potentially reduced ulcerogenic risks, as COX-1 inhibition is thought to contribute to these side effects. The molecular docking study identified two potential compounds, G6 and G8, which were validated via MD simulation for the assessment of their stability and were compared to the complex of the standard drugs, aspirin and rofecoxib. In addition, compound structures were optimized using the DFT method under the B3LYP/6-31+g(d,p) basis set to study their physio-spectral properties, frontier molecular orbitals (HOMO-LUMO), and their energy gap that correlates to their reactivity and stability. ADMET, drug-likeness, and PASS analyses were also carried out to assess their drug-ability and toxicity profiling.
Collapse
Affiliation(s)
- Mohammed Sakib Musa
- Department of Applied Chemistry and Chemical Engineering, Faculty of Science, University of Chittagong Chittagong 4331 Bangladesh
| | - Md Sopon Miah
- Bioorganic and Medicinal Chemistry Laboratory, Department of Chemistry, Faculty of Science, University of Chittagong Chittagong 4331 Bangladesh +880 1716 839689
| | - Yeasmin Akter Munni
- Department of Anatomy, College of Medicine, Dongguk University Gyeongju 38066 Republic of Korea
| | | | - Mohsin Kazi
- Department of Pharmaceutics, College of Pharmacy, King Saud University P. O. Box 2457 Riyadh 11451 Saudi Arabia
| | - Mohammed Mahbubul Matin
- Bioorganic and Medicinal Chemistry Laboratory, Department of Chemistry, Faculty of Science, University of Chittagong Chittagong 4331 Bangladesh +880 1716 839689
| |
Collapse
|
3
|
Venkatraman V, Gaiser J, Demekas D, Roy A, Xiong R, Wheeler TJ. Do Molecular Fingerprints Identify Diverse Active Drugs in Large-Scale Virtual Screening? (No). Pharmaceuticals (Basel) 2024; 17:992. [PMID: 39204097 PMCID: PMC11356940 DOI: 10.3390/ph17080992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2024] [Revised: 07/18/2024] [Accepted: 07/23/2024] [Indexed: 09/03/2024] Open
Abstract
Computational approaches for small-molecule drug discovery now regularly scale to the consideration of libraries containing billions of candidate small molecules. One promising approach to increased the speed of evaluating billion-molecule libraries is to develop succinct representations of each molecule that enable the rapid identification of molecules with similar properties. Molecular fingerprints are thought to provide a mechanism for producing such representations. Here, we explore the utility of commonly used fingerprints in the context of predicting similar molecular activity. We show that fingerprint similarity provides little discriminative power between active and inactive molecules for a target protein based on a known active-while they may sometimes provide some enrichment for active molecules in a drug screen, a screened data set will still be dominated by inactive molecules. We also demonstrate that high-similarity actives appear to share a scaffold with the query active, meaning that they could more easily be identified by structural enumeration. Furthermore, even when limited to only active molecules, fingerprint similarity values do not correlate with compound potency. In sum, these results highlight the need for a new wave of molecular representations that will improve the capacity to detect biologically active molecules based on their similarity to other such molecules.
Collapse
Affiliation(s)
- Vishwesh Venkatraman
- Department of Chemistry, Norwegian University of Science and Technology, 7034 Trondheim, Norway
| | - Jeremiah Gaiser
- School of Information, University of Arizona, Tucson, AZ 85721, USA
| | - Daphne Demekas
- R. Ken Coit College Pharmacy, University of Arizona, Tucson, AZ 85721, USA
| | - Amitava Roy
- Rocky Mountain Laboratories, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT 59840, USA;
- Department of Biomedical and Pharmaceutical Sciences, University of Montana, Missoula, MT 59812, USA
| | - Rui Xiong
- Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ 85721, USA
| | - Travis J. Wheeler
- R. Ken Coit College Pharmacy, University of Arizona, Tucson, AZ 85721, USA
| |
Collapse
|
4
|
Zhou H, Skolnick J. Utility of the Morgan Fingerprint in Structure-Based Virtual Ligand Screening. J Phys Chem B 2024; 128:5363-5370. [PMID: 38783525 PMCID: PMC11163432 DOI: 10.1021/acs.jpcb.4c01875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Revised: 05/10/2024] [Accepted: 05/14/2024] [Indexed: 05/25/2024]
Abstract
In modern drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing and refinement due to its cost-effective nature for large compound libraries. For decades, efforts have been devoted to developing VLS methods with high accuracy. These include the state-of-the-art FINDSITE suite of approaches FINDSITEcomb2.0, FRAGSITE, and FRAGSITE2 and the meta version FRAGSITEcomb that were developed in our lab. These methods combine ligand homology modeling (LHM), traditional ligand similarity methods, and more recently machine learning approaches to rank ligands and have proven to be superior to most recent deep learning and large language model-based approaches. Here, we describe further improvements to our previous best methods by combining the Morgan fingerprint (MF) with the originally used PubChem fingerprint and FP2 fingerprint. We then benchmarked FINDSITEcomb2.0M, FRAGSITEM, FRAGSITE2M, and the composite meta-approach FRAGSITEcombM. On the 102 target DUD-E set, the 1% enrichment factor (EF1%) and area under the precision-recall curve (AUPR) of FRAGSITEcomb increased from 42.0/0.59 to 47.6/0.72. This 0.72 AUPR is significantly better than that of the state-of-the-art deep learning-based method DenseFS's AUPR of 0.443. An independent test on the 81 targets DEKOIS2.0 set shows that EF1%/AUPR increases from 18.3/0.520 to 23.1/0.683. An ablation investigation shows that the MF contributes to most of the improvement of all four approaches. Thus, the MF is a useful addition to structure-based VLS.
Collapse
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems
Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| | - Jeffrey Skolnick
- Center for the Study of Systems
Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| |
Collapse
|
5
|
Gu S, Yang Y, Zhao Y, Qiu J, Wang X, Tong HHY, Liu L, Wan X, Liu H, Hou T, Kang Y. Evaluation of AlphaFold2 Structures for Hit Identification across Multiple Scenarios. J Chem Inf Model 2024; 64:3630-3639. [PMID: 38630855 DOI: 10.1021/acs.jcim.3c01976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/19/2024]
Abstract
The introduction of AlphaFold2 (AF2) has sparked significant enthusiasm and generated extensive discussion within the scientific community, particularly among drug discovery researchers. Although previous studies have addressed the performance of AF2 structures in virtual screening (VS), a more comprehensive investigation is still necessary considering the paramount importance of structural accuracy in drug design. In this study, we evaluate the performance of AF2 structures in VS across three common drug discovery scenarios: targets with holo, apo, and AF2 structures; targets with only apo and AF2 structures; and targets exclusively with AF2 structures. We utilized both the traditional physics-based Glide and the deep-learning-based scoring function RTMscore to rank the compounds in the DUD-E, DEKOIS 2.0, and DECOY data sets. The results demonstrate that, overall, the performance of VS on AF2 structures is comparable to that on apo structures but notably inferior to that on holo structures across diverse scenarios. Moreover, when a target has solely AF2 structure, selecting the holo structure of the target from different subtypes within the same protein family produces comparable results with the AF2 structure for VS on the data set of the AF2 structures, and significantly better results than the AF2 structures on its own data set. This indicates that utilizing AF2 structures for docking-based VS may not yield most satisfactory outcomes, even when solely AF2 structures are available. Moreover, we rule out the possibility that the variations in VS performance between the binding pockets of AF2 and holo structures arise from the differences in their biological assembly composition.
Collapse
Affiliation(s)
- Shukai Gu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yuwei Yang
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
| | - Yihao Zhao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Jiayue Qiu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
| | - Xiaorui Wang
- State Key Laboratory of Quality Re-search in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China
| | - Henry Hoi Yee Tong
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
| | - Liwei Liu
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Nanjing 210000, Jiangsu, China
| | - Xiaozhe Wan
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Nanjing 210000, Jiangsu, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
Collapse
|
6
|
Brocidiacono M, Francoeur P, Aggarwal R, Popov KI, Koes DR, Tropsha A. BigBind: Learning from Nonstructural Data for Structure-Based Virtual Screening. J Chem Inf Model 2024; 64:2488-2495. [PMID: 38113513 DOI: 10.1021/acs.jcim.3c01211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2023]
Abstract
Deep learning methods that predict protein-ligand binding have recently been used for structure-based virtual screening. Many such models have been trained using protein-ligand complexes with known crystal structures and activities from the PDBBind data set. However, because PDBbind only includes 20K complexes, models typically fail to generalize to new targets, and model performance is on par with models trained with only ligand information. Conversely, the ChEMBL database contains a wealth of chemical activity information but includes no information about binding poses. We introduce BigBind, a data set that maps ChEMBL activity data to proteins from the CrossDocked data set. BigBind comprises 583 K ligand activities and includes 3D structures of the protein binding pockets. Additionally, we augmented the data by adding an equal number of putative inactives for each target. Using this data, we developed Banana (basic neural network for binding affinity), a neural network-based model to classify active from inactive compounds, defined by a 10 μM cutoff. Our model achieved an AUC of 0.72 on BigBind's test set, while a ligand-only model achieved an AUC of 0.59. Furthermore, Banana achieved competitive performance on the LIT-PCBA benchmark (median EF1% 1.81) while running 16,000 times faster than molecular docking with Gnina. We suggest that Banana, as well as other models trained on this data set, will significantly improve the outcomes of prospective virtual screening tasks.
Collapse
Affiliation(s)
- Michael Brocidiacono
- Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Paul Francoeur
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Rishal Aggarwal
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Konstantin I Popov
- Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - David Ryan Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Alexander Tropsha
- Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| |
Collapse
|
7
|
Qu X, Dong L, Luo D, Si Y, Wang B. Water Network-Augmented Two-State Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2263-2274. [PMID: 37433009 DOI: 10.1021/acs.jcim.3c00567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/13/2023]
Abstract
Water network rearrangement from the ligand-unbound state to the ligand-bound state is known to have significant effects on the protein-ligand binding interactions, but most of the current machine learning-based scoring functions overlook these effects. In this study, we endeavor to construct a comprehensive and realistic deep learning model by incorporating water network information into both ligand-unbound and -bound states. In particular, extended connectivity interaction features were integrated into graph representation, and graph transformer operator was employed to extract features of the ligand-unbound and -bound states. Through these efforts, we developed a water network-augmented two-state model called ECIFGraph::HM-Holo-Apo. Our new model exhibits satisfactory performance in terms of scoring, ranking, docking, screening, and reverse screening power tests on the CASF-2016 benchmark. In addition, it can achieve superior performance in large-scale docking-based virtual screening tests on the DEKOIS2.0 data set. Our study highlights that the use of a water network-augmented two-state model can be an effective strategy to bolster the robustness and applicability of machine learning-based scoring functions, particularly for targets with hydrophilic or solvent-exposed binding pockets.
Collapse
Affiliation(s)
- Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Yubing Si
- College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
| |
Collapse
|
8
|
Long X, Zhang S, Wang Y, Chen J, Lu Y, Hou H, Lin B, Li X, Shen C, Yang R, Zhu H, Cui R, Cao D, Chen G, Wang D, Chen Y, Zhai S, Zeng Z, Wu S, Lou M, Chen J, Zou J, Zheng M, Qin J, Wang X. Targeting JMJD1C to selectively disrupt tumor T reg cell fitness enhances antitumor immunity. Nat Immunol 2024; 25:525-536. [PMID: 38356061 DOI: 10.1038/s41590-024-01746-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2023] [Accepted: 01/09/2024] [Indexed: 02/16/2024]
Abstract
Regulatory T (Treg) cells are critical for immune tolerance but also form a barrier to antitumor immunity. As therapeutic strategies involving Treg cell depletion are limited by concurrent autoimmune disorders, identification of intratumoral Treg cell-specific regulatory mechanisms is needed for selective targeting. Epigenetic modulators can be targeted with small compounds, but intratumoral Treg cell-specific epigenetic regulators have been unexplored. Here, we show that JMJD1C, a histone demethylase upregulated by cytokines in the tumor microenvironment, is essential for tumor Treg cell fitness but dispensable for systemic immune homeostasis. JMJD1C deletion enhanced AKT signals in a manner dependent on histone H3 lysine 9 dimethylation (H3K9me2) demethylase and STAT3 signals independently of H3K9me2 demethylase, leading to robust interferon-γ production and tumor Treg cell fragility. We have also developed an oral JMJD1C inhibitor that suppresses tumor growth by targeting intratumoral Treg cells. Overall, this study identifies JMJD1C as an epigenetic hub that can integrate signals to establish tumor Treg cell fitness, and we present a specific JMJD1C inhibitor that can target tumor Treg cells without affecting systemic immune homeostasis.
Collapse
Affiliation(s)
- Xuehui Long
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Yuliang Wang
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Jingjing Chen
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Yanlai Lu
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Hui Hou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Bichun Lin
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Chang Shen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Ruirui Yang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Huamin Zhu
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Rongrong Cui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Duanhua Cao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Geng Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Dan Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Yun Chen
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Sulan Zhai
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Zhiqin Zeng
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Shusheng Wu
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Mengting Lou
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Junhong Chen
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Jian Zou
- Department of Laboratory Medicine, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Nanjing Medical University, Wuxi, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China.
| | - Jun Qin
- CAS Key Laboratory of Tissue Microenvironment and Tumor, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.
| | - Xiaoming Wang
- Department of Immunology, Key Laboratory of Immune Microenvironment and Diseases, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Nanjing, China.
| |
Collapse
|
9
|
Cai H, Shen C, Jian T, Zhang X, Chen T, Han X, Yang Z, Dang W, Hsieh CY, Kang Y, Pan P, Ji X, Song J, Hou T, Deng Y. CarsiDock: a deep learning paradigm for accurate protein-ligand docking and screening based on large-scale pre-training. Chem Sci 2024; 15:1449-1471. [PMID: 38274053 PMCID: PMC10806797 DOI: 10.1039/d3sc05552c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2023] [Accepted: 12/18/2023] [Indexed: 01/27/2024] Open
Abstract
The expertise accumulated in deep neural network-based structure prediction has been widely transferred to the field of protein-ligand binding pose prediction, thus leading to the emergence of a variety of deep learning-guided docking models for predicting protein-ligand binding poses without relying on heavy sampling. However, their prediction accuracy and applicability are still far from satisfactory, partially due to the lack of protein-ligand binding complex data. To this end, we create a large-scale complex dataset containing ∼9 M protein-ligand docking complexes for pre-training, and propose CarsiDock, the first deep learning-guided docking approach that leverages pre-training of millions of predicted protein-ligand complexes. CarsiDock contains two main stages, i.e., a deep learning model for the prediction of protein-ligand atomic distance matrices, and a translation, rotation and torsion-guided geometry optimization procedure to reconstruct the matrices into a credible binding pose. The pre-training and multiple innovative architectural designs facilitate the dramatically improved docking accuracy of our approach over the baselines in terms of multiple docking scenarios, thereby contributing to its outstanding early recognition performance in several retrospective virtual screening campaigns. Further explorations demonstrate that CarsiDock can not only guarantee the topological reliability of the binding poses but also successfully reproduce the crucial interactions in crystalized structures, highlighting its superior applicability.
Collapse
Affiliation(s)
- Heng Cai
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Chao Shen
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tianye Jian
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tong Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Xiaoqi Han
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Zhuo Yang
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Wei Dang
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Chang-Yu Hsieh
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xiangyang Ji
- Department of Automation, Tsinghua University Beijing 100084 China
| | - Jianfei Song
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Tingjun Hou
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| |
Collapse
|
10
|
Zhou H, Skolnick J. FRAGSITE2: A structure and fragment-based approach for virtual ligand screening. Protein Sci 2024; 33:e4869. [PMID: 38100293 PMCID: PMC10751727 DOI: 10.1002/pro.4869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Revised: 12/06/2023] [Accepted: 12/09/2023] [Indexed: 12/17/2023]
Abstract
Protein function annotation and drug discovery often involve finding small molecule binders. In the early stages of drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing. While our recent ligand homology modeling (LHM)-machine learning VLS method FRAGSITE outperformed approaches that combined traditional docking to generate protein-ligand poses and deep learning scoring functions to rank ligands, a more robust approach that could identify a more diverse set of binding ligands is needed. Here, we describe FRAGSITE2 that shows significant improvement on protein targets lacking known small molecule binders and no confident LHM identified template ligands when benchmarked on two commonly used VLS datasets: For both the DUD-E set and DEKOIS2.0 set and ligands having a Tanimoto coefficient (TC) < 0.7 to the template ligands, the 1% enrichment factor (EF1% ) of FRAGSITE2 is significantly better than those for FINDSITEcomb2.0 , an earlier LHM algorithm. For the DUD-E set, FRAGSITE2 also shows better ROC enrichment factor and AUPR (area under the precision-recall curve) than the deep learning DenseFS scoring function. Comparison with the RF-score-VS on the 76 target subset of DEKOIS2.0 and a TC < 0.99 to training DUD-E ligands, FRAGSITE2 has double the EF1% . Its boosted tree regression method provides for more robust performance than a deep learning multiple layer perceptron method. When compared with the pretrained language model for protein target features, FRAGSITE2 also shows much better performance. Thus, FRAGSITE2 is a promising approach that can discover novel hits for protein targets. FRAGSITE2's web service is freely available to academic users at http://sites.gatech.edu/cssb/FRAGSITE2.
Collapse
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of TechnologyAtlantaGeorgiaUSA
| | - Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of TechnologyAtlantaGeorgiaUSA
| |
Collapse
|
11
|
Libouban PY, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci 2023; 24:16120. [PMID: 38003312 PMCID: PMC10671244 DOI: 10.3390/ijms242216120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023] Open
Abstract
Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein-ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether it is truly solved or if the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we also confirm the bias in the typically used current test sets. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and accurately compare the performance of models.
Collapse
Affiliation(s)
- Pierre-Yves Libouban
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| | - Samia Aci-Sèche
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| | - Jose Carlos Gómez-Tamayo
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
| | - Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
| | - Pascal Bonnet
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| |
Collapse
|
12
|
Tran-Nguyen VK, Junaid M, Simeon S, Ballester PJ. A practical guide to machine-learning scoring for structure-based virtual screening. Nat Protoc 2023; 18:3460-3511. [PMID: 37845361 DOI: 10.1038/s41596-023-00885-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Accepted: 07/03/2023] [Indexed: 10/18/2023]
Abstract
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol , can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
Collapse
Affiliation(s)
| | - Muhammad Junaid
- Centre de Recherche en Cancérologie de Marseille, Marseille, France
| | - Saw Simeon
- Centre de Recherche en Cancérologie de Marseille, Marseille, France
| | | |
Collapse
|
13
|
Wu K, Karapetyan E, Schloss J, Vadgama J, Wu Y. Advancements in small molecule drug design: A structural perspective. Drug Discov Today 2023; 28:103730. [PMID: 37536390 PMCID: PMC10543554 DOI: 10.1016/j.drudis.2023.103730] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2023] [Revised: 07/19/2023] [Accepted: 07/27/2023] [Indexed: 08/05/2023]
Abstract
In this review, we outline recent advancements in small molecule drug design from a structural perspective. We compare protein structure prediction methods and explore the role of the ligand binding pocket in structure-based drug design. We examine various structural features used to optimize drug candidates, including functional groups, stereochemistry, and molecular weight. Computational tools such as molecular docking and virtual screening are discussed for predicting and optimizing drug candidate structures. We present examples of drug candidates designed based on their molecular structure and discuss future directions in the field. By effectively integrating structural information with other valuable data sources, we can improve the drug discovery process, leading to the identification of novel therapeutics with improved efficacy, specificity, and safety profiles.
Collapse
Affiliation(s)
- Ke Wu
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA
| | - Eduard Karapetyan
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA
| | - John Schloss
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA; School of Pharmacy, American University of Health Sciences, Signal Hill, CA 90755, USA
| | - Jaydutt Vadgama
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA; School of Pharmacy, American University of Health Sciences, Signal Hill, CA 90755, USA.
| | - Yong Wu
- Division of Cancer Research and Training, Department of Internal Medicine, Charles R. Drew University of Medicine and Science, David Geffen UCLA School of Medicine and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA.
| |
Collapse
|
14
|
Zhang X, Shen C, Wang T, Deng Y, Kang Y, Li D, Hou T, Pan P. ML-PLIC: a web platform for characterizing protein-ligand interactions and developing machine learning-based scoring functions. Brief Bioinform 2023; 24:bbad295. [PMID: 37738401 DOI: 10.1093/bib/bbad295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Revised: 07/17/2023] [Accepted: 07/31/2023] [Indexed: 09/24/2023] Open
Abstract
Cracking the entangling code of protein-ligand interaction (PLI) is of great importance to structure-based drug design and discovery. Different physical and biochemical representations can be used to describe PLI such as energy terms and interaction fingerprints, which can be analyzed by machine learning (ML) algorithms to create ML-based scoring functions (MLSFs). Here, we propose the ML-based PLI capturer (ML-PLIC), a web platform that automatically characterizes PLI and generates MLSFs to identify the potential binders of a specific protein target through virtual screening (VS). ML-PLIC comprises five modules, including Docking for ligand docking, Descriptors for PLI generation, Modeling for MLSF training, Screening for VS and Pipeline for the integration of the aforementioned functions. We validated the MLSFs constructed by ML-PLIC in three benchmark datasets (Directory of Useful Decoys-Enhanced, Active as Decoys and TocoDecoy), demonstrating accuracy outperforming traditional docking tools and competitive performance to the deep learning-based SF, and provided a case study of the Serine/threonine-protein kinase WEE1 in which MLSFs were developed by using the ML-based VS pipeline in ML-PLIC. Underpinning the latest version of ML-PLIC is a powerful platform that incorporates physical and biological knowledge about PLI, leveraging PLI characterization and MLSF generation into the design of structure-based VS pipeline. The ML-PLIC web platform is now freely available at http://cadd.zju.edu.cn/plic/.
Collapse
Affiliation(s)
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
| | - Tianyue Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
Collapse
|
15
|
Zhang X, Zhang O, Shen C, Qu W, Chen S, Cao H, Kang Y, Wang Z, Wang E, Zhang J, Deng Y, Liu F, Wang T, Du H, Wang L, Pan P, Chen G, Hsieh CY, Hou T. Efficient and accurate large library ligand docking with KarmaDock. NATURE COMPUTATIONAL SCIENCE 2023; 3:789-804. [PMID: 38177786 DOI: 10.1038/s43588-023-00511-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/20/2023] [Accepted: 08/08/2023] [Indexed: 01/06/2024]
Abstract
Ligand docking is one of the core technologies in structure-based virtual screening for drug discovery. However, conventional docking tools and existing deep learning tools may suffer from limited performance in terms of speed, pose quality and binding affinity accuracy. Here we propose KarmaDock, a deep learning approach for ligand docking that integrates the functions of docking acceleration, binding pose generation and correction, and binding strength estimation. The three-stage model consists of the following components: (1) encoders for the protein and ligand to learn the representations of intramolecular interactions; (2) E(n) equivariant graph neural networks with self-attention to update the ligand pose based on both protein-ligand and intramolecular interactions, followed by post-processing to ensure chemically plausible structures; (3) a mixture density network for scoring the binding strength. KarmaDock was validated on four benchmark datasets and tested in a real-world virtual screening project that successfully identified experiment-validated active inhibitors of leukocyte tyrosine kinase (LTK).
Collapse
Affiliation(s)
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
| | - Odin Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
| | - Wanglin Qu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
| | - Shicheng Chen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
| | - Hanqun Cao
- Department of Mathematics, Chinese University of Hong Kong, Hong Kong, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
| | - Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
| | | | - Jintu Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
| | - Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou, Zhejiang, China
| | - Furui Liu
- Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Tianyue Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
| | - Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
| | - Langcheng Wang
- Department of Pathology, New York University Medical Center, New York, NY, USA
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China.
| | | | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China.
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China.
| |
Collapse
|
16
|
Shen C, Zhang X, Hsieh CY, Deng Y, Wang D, Xu L, Wu J, Li D, Kang Y, Hou T, Pan P. A generalized protein-ligand scoring framework with balanced scoring, docking, ranking and screening powers. Chem Sci 2023; 14:8129-8146. [PMID: 37538816 PMCID: PMC10395315 DOI: 10.1039/d3sc02044d] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Accepted: 07/03/2023] [Indexed: 08/05/2023] Open
Abstract
Applying machine learning algorithms to protein-ligand scoring functions has aroused widespread attention in recent years due to the high predictive accuracy and affordable computational cost. Nevertheless, most machine learning-based scoring functions are only applicable to a specific task, e.g., binding affinity prediction, binding pose prediction or virtual screening, suggesting that the development of a scoring function with balanced performance in all critical tasks remains a grand challenge. To this end, we propose a novel parameterization strategy by introducing an adjustable binding affinity term that represents the correlation between the predicted outcomes and experimental data into the training of mixture density network. The resulting residue-atom distance likelihood potential not only retains the superior docking and screening power over all the other state-of-the-art approaches, but also achieves a remarkable improvement in scoring and ranking performance. We emphatically explore the impacts of several key elements on prediction accuracy as well as the task preference, and demonstrate that the performance of scoring/ranking and docking/screening tasks of a certain model could be well balanced through an appropriate manner. Overall, our study highlights the potential utility of our innovative parameterization strategy as well as the resulting scoring framework in future structure-based drug design.
Collapse
Affiliation(s)
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- State Key Lab of CAD&CG, Zhejiang University Hangzhou 310058 Zhejiang China
- School of Public Health, Zhejiang University Hangzhou 310058 Zhejiang China
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology Changzhou 213001 China
| | - Jian Wu
- School of Public Health, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- State Key Lab of CAD&CG, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| |
Collapse
|
17
|
Chen L, Fan Z, Chang J, Yang R, Hou H, Guo H, Zhang Y, Yang T, Zhou C, Sui Q, Chen Z, Zheng C, Hao X, Zhang K, Cui R, Zhang Z, Ma H, Ding Y, Zhang N, Lu X, Luo X, Jiang H, Zhang S, Zheng M. Sequence-based drug design as a concept in computational drug design. Nat Commun 2023; 14:4217. [PMID: 37452028 PMCID: PMC10349078 DOI: 10.1038/s41467-023-39856-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2022] [Accepted: 06/27/2023] [Indexed: 07/18/2023] Open
Abstract
Drug development based on target proteins has been a successful approach in recent decades. However, the conventional structure-based drug design (SBDD) pipeline is a complex, human-engineered process with multiple independently optimized steps. Here, we propose a sequence-to-drug concept for computational drug design based on protein sequence information by end-to-end differentiable learning. We validate this concept in three stages. First, we design TransformerCPI2.0 as a core tool for the concept, which demonstrates generalization ability across proteins and compounds. Second, we interpret the binding knowledge that TransformerCPI2.0 learned. Finally, we use TransformerCPI2.0 to discover new hits for challenging drug targets, and identify new target for an existing drug based on an inverse application of the concept. Overall, this proof-of-concept study shows that the sequence-to-drug concept adds a perspective on drug design. It can serve as an alternative method to SBDD, particularly for proteins that do not yet have high-quality 3D structures available.
Collapse
Affiliation(s)
- Lifan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Zisheng Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
| | - Jie Chang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Ruirui Yang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
| | - Hui Hou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Hao Guo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Yinghui Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Tianbiao Yang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Chenmao Zhou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Qibang Sui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Zhengyang Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Chen Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xinyue Hao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Keke Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Rongrong Cui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Zehong Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Hudson Ma
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Yiluan Ding
- Department of Analytical Chemistry, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Naixia Zhang
- Department of Analytical Chemistry, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xiaojie Lu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 1 Sub-lane Xiangshan, Hangzhou, 310024, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China.
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China.
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 1 Sub-lane Xiangshan, Hangzhou, 310024, China.
| |
Collapse
|
18
|
Lu R, Wang J, Li P, Li Y, Tan S, Pan Y, Liu H, Gao P, Xie G, Yao X. Improving drug-target affinity prediction via feature fusion and knowledge distillation. Brief Bioinform 2023; 24:7142721. [PMID: 37099690 DOI: 10.1093/bib/bbad145] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2023] [Revised: 02/15/2023] [Accepted: 03/27/2023] [Indexed: 04/28/2023] Open
Abstract
Rapid and accurate prediction of drug-target affinity can accelerate and improve the drug discovery process. Recent studies show that deep learning models may have the potential to provide fast and accurate drug-target affinity prediction. However, the existing deep learning models still have their own disadvantages that make it difficult to complete the task satisfactorily. Complex-based models rely heavily on the time-consuming docking process, and complex-free models lacks interpretability. In this study, we introduced a novel knowledge-distillation insights drug-target affinity prediction model with feature fusion inputs to make fast, accurate and explainable predictions. We benchmarked the model on public affinity prediction and virtual screening dataset. The results show that it outperformed previous state-of-the-art models and achieved comparable performance to previous complex-based models. Finally, we study the interpretability of this model through visualization and find it can provide meaningful explanations for pairwise interaction. We believe this model can further improve the drug-target affinity prediction for its higher accuracy and reliable interpretability.
Collapse
Affiliation(s)
- Ruiqiang Lu
- College of Chemistry and Chemical Engineering, Lanzhou University, 730000 Gansu, China
- Ping An Healthcare Technology, 100027 Beijing, China
| | - Jun Wang
- Ping An Healthcare Technology, 100027 Beijing, China
| | - Pengyong Li
- School of Computer Science and Technology, Xidian University, 710126 Shaanxi, China
| | - Yuquan Li
- College of Chemistry and Chemical Engineering, Lanzhou University, 730000 Gansu, China
| | - Shuoyan Tan
- College of Chemistry and Chemical Engineering, Lanzhou University, 730000 Gansu, China
- Ping An Healthcare Technology, 100027 Beijing, China
| | - Yiting Pan
- College of Chemistry and Chemical Engineering, Lanzhou University, 730000 Gansu, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, 999078 Macau, China
| | - Peng Gao
- Ping An Healthcare Technology, 100027 Beijing, China
| | - Guotong Xie
- Ping An Healthcare Technology, 100027 Beijing, China
- Ping An Health Cloud Company Limited, 100027 Beijing, China
- Ping An International Smart City Technology Co., Ltd., 100027 Beijing, China
| | - Xiaojun Yao
- College of Chemistry and Chemical Engineering, Lanzhou University, 730000 Gansu, China
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, 999078 Macau, China
| |
Collapse
|
19
|
Tran-Nguyen VK, Ballester PJ. Beware of Simple Methods for Structure-Based Virtual Screening: The Critical Importance of Broader Comparisons. J Chem Inf Model 2023; 63:1401-1405. [PMID: 36848585 PMCID: PMC10015451 DOI: 10.1021/acs.jcim.3c00218] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/01/2023]
Abstract
We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.
Collapse
Affiliation(s)
| | - Pedro J Ballester
- Department of Bioengineering, Imperial College London, London SW7 2AZ, U.K
| |
Collapse
|
20
|
Jiang D, Ye Z, Hsieh CY, Yang Z, Zhang X, Kang Y, Du H, Wu Z, Wang J, Zeng Y, Zhang H, Wang X, Wang M, Yao X, Zhang S, Wu J, Hou T. MetalProGNet: a structure-based deep graph model for metalloprotein-ligand interaction predictions. Chem Sci 2023; 14:2054-2069. [PMID: 36845922 PMCID: PMC9945430 DOI: 10.1039/d2sc06576b] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2022] [Accepted: 01/11/2023] [Indexed: 01/21/2023] Open
Abstract
Metalloproteins play indispensable roles in various biological processes ranging from reaction catalysis to free radical scavenging, and they are also pertinent to numerous pathologies including cancer, HIV infection, neurodegeneration, and inflammation. Discovery of high-affinity ligands for metalloproteins powers the treatment of these pathologies. Extensive efforts have been made to develop in silico approaches, such as molecular docking and machine learning (ML)-based models, for fast identification of ligands binding to heterogeneous proteins, but few of them have exclusively concentrated on metalloproteins. In this study, we first compiled the largest metalloprotein-ligand complex dataset containing 3079 high-quality structures, and systematically evaluated the scoring and docking powers of three competitive docking tools (i.e., PLANTS, AutoDock Vina and Glide SP) for metalloproteins. Then, a structure-based deep graph model called MetalProGNet was developed to predict metalloprotein-ligand interactions. In the model, the coordination interactions between metal ions and protein atoms and the interactions between metal ions and ligand atoms were explicitly modelled through graph convolution. The binding features were then predicted by the informative molecular binding vector learned from a noncovalent atom-atom interaction network. The evaluation on the internal metalloprotein test set, the independent ChEMBL dataset towards 22 different metalloproteins and the virtual screening dataset indicated that MetalProGNet outperformed various baselines. Finally, a noncovalent atom-atom interaction masking technique was employed to interpret MetalProGNet, and the learned knowledge accords with our understanding of physics.
Collapse
Affiliation(s)
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China .,Tencent Quantum Laboratory, Tencent Shenzhen 518057 Guangdong China .,College of Computer Science and Technology, Zhejiang University Hangzhou 310006 Zhejiang China
| | - Zhaofeng Ye
- Tencent Quantum Laboratory, Tencent Shenzhen 518057 Guangdong China
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Ziyi Yang
- Tencent Quantum Laboratory, Tencent Shenzhen 518057 Guangdong China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yundian Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Haotian Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xiaorui Wang
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and TechnologyMacao
| | - Mingyang Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xiaojun Yao
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and TechnologyMacao
| | - Shengyu Zhang
- Tencent Quantum Laboratory, Tencent Shenzhen 518057 Guangdong China
| | - Jian Wu
- College of Computer Science and Technology, Zhejiang University Hangzhou 310006 Zhejiang China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| |
Collapse
|
21
|
Hassan HHA, Ismail MI, Abourehab MAS, Boeckler FM, Ibrahim TM, Arafa RK. In Silico Targeting of Fascin Protein for Cancer Therapy: Benchmarking, Virtual Screening and Molecular Dynamics Approaches. Molecules 2023; 28:molecules28031296. [PMID: 36770963 PMCID: PMC9921211 DOI: 10.3390/molecules28031296] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 01/12/2023] [Accepted: 01/15/2023] [Indexed: 01/31/2023] Open
Abstract
Fascin is an actin-bundling protein overexpressed in various invasive metastatic carcinomas through promoting cell migration and invasion. Therefore, blocking Fascin binding sites is considered a vital target for antimetastatic drugs. This inspired us to find new Fascin binding site blockers. First, we built an active compound set by collecting reported small molecules binding to Fascin's binding site 2. Consequently, a high-quality decoys set was generated employing DEKOIS 2.0 protocol to be applied in conducting the benchmarking analysis against the selected Fascin structures. Four docking programs, MOE, AutoDock Vina, VinaXB, and PLANTS were evaluated in the benchmarking study. All tools indicated better-than-random performance reflected by their pROC-AUC values against the Fascin crystal structure (PDB: ID 6I18). Interestingly, PLANTS exhibited the best screening performance and recognized potent actives at early enrichment. Accordingly, PLANTS was utilized in the prospective virtual screening effort for repurposing FDA-approved drugs (DrugBank database) and natural products (NANPDB). Further assessment via molecular dynamics simulations for 100 ns endorsed Remdesivir (DrugBank) and NANPDB3 (NANPDB) as potential binders to Fascin binding site 2. In conclusion, this study delivers a model for implementing a customized DEKOIS 2.0 benchmark set to enhance the VS success rate against new potential targets for cancer therapies.
Collapse
Affiliation(s)
- Heba H. A. Hassan
- Drug Design and Discovery Laboratory, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza 12578, Egypt
| | - Muhammad I. Ismail
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, The British University in Egypt, Al-Sherouk City, Cairo-Suez Desert Road, Cairo 11837, Egypt
| | - Mohammed A. S. Abourehab
- Department of Pharmaceutics, College of Pharmacy, Umm Al-Qura University, Makkah 21955, Saudi Arabia
| | - Frank M. Boeckler
- Lab for Molecular Design and Pharmaceutical Biophysics, Department of Pharmacy and Biochemistry, Institute of Pharmaceutical Sciences, University of Tübingen, Auf der Morgenstelle 8, 72076 Tübingen, Germany
| | - Tamer M. Ibrahim
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Kafrelsheikh University, Kafrelsheikh 33516, Egypt
- Correspondence: or (T.M.I.); (R.K.A.)
| | - Reem K. Arafa
- Drug Design and Discovery Laboratory, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza 12578, Egypt
- Biomedical Sciences Program, University of Science and Technology, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza 12578, Egypt
- Correspondence: or (T.M.I.); (R.K.A.)
| |
Collapse
|
22
|
Wang L, Shi SH, Li H, Zeng XX, Liu SY, Liu ZQ, Deng YF, Lu AP, Hou TJ, Cao DS. Reducing false positive rate of docking-based virtual screening by active learning. Brief Bioinform 2023; 24:6987822. [PMID: 36642412 DOI: 10.1093/bib/bbac626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2022] [Revised: 12/10/2022] [Accepted: 12/20/2022] [Indexed: 01/17/2023] Open
Abstract
Machine learning-based scoring functions (MLSFs) have become a very favorable alternative to classical scoring functions because of their potential superior screening performance. However, the information of negative data used to construct MLSFs was rarely reported in the literature, and meanwhile the putative inactive molecules recorded in existing databases usually have obvious bias from active molecules. Here we proposed an easy-to-use method named AMLSF that combines active learning using negative molecular selection strategies with MLSF, which can iteratively improve the quality of inactive sets and thus reduce the false positive rate of virtual screening. We chose energy auxiliary terms learning as the MLSF and validated our method on eight targets in the diverse subset of DUD-E. For each target, we screened the IterBioScreen database by AMLSF and compared the screening results with those of the four control models. The results illustrate that the number of active molecules in the top 1000 molecules identified by AMLSF was significantly higher than those identified by the control models. In addition, the free energy calculation results for the top 10 molecules screened out by the AMLSF, null model and control models based on DUD-E also proved that more active molecules can be identified, and the false positive rate can be reduced by AMLSF.
Collapse
Affiliation(s)
- Lei Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
| | - Shao-Hua Shi
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
| | - Hui Li
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
| | - Xiang-Xiang Zeng
- Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
| | - Su-You Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
| | - Zhao-Qian Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
| | - Ya-Feng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
| | - Ting-Jun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China.,Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
| |
Collapse
|
23
|
Pinel P, Guichaoua G, Najm M, Labouille S, Drizard N, Gaston-Mathé Y, Hoffmann B, Stoven V. Exploring isofunctional molecules: Design of a benchmark and evaluation of prediction performance. Mol Inform 2023; 42:e2200216. [PMID: 36633361 DOI: 10.1002/minf.202200216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2022] [Revised: 12/19/2022] [Accepted: 01/11/2023] [Indexed: 01/13/2023]
Abstract
Identification of novel chemotypes with biological activity similar to a known active molecule is an important challenge in drug discovery called 'scaffold hopping'. Small-, medium-, and large-step scaffold hopping efforts may lead to increasing degrees of chemical structure novelty with respect to the parent compound. In the present paper, we focus on the problem of large-step scaffold hopping. We assembled a high quality and well characterized dataset of scaffold hopping examples comprising pairs of active molecules and including a variety of protein targets. This dataset was used to build a benchmark corresponding to the setting of real-life applications: one active molecule is known, and the second active is searched among a set of decoys chosen in a way to avoid statistical bias. This allowed us to evaluate the performance of computational methods for solving large-step scaffold hopping problems. In particular, we assessed how difficult these problems are, particularly for classical 2D and 3D ligand-based methods. We also showed that a machine-learning chemogenomic algorithm outperforms classical methods and we provided some useful hints for future improvements.
Collapse
Affiliation(s)
- Philippe Pinel
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France.,Institut Curie, 75248, Paris, France.,INSERM U900, 75428, Paris, France.,Iktos SAS, 75017, Paris, France
| | - Gwenn Guichaoua
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France.,Institut Curie, 75248, Paris, France.,INSERM U900, 75428, Paris, France
| | - Matthieu Najm
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France.,Institut Curie, 75248, Paris, France.,INSERM U900, 75428, Paris, France
| | | | | | | | | | - Véronique Stoven
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France.,Institut Curie, 75248, Paris, France.,INSERM U900, 75428, Paris, France
| |
Collapse
|
24
|
Thangavel N, Albratty M. Benchmarked molecular docking integrated molecular dynamics stability analysis for prediction of SARS-CoV-2 papain-like protease inhibition by olive secoiridoids. JOURNAL OF KING SAUD UNIVERSITY. SCIENCE 2023; 35:102402. [PMID: 36338939 PMCID: PMC9617799 DOI: 10.1016/j.jksus.2022.102402] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/17/2022] [Revised: 09/23/2022] [Accepted: 10/24/2022] [Indexed: 05/28/2023]
Abstract
Objectives We performed a virtual screening of olive secoiridoids of the OliveNetTM library to predict SARS-CoV-2 PLpro inhibition. Benchmarked molecular docking protocol that evaluated the performance of two docking programs was applied to execute virtual screening. Molecular dynamics stability analysis of the top-ranked olive secoiridoid docked to PLpro was also carried out. Methods Benchmarking virtual screening used two freely available docking programs, AutoDock Vina 1.1.2. and AutoDock 4.2.1. for molecular docking of olive secoiridoids to a single PLpro structure. Screening also included benchmark structures of known active and decoy molecules from the DEKOIS 2.0 library. Based on the predicted binding energies, the docking programs ranked the screened molecules. We applied the usual performance evaluation metrices to evaluate the docking programs using the predicted ranks. Molecular dynamics of the top-ranked olive secoiridoid bound to PLpro and computation of MM-GBSA energy using three iterations during the last 50 ps of the analysis of the dynamics in Desmond supported the stability prediction. Results and discussions Predictiveness curves suggested that AutoDock Vina has a better predictive ability than AutoDock, although there was a moderate correlation between the active molecules rankings (Kendall's correlation of rank (τ) = 0.581). Interestingly, two same molecules, Demethyloleuropein aglycone, and Oleuroside enriched the top 1 % ranked olive secoiridoids predicted by both programs. Demethyloleuropein aglycone bound to PLpro obtained by docking in AutoDock Vina when analyzed for stability by molecular dynamics simulation for 50 ns displayed an RMSD, RMSF<2 Å, and MM-GBSA energy of -94.54 ± 6.05 kcal/mol indicating good stability. Molecular dynamics also revealed the interactions of Demethyloleuropein aglycone with binding sites 2 and 3 of PLpro, suggesting a potent inhibition. In addition, for 98 % of the simulation time, two phenolic hydroxy groups of Demethyloleuropein aglycone maintained two hydrogen bonds with Asp302 of PLpro, specifying the significance of the groups in receptor binding. Conclusion AutoDock Vina retrieved the active molecules accurately and predicted Demethyloleuropein aglycone as the best inhibitor of PLpro. The Arabian diet consisting of olive products rich in secoiridoids benefits from the PLpro inhibition property and reduces the risk of viral infection.
Collapse
Key Words
- AD, AutoDock 4.2.1
- ADV, AutoDock Vina 1.1.2
- BEDROC, Boltzmann enhanced discrimination of ROC
- Benchmarking docking
- DEKOIS, Demanding evaluation kits for objective in-silico screening
- EF, Enrichment factor
- M, Moles
- MD, Molecular dynamics
- MM-GBSA, Molecular mechanics generalized Born surface area
- MW, Molecular weight
- Molecular docking
- Molecular dynamics
- OS, Olive secoiridoids
- Olive secoiridoids
- PC, Predictiveness curve
- PLpro
- PLpro, Papain-like protease
- RIE, Robust initial enhancement
- RMSD, Root mean square deviation
- RMSF, Root mean square fluctuation
- ROC, Receiver operating characteristic curve
- ROC-AUC, Area under ROC
- SARS-CoV-2
- SARS-CoV-2, Severe acute respiratory syndrome coronavirus-2
- TG, Total gain
- g/mol, Grams/mole
- kcal/mol, Kilocalorie/mole
- ns, nanoseconds
- pAUC, partial area under ROC
- pTG, Partial total gain
- ps, picoseconds
Collapse
Affiliation(s)
- Neelaveni Thangavel
- Department of Pharmaceutical Chemistry & Pharmacognosy, College of Pharmacy, Jazan University, Jazan, Saudi Arabia
| | - Mohammed Albratty
- Department of Pharmaceutical Chemistry & Pharmacognosy, College of Pharmacy, Jazan University, Jazan, Saudi Arabia
| |
Collapse
|
25
|
Blanes-Mira C, Fernández-Aguado P, de Andrés-López J, Fernández-Carvajal A, Ferrer-Montiel A, Fernández-Ballester G. Comprehensive Survey of Consensus Docking for High-Throughput Virtual Screening. Molecules 2022; 28:molecules28010175. [PMID: 36615367 PMCID: PMC9821981 DOI: 10.3390/molecules28010175] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 12/19/2022] [Accepted: 12/21/2022] [Indexed: 12/28/2022] Open
Abstract
The rapid advances of 3D techniques for the structural determination of proteins and the development of numerous computational methods and strategies have led to identifying highly active compounds in computer drug design. Molecular docking is a method widely used in high-throughput virtual screening campaigns to filter potential ligands targeted to proteins. A great variety of docking programs are currently available, which differ in the algorithms and approaches used to predict the binding mode and the affinity of the ligand. All programs heavily rely on scoring functions to accurately predict ligand binding affinity, and despite differences in performance, none of these docking programs is preferable to the others. To overcome this problem, consensus scoring methods improve the outcome of virtual screening by averaging the rank or score of individual molecules obtained from different docking programs. The successful application of consensus docking in high-throughput virtual screening highlights the need to optimize the predictive power of molecular docking methods.
Collapse
|
26
|
Chang Y, Hawkins BA, Du JJ, Groundwater PW, Hibbs DE, Lai F. A Guide to In Silico Drug Design. Pharmaceutics 2022; 15:pharmaceutics15010049. [PMID: 36678678 PMCID: PMC9867171 DOI: 10.3390/pharmaceutics15010049] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Revised: 12/16/2022] [Accepted: 12/17/2022] [Indexed: 12/28/2022] Open
Abstract
The drug discovery process is a rocky path that is full of challenges, with the result that very few candidates progress from hit compound to a commercially available product, often due to factors, such as poor binding affinity, off-target effects, or physicochemical properties, such as solubility or stability. This process is further complicated by high research and development costs and time requirements. It is thus important to optimise every step of the process in order to maximise the chances of success. As a result of the recent advancements in computer power and technology, computer-aided drug design (CADD) has become an integral part of modern drug discovery to guide and accelerate the process. In this review, we present an overview of the important CADD methods and applications, such as in silico structure prediction, refinement, modelling and target validation, that are commonly used in this area.
Collapse
Affiliation(s)
- Yiqun Chang
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Bryson A. Hawkins
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Jonathan J. Du
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Paul W. Groundwater
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
| | - David E. Hibbs
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Felcia Lai
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
- Correspondence:
| |
Collapse
|
27
|
miDruglikeness: Subdivisional Drug-Likeness Prediction Models Using Active Ensemble Learning Strategies. Biomolecules 2022; 13:biom13010029. [PMID: 36671415 PMCID: PMC9855665 DOI: 10.3390/biom13010029] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2022] [Revised: 12/17/2022] [Accepted: 12/20/2022] [Indexed: 12/24/2022] Open
Abstract
The drug development pipeline involves several stages including in vitro assays, in vivo assays, and clinical trials. For candidate selection, it is important to consider that a compound will successfully pass through these stages. Using graph neural networks, we developed three subdivisional models to individually predict the capacity of a compound to enter in vivo testing, clinical trials, and market approval stages. Furthermore, we proposed a strategy combing both active learning and ensemble learning to improve the quality of the models. The models achieved satisfactory performance in the internal test datasets and four self-collected external test datasets. We also employed the models as a general index to make an evaluation on a widely known benchmark dataset DEKOIS 2.0, and surprisingly found a powerful ability on virtual screening tasks. Our model system (termed as miDruglikeness) provides a comprehensive drug-likeness prediction tool for drug discovery and development.
Collapse
|
28
|
Pharmacophore model-aided virtual screening combined with comparative molecular docking and molecular dynamics for identification of marine natural products as SARS-CoV-2 papain-like protease inhibitors. ARAB J CHEM 2022; 15:104334. [PMID: 36246784 PMCID: PMC9554199 DOI: 10.1016/j.arabjc.2022.104334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2022] [Accepted: 10/03/2022] [Indexed: 11/24/2022] Open
Abstract
Targeting SARS-CoV-2 papain-like protease using inhibitors is a suitable approach for inhibition of virus replication and dysregulation of host anti-viral immunity. Engaging all five binding sites far from the catalytic site of PLpro is essential for developing a potent inhibitor. We developed and validated a structure-based pharmacophore model with 9 features of a potent PLpro inhibitor. The pharmacophore model-aided virtual screening of the comprehensive marine natural product database predicted 66 initial hits. This hit library was downsized by filtration through a molecular weight filter of ≤ 500 g/mol. The 50 resultant hits were screened by comparative molecular docking using AutoDock and AutoDock Vina. Comparative molecular docking enables benchmarking docking and relieves the disparities in the search and scoring functions of docking engines. Both docking engines retrieved 3 same compounds at different positions in the top 1 % rank, hence consensus scoring was applied, through which CMNPD28766, aspergillipeptide F emerged as the best PLpro inhibitor. Aspergillipeptide F topped the 50-hit library with a pharmacophore-fit score of 75.916. Favorable binding interactions were predicted between aspergillipeptide F and PLpro similar to the native ligand XR8-24. Aspergillipeptide F was able to engage all the 5 binding sites including the newly discovered BL2 groove, site V. Molecular dynamics for quantification of Cα-atom movements of PLpro after ligand binding indicated that it exhibits highly correlated domain movements contributing to the low free energy of binding and a stable conformation. Thus, aspergillipeptide F is a promising candidate for pharmaceutical and clinical development as a potent SARS-CoV-2 PLpro inhibitor.
Collapse
Key Words
- CMNPD, comprehensive marine natural product database
- Consensus scoring
- DCCM, dynamic cross-correlation matrix
- H, hydrophobic
- HBA, hydrogen bond acceptor
- HBD, hydrogen bond donor
- MD, molecular dynamics
- MMGBSA, molecular mechanics generalized Born and surface area continuum solvation
- MW, molecular weight
- Marine natural products
- Molecular docking
- Molecular dynamics
- PCA, principal component analysis
- PI, positive ionization
- PLpro, SARS-CoV-2 papain-like protease
- Pharmacophore model
- SARS-CoV-2 PLpro
- TG, Total gain
- ns, nanoseconds
- ps, picoseconds
Collapse
|
29
|
Domiati SA, Abd El Galil KH, Abourehab MAS, Ibrahim TM, Ragab HM. Structure-guided approach on the role of substitution on amide-linked bipyrazoles and its effect on their anti-inflammatory activity. J Enzyme Inhib Med Chem 2022; 37:2179-2190. [PMID: 35950562 PMCID: PMC9377232 DOI: 10.1080/14756366.2022.2109025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022] Open
Abstract
A structure-guided modelling approach using COX-2 as a template was used to investigate the effect of replacing the chloro atom located at the chlorophenyl ring of amide-linked bipyrazole moieties, aiming at attaining better anti-inflammatory effect with a good safety profile. Bromo, fluoro, nitro, and methyl groups were revealed to be ideal candidates. Consequently, new bipyrazole derivatives were synthesised. The in vitro inhibitory COX-1/COX-2 activity of the synthesised compounds exhibited promising selectivity. The fluoro and methyl derivatives were the most active candidates. The in vivo formalin-induced paw edoema model confirmed the anti-inflammatory activity of the synthesised compounds. All the tested derivatives had a good ulcerogenic safety profile except for the methyl substituted compound. In silico molecular dynamics simulations of the fluoro and methyl poses complexed with COX-2 for 50 ns indicated stable binding to COX-2. Generally, our approach delivers a fruitful matrix for the development of further amide-linked bipyrazole anti-inflammatory candidates.
Collapse
Affiliation(s)
- Souraya A Domiati
- Department of Pharmacology and Therapeutics, Faculty of Pharmacy, Beirut Arab University, Beirut, Lebanon
| | - Khaled H Abd El Galil
- Department of Pharmaceutical Sciences, Faculty of Pharmacy, Beirut Arab University, Beirut, Lebanon.,Department of Microbiology and Immunology, Faculty of Pharmacy, Mansoura University
| | - Mohammed A S Abourehab
- Department of Pharmaceutics College of Pharmacy, Umm Al-Qura University, Makkah, Saudi Arabia.,Department of Pharmaceutics and Industrial Pharmacy, Faculty of Pharmacy, Minia University, Minia, Egypt
| | - Tamer M Ibrahim
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Kafrelsheikh University, Kafrelsheikh, Egypt
| | - Hanan M Ragab
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Alexandria University, Alexandria, Egypt
| |
Collapse
|
30
|
Li Y, Zhou D, Zheng G, Li X, Wu D, Yuan Y. DyScore: A Boosting Scoring Method with Dynamic Properties for Identifying True Binders and Nonbinders in Structure-Based Drug Discovery. J Chem Inf Model 2022; 62:5550-5567. [PMID: 36327102 PMCID: PMC9983328 DOI: 10.1021/acs.jcim.2c00926] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
The accurate prediction of protein-ligand binding affinity is critical for the success of computer-aided drug discovery. However, the accuracy of current scoring functions is usually unsatisfactory due to their rough approximation or sometimes even omittance of many factors involved in protein-ligand binding. For instance, the intrinsic dynamics of the protein-ligand binding state is usually disregarded in scoring function because these rapid binding affinity prediction approaches are only based on a representative complex structure of the protein and ligand in the binding state. That is, the dynamic protein-ligand binding complex ensembles are simplified as a static snapshot in calculation. In this study, two novel features were proposed for characterizing the dynamic properties of protein-ligand binding based on the static structure of the complex, which is expected to be a valuable complement to the current scoring functions. The two features demonstrate the geometry-shape matching between a protein and a ligand as well as the dynamic stability of protein-ligand binding. We further combined these two novel features with several classical scoring functions to develop a binary classification model called DyScore that uses the Extreme Gradient Boosting algorithm to classify compound poses as binders or non-binders. We have found that DyScore achieves state-of-the-art performance in distinguishing active and decoy ligands on both enhanced DUD data set and external test sets with both proposed novel features showing significant contributions to the improved performance. Especially, DyScore exhibits superior performance on early recognition, a crucial requirement for success in virtual screening and de novo drug design. The standalone version of DyScore and Dyscore-MF are freely available to all at: https://github.com/YanjunLi-CS/dyscore.
Collapse
Affiliation(s)
- Yanjun Li
- NSF Center for Big Learning, University of Florida, Gainesville, Florida 32611, United States; Baidu Research USA, Sunnyvale, California 94089, United States
| | - Daohong Zhou
- Department of Pharmacodynamics, Univerity of Florida, Gainesville, Florida 32611, United States
| | - Guangrong Zheng
- Department of Medicinal Chemistry, University of Florida, Gainesville, Florida 32611, United States
| | - Xiaolin Li
- Cognization Lab, Palo Alto, California 94306, United States
| | - Dapeng Wu
- NSF Center for Big Learning, University of Florida, Gainesville, Florida 32611, United States
| | - Yaxia Yuan
- Department of Pharmacodynamics, Univerity of Florida, Gainesville, Florida 32611, United States
| |
Collapse
|
31
|
Shen C, Zhang X, Deng Y, Gao J, Wang D, Xu L, Pan P, Hou T, Kang Y. Boosting Protein-Ligand Binding Pose Prediction and Virtual Screening Based on Residue-Atom Distance Likelihood Potential and Graph Transformer. J Med Chem 2022; 65:10691-10706. [PMID: 35917397 DOI: 10.1021/acs.jmedchem.2c00991] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Abstract
The past few years have witnessed enormous progress toward applying machine learning approaches to the development of protein-ligand scoring functions. However, the robust performance and wide applicability of scoring functions remain a big challenge for increasing the success rate of docking-based virtual screening. Herein, a novel scoring function named RTMScore was developed by introducing a tailored residue-based graph representation strategy and several graph transformer layers for the learning of protein and ligand representations, followed by a mixture density network to obtain residue-atom distance likelihood potential. Our approach was resolutely validated on the CASF-2016 benchmark, and the results indicate that RTMScore can outperform almost all of the other state-of-the-art methods in terms of both the docking and screening powers. Further evaluation confirms the robustness of our approach that can not only retain its docking power on cross-docked poses but also achieve improved performance as a rescoring tool in larger-scale virtual screening.
Collapse
Affiliation(s)
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China.,CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Junbo Gao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| |
Collapse
|
32
|
Ash JR, Hughes-Oliver JM. Confidence bands and hypothesis tests for hit enrichment curves. J Cheminform 2022; 14:50. [PMID: 35902962 PMCID: PMC9334420 DOI: 10.1186/s13321-022-00629-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2022] [Accepted: 06/28/2022] [Indexed: 11/24/2022] Open
Abstract
In virtual screening for drug discovery, hit enrichment curves are widely used to assess the performance of ranking algorithms with regard to their ability to identify early enrichment. Unfortunately, researchers almost never consider the uncertainty associated with estimating such curves before declaring differences between performance of competing algorithms. Uncertainty is often large because the testing fractions of interest to researchers are small. Appropriate inference is complicated by two sources of correlation that are often overlooked: correlation across different testing fractions within a single algorithm, and correlation between competing algorithms. Additionally, researchers are often interested in making comparisons along the entire curve, not only at a few testing fractions. We develop inferential procedures to address both the needs of those interested in a few testing fractions, as well as those interested in the entire curve. For the former, four hypothesis testing and (pointwise) confidence intervals are investigated, and a newly developed EmProc approach is found to be most effective. For inference along entire curves, EmProc-based confidence bands are recommended for simultaneous coverage and minimal width. While we focus on the hit enrichment curve, this work is also appropriate for lift curves that are used throughout the machine learning community. Our inferential procedures trivially extend to enrichment factors, as well.
Collapse
Affiliation(s)
- Jeremy R Ash
- Department of Statistics, Bioinformatics Research Center, North Carolina State University, Raleigh, USA. .,JMP Division, SAS Institute, Cary, USA.
| | | |
Collapse
|
33
|
McGibbon M, Money-Kyrle S, Blay V, Houston DR. SCORCH: Improving structure-based virtual screening with machine learning classifiers, data augmentation, and uncertainty estimation. J Adv Res 2022; 46:135-147. [PMID: 35901959 PMCID: PMC10105235 DOI: 10.1016/j.jare.2022.07.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2022] [Revised: 07/08/2022] [Accepted: 07/09/2022] [Indexed: 11/17/2022] Open
Abstract
INTRODUCTION The discovery of a new drug is a costly and lengthy endeavour. The computational prediction of which small molecules can bind to a protein target can accelerate this process if the predictions are fast and accurate enough. Recent machine-learning scoring functions re-evaluate the output of molecular docking to achieve more accurate predictions. However, previous scoring functions were trained on crystalised protein-ligand complexes and datasets of decoys. The limited availability of crystal structures and biases in the decoy datasets can lower the performance of scoring functions. OBJECTIVES To address key limitations of previous scoring functions and thus improve the predictive performance of structure-based virtual screening. METHODS A novel machine-learning scoring function was created, named SCORCH (Scoring COnsensus for RMSD-based Classification of Hits). To develop SCORCH, training data is augmented by considering multiple ligand poses and labelling poses based on their RMSD from the native pose. Decoy bias is addressed by generating property-matched decoys for each ligand and using the same methodology for preparing and docking decoys and ligands. A consensus of 3 different machine learning approaches is also used to improve performance. RESULTS We find that multi-pose augmentation in SCORCH improves its docking power and screening power on independent benchmark datasets. SCORCH outperforms an equivalent scoring function trained on single poses, with a 1% enrichment factor (EF) of 13.78 vs. 10.86 on 18 DEKOIS 2.0 targets and a mean native pose rank of 5.9 vs 30.4 on CSAR 2014. Additionally, SCORCH outperforms widely used scoring functions in virtual screening and pose prediction on independent benchmark datasets. CONCLUSION By rationally addressing key limitations of previous scoring functions, SCORCH improves the performance of virtual screening. SCORCH also provides an estimate of its uncertainty, which can help reduce the cost and time required for drug discovery.
Collapse
Affiliation(s)
- Miles McGibbon
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
| | - Sam Money-Kyrle
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
| | - Vincent Blay
- Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA; Institute for Integrative Systems Biology (I(2)SysBio), Universitat de València and Spanish Research Council (CSIC), 46980 Valencia, Spain.
| | - Douglas R Houston
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK.
| |
Collapse
|
34
|
Zhang X, Shen C, Liao B, Jiang D, Wang J, Wu Z, Du H, Wang T, Huo W, Xu L, Cao D, Hsieh CY, Hou T. TocoDecoy: A New Approach to Design Unbiased Datasets for Training and Benchmarking Machine-Learning Scoring Functions. J Med Chem 2022; 65:7918-7932. [PMID: 35642777 DOI: 10.1021/acs.jmedchem.2c00460] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Development of accurate machine-learning-based scoring functions (MLSFs) for structure-based virtual screening against a given target requires a large unbiased dataset with structurally diverse actives and decoys. However, most datasets for the development of MLSFs were designed for traditional SFs and may suffer from hidden biases and data insufficiency. Hereby, we developed a new approach named Topology-based and Conformation-based decoys generation (TocoDecoy), which integrates two strategies to generate decoys by tweaking the actives for a specific target, to generate unbiased and expandable datasets for training and benchmarking MLSFs. For hidden bias evaluation, the performance of InteractionGraphNet (IGN) trained on the TocoDecoy, LIT-PCBA, and DUD-E-like datasets was assessed. The results illustrate that the IGN model trained on the TocoDecoy dataset is competitive with that trained on the LIT-PCBA dataset but remarkably outperforms that trained on the DUD-E dataset, suggesting that the decoys in TocoDecoy are unbiased for training and benchmarking MLSFs.
Collapse
Affiliation(s)
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China.,Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Ben Liao
- Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China.,National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tianyue Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Wenbo Huo
- Tsinghua AI Drug Discovery group, Research Institute of Tsinghua University in Shenzhen, Shenzhen 518057, Guangdong, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
Collapse
|
35
|
Jiang D, Hsieh CY, Wu Z, Kang Y, Wang J, Wang E, Liao B, Shen C, Xu L, Wu J, Cao D, Hou T. InteractionGraphNet: A Novel and Efficient Deep Graph Representation Learning Framework for Accurate Protein-Ligand Interaction Predictions. J Med Chem 2021; 64:18209-18232. [PMID: 34878785 DOI: 10.1021/acs.jmedchem.1c01830] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
Accurate quantification of protein-ligand interactions remains a key challenge to structure-based drug design. However, traditional machine learning (ML)-based methods based on handcrafted descriptors, one-dimensional protein sequences, and/or two-dimensional graph representations limit their capability to learn the generalized molecular interactions in 3D space. Here, we proposed a novel deep graph representation learning framework named InteractionGraphNet (IGN) to learn the protein-ligand interactions from the 3D structures of protein-ligand complexes. In IGN, two independent graph convolution modules were stacked to sequentially learn the intramolecular and intermolecular interactions, and the learned intermolecular interactions can be efficiently used for subsequent tasks. Extensive binding affinity prediction, large-scale structure-based virtual screening, and pose prediction experiments demonstrated that IGN achieved better or competitive performance against other state-of-the-art ML-based baselines and docking programs. More importantly, such state-of-the-art performance was proven from the successful learning of the key features in protein-ligand interactions instead of just memorizing certain biased patterns from data.
Collapse
Affiliation(s)
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China.,College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China.,State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
| | - Zhenxing Wu
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Jike Wang
- School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
| | - Ercheng Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Ben Liao
- Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
| | - Chao Shen
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
| | - Jian Wu
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China.,State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
Collapse
|
36
|
Ricci-Lopez J, Aguila SA, Gilson MK, Brizuela CA. Improving Structure-Based Virtual Screening with Ensemble Docking and Machine Learning. J Chem Inf Model 2021; 61:5362-5376. [PMID: 34652141 DOI: 10.1021/acs.jcim.1c00511] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
One of the main challenges of structure-based virtual screening (SBVS) is the incorporation of the receptor's flexibility, as its explicit representation in every docking run implies a high computational cost. Therefore, a common alternative to include the receptor's flexibility is the approach known as ensemble docking. Ensemble docking consists of using a set of receptor conformations and performing the docking assays over each of them. However, there is still no agreement on how to combine the ensemble docking results to obtain the final ligand ranking. A common choice is to use consensus strategies to aggregate the ensemble docking scores, but these strategies exhibit slight improvement regarding the single-structure approach. Here, we claim that using machine learning (ML) methodologies over the ensemble docking results could improve the predictive power of SBVS. To test this hypothesis, four proteins were selected as study cases: CDK2, FXa, EGFR, and HSP90. Protein conformational ensembles were built from crystallographic structures, whereas the evaluated compound library comprised up to three benchmarking data sets (DUD, DEKOIS 2.0, and CSAR-2012) and cocrystallized molecules. Ensemble docking results were processed through 30 repetitions of 4-fold cross-validation to train and validate two ML classifiers: logistic regression and gradient boosting trees. Our results indicate that the ML classifiers significantly outperform traditional consensus strategies and even the best performance case achieved with single-structure docking. We provide statistical evidence that supports the effectiveness of ML to improve the ensemble docking performance.
Collapse
Affiliation(s)
- Joel Ricci-Lopez
- Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California C.P. 22860, Mexico.,Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México (UNAM), Ensenada, Baja California C.P. 22860, Mexico
| | - Sergio A Aguila
- Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México (UNAM), Ensenada, Baja California C.P. 22860, Mexico
| | - Michael K Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, La Jolla, San Diego, California 92093, United States
| | - Carlos A Brizuela
- Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California C.P. 22860, Mexico
| |
Collapse
|
37
|
Guo J, Janet JP, Bauer MR, Nittinger E, Giblin KA, Papadopoulos K, Voronov A, Patronov A, Engkvist O, Margreitter C. DockStream: a docking wrapper to enhance de novo molecular design. J Cheminform 2021; 13:89. [PMID: 34789335 PMCID: PMC8596819 DOI: 10.1186/s13321-021-00563-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2021] [Accepted: 10/29/2021] [Indexed: 01/09/2023] Open
Abstract
Recently, we have released the de novo design platform REINVENT in version 2.0. This improved and extended iteration supports far more features and scoring function components, which allows bespoke and tailor-made protocols to maximize impact in small molecule drug discovery projects. A major obstacle of generative models is producing active compounds, in which predictive (QSAR) models have been applied to enrich target activity. However, QSAR models are inherently limited by their applicability domains. To overcome these limitations, we introduce a structure-based scoring component for REINVENT. DockStream is a flexible, stand-alone molecular docking wrapper that provides access to a collection of ligand embedders and docking backends. Using the benchmarking and analysis workflow provided in DockStream, execution and subsequent analysis of a variety of docking configurations can be automated. Docking algorithms vary greatly in performance depending on the target and the benchmarking and analysis workflow provides a streamlined solution to identifying productive docking configurations. We show that an informative docking configuration can inform the REINVENT agent to optimize towards improving docking scores using public data. With docking activated, REINVENT is able to retain key interactions in the binding site, discard molecules which do not fit the binding cavity, harness unused (sub-)pockets, and improve overall performance in the scaffold-hopping scenario. The code is freely available at https://github.com/MolecularAI/DockStream .
Collapse
Affiliation(s)
- Jeff Guo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Jon Paul Janet
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Matthias R Bauer
- Structure & Biophysics, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Eva Nittinger
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Kathryn A Giblin
- Medicinal Chemistry, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
| | | | - Alexey Voronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Atanas Patronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | | |
Collapse
|
38
|
Di Filippo JI, Cavasotto CN. Guided structure-based ligand identification and design via artificial intelligence modeling. Expert Opin Drug Discov 2021; 17:71-78. [PMID: 34544293 DOI: 10.1080/17460441.2021.1979514] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
INTRODUCTION The implementation of Artificial Intelligence (AI) methodologies to drug discovery (DD) are on the rise. Several applications have been developed for structure-based DD, where AI methods provide an alternative framework for the identification of ligands for validated therapeutic targets, as well as the de novo design of ligands through generative models. AREAS COVERED Herein, the authors review the contributions between the 2019 to present period regarding the application of AI methods to structure-based virtual screening (SBVS) which encompasses mainly molecular docking applications - binding pose prediction and binary classification for ligand or hit identification-, as well as de novo drug design driven by machine learning (ML) generative models, and the validation of AI models in structure-based screening. Studies are reviewed in terms of their main objective, used databases, implemented methodology, input and output, and key results . EXPERT OPINION More profound analyses regarding the validity and applicability of AI methods in DD have begun to appear. In the near future, we expect to see more structure-based generative models- which are scarce in comparison to ligand-based generative models-, the implementation of standard guidelines for validating the generated structures, and more analyses regarding the validation of AI methods in structure-based DD.
Collapse
Affiliation(s)
- Juan I Di Filippo
- Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Pilar, Buenos Aires, Argentina.,Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Pilar, Buenos Aires, Argentina.,Austral Institute for Applied Artificial Intelligence, Universidad Austral, Pilar, Buenos Aires, Argentina
| | - Claudio N Cavasotto
- Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Pilar, Buenos Aires, Argentina.,Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Pilar, Buenos Aires, Argentina.,Austral Institute for Applied Artificial Intelligence, Universidad Austral, Pilar, Buenos Aires, Argentina
| |
Collapse
|
39
|
Kim J, Park S, Min D, Kim W. Comprehensive Survey of Recent Drug Discovery Using Deep Learning. Int J Mol Sci 2021; 22:9983. [PMID: 34576146 PMCID: PMC8470987 DOI: 10.3390/ijms22189983] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2021] [Revised: 09/09/2021] [Accepted: 09/10/2021] [Indexed: 02/07/2023] Open
Abstract
Drug discovery based on artificial intelligence has been in the spotlight recently as it significantly reduces the time and cost required for developing novel drugs. With the advancement of deep learning (DL) technology and the growth of drug-related data, numerous deep-learning-based methodologies are emerging at all steps of drug development processes. In particular, pharmaceutical chemists have faced significant issues with regard to selecting and designing potential drugs for a target of interest to enter preclinical testing. The two major challenges are prediction of interactions between drugs and druggable targets and generation of novel molecular structures suitable for a target of interest. Therefore, we reviewed recent deep-learning applications in drug-target interaction (DTI) prediction and de novo drug design. In addition, we introduce a comprehensive summary of a variety of drug and protein representations, DL models, and commonly used benchmark datasets or tools for model training and testing. Finally, we present the remaining challenges for the promising future of DL-based DTI prediction and de novo drug design.
Collapse
Affiliation(s)
- Jintae Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
| | - Sera Park
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
| | - Dongbo Min
- Computer Vision Lab, Department of Computer Science and Engineering, Ewha Womans University, Seoul 03760, Korea
| | - Wankyu Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
- System Pharmacology Lab, Department of Life Sciences, Ewha Womans University, Seoul 03760, Korea
| |
Collapse
|
40
|
Vásquez AF, Muñoz AR, Duitama J, González Barrios A. Non-Extensive Fragmentation of Natural Products and Pharmacophore-Based Virtual Screening as a Practical Approach to Identify Novel Promising Chemical Scaffolds. Front Chem 2021; 9:700802. [PMID: 34422762 PMCID: PMC8377161 DOI: 10.3389/fchem.2021.700802] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2021] [Accepted: 06/28/2021] [Indexed: 11/25/2022] Open
Abstract
Fragment-based drug design (FBDD) and pharmacophore modeling have proven to be efficient tools to discover novel drugs. However, these approaches may become limited if the collection of fragments is highly repetitive, poorly diverse, or excessively simple. In this article, combining pharmacophore modeling and a non-classical type of fragmentation (herein called non-extensive) to screen a natural product (NP) library may provide fragments predicted as potent, diverse, and developable. Initially, we applied retrosynthetic combinatorial analysis procedure (RECAP) rules in two versions, extensive and non-extensive, in order to deconstruct a virtual library of NPs formed by the databases Traditional Chinese Medicine (TCM), AfroDb (African Medicinal Plants database), NuBBE (Nuclei of Bioassays, Biosynthesis, and Ecophysiology of Natural Products), and UEFS (Universidade Estadual de Feira de Santana). We then developed a virtual screening (VS) using two groups of natural-product-derived fragments (extensive and non-extensive NPDFs) and two overlapping pharmacophore models for each of 20 different proteins of therapeutic interest. Molecular weight, lipophilicity, and molecular complexity were estimated and compared for both types of NPDFs (and their original NPs) before and after the VS proceedings. As a result, we found that non-extensive NPDFs exhibited a much higher number of chemical entities compared to extensive NPDFs (45,355 vs. 11,525 compounds), accounting for the larger part of the hits recovered and being far less repetitive than extensive NPDFs. The structural diversity of both types of NPDFs and the NPs was shown to diminish slightly after VS procedures. Finally, and most interestingly, the pharmacophore fit score of the non-extensive NPDFs proved to be not only higher, on average, than extensive NPDFs (56% of cases) but also higher than their original NPs (69% of cases) when all of them were also recognized as hits after the VS. The findings obtained in this study indicated that the proposed cascade approach was useful to enhance the probability of identifying innovative chemical scaffolds, which deserve further development to become drug-sized candidate compounds. We consider that the knowledge about the deconstruction degree required to produce NPDFs of interest represents a good starting point for eventual synthesis, characterization, and biological activity studies.
Collapse
Affiliation(s)
- Andrés Felipe Vásquez
- Grupo de Diseño de Productos y Procesos (GDPP), Department of Chemical Engineering, Universidad de Los Andes, Bogotá, Colombia.,Naturalius S.A.S, Bogotá, Colombia
| | - Alejandro Reyes Muñoz
- Grupo de Biología Computacional y Ecología Microbiana (BCEM), Department of Biological Sciences, Universidad de Los Andes, Bogotá, Colombia.,Max Planck Tandem Group in Computational Biology, Universidad de Los Andes, Bogotá, Colombia
| | - Jorge Duitama
- Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá, Colombia
| | - Andrés González Barrios
- Grupo de Diseño de Productos y Procesos (GDPP), Department of Chemical Engineering, Universidad de Los Andes, Bogotá, Colombia
| |
Collapse
|
41
|
Chukwuemeka PO, Umar HI, Iwaloye O, Oretade OM, Olowosoke CB, Elabiyi MO, Igbe FO, Oretade OJ, Eigbe JO, Adeojo FJ. Targeting p53-MDM2 interactions to identify small molecule inhibitors for cancer therapy: beyond "Failure to rescue". J Biomol Struct Dyn 2021; 40:9158-9176. [PMID: 33988074 DOI: 10.1080/07391102.2021.1924267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
At present, disrupting p53-MDM2 interactions through small molecule ligands is a promising approach to safe treatment and management of human cancer. Tumor cells unlike the normal cells, are rapidly evolving affecting the efficacy of many approved anti-cancer agents due to drug resistance. Therefore, identifying a potential anticancer compound is crucial. Pharmacophore based virtual screening, followed by molecular docking, ADMET evaluation, and molecular dynamics studies against MDM2 protein was investigated to identify potential ligands that may act as inhibitors. The model (AHRR_1) with survival score (4.176) was selected among the top ranked generated Pharmacophore hypothesis. Validation of the model hypothesis by an external dataset of actives and inactive compounds produced significant validation attributes including; AUC = 0.85, BEDROC = 0.56 at α = 20.0, RIE = 8.18, AUAC = 0.88, and EF of 6.2 at the top 2% of the dataset. The model was use for screening the ZINC database, and the top 1375 hits satisfying the model hypothesis were subjected to molecular docking studies to understand the molecular and structural basis of selectivity of compounds for MDM2 protein. A sub-set of 25 compounds with binding energy lower than the reference inhibitors were evaluated for pharmacokinetic properties. Four compounds (ZINC02639178, ZINC06752762, ZINC38933175, and ZINC77969611) showed the most desired pharmacokinetic profile. Lastly, investigation of the dynamic behaviour of leads-protein complexes through MD simulation showed similar RMSD, RMSF, and H-bond occupancy profile compared to a reference inhibitor, suggesting stability throughout the simulation time. However, ZINC02639178 was found to satisfy the molecular enumeration the most compared to the other three leads. It may emerge as potential treatment option after extensive experimental studies. Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Prosper Obed Chukwuemeka
- Department of Biotechnology, School of Sciences (SOS), Federal University of Technology Akure, Akure, Nigeria
| | - Haruna Isiyaku Umar
- Department of Biochemistry, School of Sciences (SOS), Federal University of Technology Akure, Akure, Nigeria
| | - Opeyemi Iwaloye
- Bioinformatics and Molecular biology unit, Department of Biochemistry, School of Sciences (SOS), Federal University of Technology Akure, Akure, Nigeria
| | - Oluwaseyi Matthew Oretade
- Department of Biotechnology, School of Sciences (SOS), Federal University of Technology Akure, Akure, Nigeria
| | | | - Michael Omoniyi Elabiyi
- Department of Microbiology, School of Sciences (SOS), Federal University of Technology Akure, Akure, Nigeria
| | - Festus Omotere Igbe
- Department of Biochemistry, School of Sciences (SOS), Federal University of Technology Akure, Akure, Nigeria
| | - Oyeyemi Janet Oretade
- Department of Physiology, College of Health Science (CHS), Osun State University, Osogbo, Nigeria
| | - Joy Oseme Eigbe
- Department of Biomedical Technology, School of Health and Health Technology (SHHT), Federal University of Technology Akure, Akure, Nigeria
| | - Funmilayo Janet Adeojo
- Department of Biotechnology, School of Sciences (SOS), Federal University of Technology Akure, Akure, Nigeria
| |
Collapse
|
42
|
Elghoneimy LK, Ismail MI, Boeckler FM, Azzazy HME, Ibrahim TM. Facilitating SARS CoV-2 RNA-Dependent RNA polymerase (RdRp) drug discovery by the aid of HCV NS5B palm subdomain binders: In silico approaches and benchmarking. Comput Biol Med 2021; 134:104468. [PMID: 34015671 PMCID: PMC8111889 DOI: 10.1016/j.compbiomed.2021.104468] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2021] [Revised: 04/25/2021] [Accepted: 04/30/2021] [Indexed: 01/18/2023]
Abstract
Corona Virus 2019 Disease (COVID-19) is a rapidly emerging pandemic caused by a newly discovered beta coronavirus, called Sever Acute Respiratory Syndrome Coronavirus 2 (SARS CoV-2). SARS CoV-2 is an enveloped, single stranded RNA virus that depends on RNA-dependent RNA polymerase (RdRp) to replicate. Therefore, SARS CoV-2 RdRp is considered as a promising target to cease virus replication. SARS CoV-2 polymerase shows high structural similarity to Hepatitis C Virus-1b genotype (HCV-1b) polymerase. Arising from the high similarity between SARS CoV-2 RdRp and HCV NS5B, we utilized the reported small-molecule binders to the palm subdomain of HCV NS5B (genotype 1b) to generate a high-quality DEKOIS 2.0 benchmark set and conducted a benchmarking analysis against HCV NS5B. The three highly cited and publicly available docking tools AutoDock Vina, FRED and PLANTS were benchmarked. Based on the benchmarking results and analysis via pROC-Chemotype plot, PLANTS showed the best screening performance and can recognize potent binders at the early enrichment. Accordingly, we used PLANTS in a prospective virtual screening to repurpose both the FDA-approved drugs (DrugBank) and the HCV-NS5B palm subdomain binders (BindingDB) for SARS CoV-2 RdRp palm subdomain. Further assessment by molecular dynamics simulations for 50 ns recommended diosmin (from DrugBank) and compound 3 (from BindingDB) to be the best potential binders to SARS CoV-2 RdRp palm subdomain. The best predicted compounds are recommended to be biologically investigated against COVID-19. In conclusion, this work provides in-silico analysis to propose possible SARS CoV-2 RdRp palm subdomain binders recommended as a remedy for COVID-19. Up-to-our knowledge, this study is the first to propose binders at the palm subdomain of SARS CoV2 RdRp. Furthermore, this study delivers an example of how to make use of a high quality custom-made DEKOIS 2.0 benchmark set as a procedure to elevate the virtual screening success rate against a vital target of the rapidly emerging pandemic.
Collapse
Affiliation(s)
- Laila K Elghoneimy
- Department of Chemistry, School of Sciences and Engineering, American University in Cairo, AUC Avenue, SSE # 1184, P.O. Box 74, New Cairo, 11835, Egypt
| | - Muhammad I Ismail
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, The British University in Egypt, Al-Sherouk City, Cairo-Suez Desert Road, 11837, Cairo, Egypt
| | - Frank M Boeckler
- Department of Pharmacy, Eberhard-Karls University, Auf der Morgenstelle 8, 72076, Tuebingen, Germany
| | - Hassan M E Azzazy
- Department of Chemistry, School of Sciences and Engineering, American University in Cairo, AUC Avenue, SSE # 1184, P.O. Box 74, New Cairo, 11835, Egypt
| | - Tamer M Ibrahim
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt.
| |
Collapse
|
43
|
Ismail MI, Ragab HM, Bekhit AA, Ibrahim TM. Targeting multiple conformations of SARS-CoV2 Papain-Like Protease for drug repositioning: An in-silico study. Comput Biol Med 2021; 131:104295. [PMID: 33662683 PMCID: PMC7902231 DOI: 10.1016/j.compbiomed.2021.104295] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2020] [Revised: 02/15/2021] [Accepted: 02/19/2021] [Indexed: 12/16/2022]
Abstract
Papain-Like Protease (PLpro) is a key protein for SARS-CoV-2 viral replication which is the cause of the emerging COVID-19 pandemic. Targeting PLpro can suppress viral replication and provide treatment options for COVID-19. Due to the dynamic nature of its binding site loop, PLpro multiple conformations were generated through a long-range 1 micro-second molecular dynamics (MD) simulation. Clustering the MD trajectory enabled us to extract representative structures for the conformational space generated. Adding to the MD representative structures, X-ray structures were involved in an ensemble docking approach to screen the FDA approved drugs for a drug repositioning endeavor. Guided by our recent benchmarking study of SARS-CoV-2 PLpro, FRED docking software was selected for such a virtual screening task. The results highlighted potential consensus binders to many of the MD clusters as well as the newly introduced X-ray structure of PLpro complexed with a small molecule. For instance, three drugs Benserazide, Dobutamine and Masoprocol showed a superior consensus enrichment against the PLpro conformations. Further MD simulations for these drugs complexed with PLpro suggested the superior stability and binding of dobutamine and masoprocol inside the binding site compared to Benserazide. Generally, this approach can facilitate identifying drugs for repositioning via targeting multiple conformations of a crucial target for the rapidly emerging COVID-19 pandemic.
Collapse
Affiliation(s)
- Muhammad I Ismail
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, The British University in Egypt, Al-Sherouk City, Cairo-Suez Desert Road, 11837, Cairo, Egypt
| | - Hanan M Ragab
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Alexandria University, Alexandria, 21521, Egypt
| | - Adnan A Bekhit
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Alexandria University, Alexandria, 21521, Egypt; Pharmacy Program, Allied Health Department, College of Health and Sport Sciences, University of Bahrain, P.O. Box 32038, Bahrain
| | - Tamer M Ibrahim
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt.
| |
Collapse
|
44
|
Zhou H, Cao H, Skolnick J. FRAGSITE: A Fragment-Based Approach for Virtual Ligand Screening. J Chem Inf Model 2021; 61:2074-2089. [PMID: 33724022 DOI: 10.1021/acs.jcim.0c01160] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
To reduce time and cost, virtual ligand screening (VLS) often precedes experimental ligand screening in modern drug discovery. Traditionally, high-resolution structure-based docking approaches rely on experimental structures, while ligand-based approaches need known binders to the target protein and only explore their nearby chemical space. In contrast, our structure-based FINDSITEcomb2.0 approach takes advantage of predicted, low-resolution structures and information from ligands that bind distantly related proteins whose binding sites are similar to the target protein. Using a boosted tree regression machine learning framework, we significantly improved FINDSITEcomb2.0 by integrating ligand fragment scores as encoded by molecular fingerprints with the global ligand similarity scores of FINDSITEcomb2.0. The new approach, FRAGSITE, exploits our observation that ligand fragments, e.g., rings, tend to interact with stereochemically conserved protein subpockets that also occur in evolutionarily unrelated proteins. FRAGSITE was benchmarked on the 102 protein DUD-E set, where any template protein whose sequence identify >30% to the target was excluded. Within the top 100 ranked molecules, FRAGSITE improves VLS precision and recall by 14.3 and 18.5%, respectively, relative to FINDSITEcomb2.0. Moreover, the mean top 1% enrichment factor increases from 25.2 to 30.2. On average, both outperform state-of-the-art deep learning-based methods such as AtomNet. On the more challenging unbiased set LIT-PCBA, FRAGSITE also shows better performance than ligand similarity-based and docking approaches such as two-dimensional ECFP4 and Surflex-Dock v.3066. On a subset of 23 targets from DEKOIS 2.0, FRAGSITE shows much better performance than the boosted tree regression-based, vScreenML scoring function. Experimental testing of FRAGSITE's predictions shows that it has more hits and covers a more diverse region of chemical space than FINDSITEcomb2.0. For the two proteins that were experimentally tested, DHFR, a well-studied protein that catalyzes the conversion of dihydrofolate to tetrahydrofolate, and the kinase ACVR1, FRAGSITE identified new small-molecule nanomolar binders. Interestingly, one new binder of DHFR is a kinase inhibitor predicted to bind in a new subpocket. For ACVR1, FRAGSITE identified new molecules that have diverse scaffolds and estimated nanomolar to micromolar affinities. Thus, FRAGSITE shows significant improvement over prior state-of-the-art ligand virtual screening approaches. A web server is freely available for academic users at http:/sites.gatech.edu/cssb/FRAGSITE.
Collapse
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
| | - Hongnan Cao
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
| | - Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
| |
Collapse
|
45
|
Stein RM, Yang Y, Balius TE, O'Meara MJ, Lyu J, Young J, Tang K, Shoichet BK, Irwin JJ. Property-Unmatched Decoys in Docking Benchmarks. J Chem Inf Model 2021; 61:699-714. [PMID: 33494610 PMCID: PMC7913603 DOI: 10.1021/acs.jcim.0c00598] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Enrichment of ligands versus property-matched decoys is widely used to test and optimize docking library screens. However, the unconstrained optimization of enrichment alone can mislead, leading to false confidence in prospective performance. This can arise by over-optimizing for enrichment against property-matched decoys, without considering the full spectrum of molecules to be found in a true large library screen. Adding decoys representing charge extrema helps mitigate over-optimizing for electrostatic interactions. Adding decoys that represent the overall characteristics of the library to be docked allows one to sample molecules not represented by ligands and property-matched decoys but that one will encounter in a prospective screen. An optimized version of the DUD-E set (DUDE-Z), as well as Extrema and sets representing broad features of the library (Goldilocks), is developed here. We also explore the variability that one can encounter in enrichment calculations and how that can temper one's confidence in small enrichment differences. The new tools and new decoy sets are freely available at http://tldr.docking.org and http://dudez.docking.org.
Collapse
Affiliation(s)
- Reed M Stein
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
| | - Ying Yang
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
| | - Trent E Balius
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., P.O. Box B, Frederick, Maryland 21702, United States
| | - Matt J O'Meara
- Department of Computational Medicine and Bioinformatics, University of Michigan, Palmer Commons, 100 Washtenaw Ave. #2017, Ann Arbor, Michigan 48109, United States
| | - Jiankun Lyu
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
| | - Jennifer Young
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
| | - Khanh Tang
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
| | - Brian K Shoichet
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
| | - John J Irwin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
| |
Collapse
|
46
|
Imrie F, Bradley AR, Deane CM. Generating Property-Matched Decoy Molecules Using Deep Learning. Bioinformatics 2021; 37:2134-2141. [PMID: 33532838 PMCID: PMC8352508 DOI: 10.1093/bioinformatics/btab080] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2020] [Revised: 01/05/2021] [Accepted: 01/28/2021] [Indexed: 01/30/2023] Open
Abstract
MOTIVATION An essential step in the development of virtual screening methods is the use of established sets of actives and decoys for benchmarking and training. However, the decoy molecules in commonly used sets are biased meaning that methods often exploit these biases to separate actives and decoys, and do not necessarily learn to perform molecular recognition. This fundamental issue prevents generalisation and hinders virtual screening method development. RESULTS We have developed a deep learning method (DeepCoy) that generates decoys to a user's preferred specification in order to remove such biases or construct sets with a defined bias.We validated DeepCoy using two established benchmarks, DUD-E and DEKOIS 2.0. For all 102 DUD-E targets and 80 of the 81 DEKOIS 2.0 targets, our generated decoy molecules more closely matched the active molecules' physicochemical properties while introducing no discernible additional risk of false negatives. The DeepCoy decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing from 0.166 to 0.032 for DUD-E and from 0.109 to 0.038 for DEKOIS 2.0. Further, the generated decoys are harder to distinguish than the original decoy molecules via docking with Autodock Vina, with virtual screening performance falling from an AUC ROC of 0.70 to 0.63. AVAILABILITY AND IMPLEMENTATION The code is available at https://github.com/oxpig/DeepCoy. Generated molecules can be downloaded from http://opig.stats.ox.ac.uk/resources. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Fergus Imrie
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
| | | | - Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
| |
Collapse
|
47
|
Chemoinformatics and QSAR. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
|
48
|
Artificial intelligence in the early stages of drug discovery. Arch Biochem Biophys 2020; 698:108730. [PMID: 33347838 DOI: 10.1016/j.abb.2020.108730] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2020] [Revised: 12/11/2020] [Accepted: 12/14/2020] [Indexed: 02/07/2023]
Abstract
Although the use of computational methods within the pharmaceutical industry is well established, there is an urgent need for new approaches that can improve and optimize the pipeline of drug discovery and development. In spite of the fact that there is no unique solution for this need for innovation, there has recently been a strong interest in the use of Artificial Intelligence for this purpose. As a matter of fact, not only there have been major contributions from the scientific community in this respect, but there has also been a growing partnership between the pharmaceutical industry and Artificial Intelligence companies. Beyond these contributions and efforts there is an underlying question, which we intend to discuss in this review: can the intrinsic difficulties within the drug discovery process be overcome with the implementation of Artificial Intelligence? While this is an open question, in this work we will focus on the advantages that these algorithms provide over the traditional methods in the context of early drug discovery.
Collapse
|
49
|
Ibrahim TM, Ismail MI, Bauer MR, Bekhit AA, Boeckler FM. Supporting SARS-CoV-2 Papain-Like Protease Drug Discovery: In silico Methods and Benchmarking. Front Chem 2020; 8:592289. [PMID: 33251185 PMCID: PMC7674952 DOI: 10.3389/fchem.2020.592289] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2020] [Accepted: 09/29/2020] [Indexed: 12/17/2022] Open
Abstract
The coronavirus disease 19 (COVID-19) is a rapidly growing pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Its papain-like protease (SARS-CoV-2 PLpro) is a crucial target to halt virus replication. SARS-CoV PLpro and SARS-CoV-2 PLpro share an 82.9% sequence identity and a 100% sequence identity for the binding site reported to accommodate small molecules in SARS-CoV. The flexible key binding site residues Tyr269 and Gln270 for small-molecule recognition in SARS-CoV PLpro exist also in SARS-CoV-2 PLpro. This inspired us to use the reported small-molecule binders to SARS-CoV PLpro to generate a high-quality DEKOIS 2.0 benchmark set. Accordingly, we used them in a cross-benchmarking study against SARS-CoV-2 PLpro. As there is no SARS-CoV-2 PLpro structure complexed with a small-molecule ligand publicly available at the time of manuscript submission, we built a homology model based on the ligand-bound SARS-CoV structure for benchmarking and docking purposes. Three publicly available docking tools FRED, AutoDock Vina, and PLANTS were benchmarked. All showed better-than-random performances, with FRED performing best against the built model. Detailed performance analysis via pROC-Chemotype plots showed a strong enrichment of the most potent bioactives in the early docking ranks. Cross-benchmarking against the X-ray structure complexed with a peptide-like inhibitor confirmed that FRED is the best-performing tool. Furthermore, we performed cross-benchmarking against the newly introduced X-ray structure complexed with a small-molecule ligand. Interestingly, its benchmarking profile and chemotype enrichment were comparable to the built model. Accordingly, we used FRED in a prospective virtual screen of the DrugBank database. In conclusion, this study provides an example of how to harness a custom-made DEKOIS 2.0 benchmark set as an approach to enhance the virtual screening success rate against a vital target of the rapidly emerging pandemic.
Collapse
Affiliation(s)
- Tamer M. Ibrahim
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Kafrelsheikh University, Kafrelsheikh, Egypt
| | - Muhammad I. Ismail
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, The British University in Egypt, Cairo, Egypt
| | - Matthias R. Bauer
- Structure, Biophysics and Fragment-Based Lead Generation, Discovery Sciences, R&D, AstraZeneca, Cambridge, United Kingdom
- Department of Pharmacy, Eberhard-Karls University, Tuebingen, Germany
| | - Adnan A. Bekhit
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Alexandria University, Alexandria, Egypt
- Pharmacy Program, Allied Health Department, College of Health and Sport Sciences, University of Bahrain, Zallaq, Bahrain
| | - Frank M. Boeckler
- Department of Pharmacy, Eberhard-Karls University, Tuebingen, Germany
| |
Collapse
|
50
|
Selecting machine-learning scoring functions for structure-based virtual screening. DRUG DISCOVERY TODAY. TECHNOLOGIES 2020; 32-33:81-87. [PMID: 33386098 DOI: 10.1016/j.ddtec.2020.09.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/01/2020] [Revised: 09/02/2020] [Accepted: 09/07/2020] [Indexed: 12/27/2022]
Abstract
Interest in docking technologies has grown parallel to the ever increasing number and diversity of 3D models for macromolecular therapeutic targets. Structure-Based Virtual Screening (SBVS) aims at leveraging these experimental structures to discover the necessary starting points for the drug discovery process. It is now established that Machine Learning (ML) can strongly enhance the predictive accuracy of scoring functions for SBVS by exploiting large datasets from targets, molecules and their associations. However, with greater choice, the question of which ML-based scoring function is the most suitable for prospective use on a given target has gained importance. Here we analyse two approaches to select an existing scoring function for the target along with a third approach consisting in generating a scoring function tailored to the target. These analyses required discussing the limitations of popular SBVS benchmarks, the alternatives to benchmark scoring functions for SBVS and how to generate them or use them using freely-available software.
Collapse
|