1
Dai J, Zhou Z, Zhao Y, Kong F, Zhai Z, Zhu Z, Cai J, Huang S, Xu Y, Sun T. Combined usage of ligand- and structure-based virtual screening in the artificial intelligence era. Eur J Med Chem 2025; 283:117162. [PMID: 39673863 DOI: 10.1016/j.ejmech.2024.117162] [Received: 09/18/2024] [Revised: 11/27/2024] [Accepted: 12/09/2024] [Indexed: 12/16/2024]
Abstract
Drug design has always been pursuing techniques with time- and cost-benefits. Virtual screening, generally classified as ligand-based (LBVS) and structure-based (SBVS) approaches, could identify active compounds in the large chemical library to reduce time and cost. Owing to the intrinsic flaws and complementary nature of both approaches, continued efforts have been made to combine them to mitigate limitations. Meanwhile, the emergence of machine learning (ML) endows them with opportunities to leverage vast amounts of data to improve their defects. However, few discussions on how to merge ML-improved LBVS and SBVS have been conducted. Therefore, this review provides insights into combined usage of ML-improved LBVS and SBVS to enlighten medicinal chemists to utilize these joint strategies to lift the screening efficiency as well as AI professionals to design novel techniques.
Affiliation(s)
- Jingyi Dai
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China
- Ziyi Zhou
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China
- Yanru Zhao
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China
- Fanjing Kong
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China
- Zhenwei Zhai
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China
- Zhishan Zhu
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China
- Jie Cai
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China
- Sha Huang
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China
- Ying Xu
  - Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan, China
- Tao Sun
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China
  - State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China
2
Meng J, Zhang L, He Z, Hu M, Liu J, Bao W, Tian Q, Feng H, Liu H. Development of a machine learning-based target-specific scoring function for structure-based binding affinity prediction for human dihydroorotate dehydrogenase inhibitors. J Comput Chem 2025; 46:e27510. [PMID: 39325045 DOI: 10.1002/jcc.27510] [Received: 07/08/2024] [Revised: 08/21/2024] [Accepted: 09/11/2024] [Indexed: 09/27/2024]
Abstract
Human dihydroorotate dehydrogenase (hDHODH) is a flavin mononucleotide-dependent enzyme that can limit de novo pyrimidine synthesis, making it a therapeutic target for diseases such as autoimmune disorders and cancer. In this study, using the docking structures of complexes generated by AutoDock Vina, we integrate interaction features and ligand features, and employ support vector regression to develop a target-specific scoring function for hDHODH (TSSF-hDHODH). The Pearson correlation coefficient values of TSSF-hDHODH in the cross-validation and external validation are 0.86 and 0.74, respectively, both of which are far superior to those of classic scoring function AutoDock Vina and random forest (RF) based generic scoring function RF-Score. TSSF-hDHODH is further used for the virtual screening of potential inhibitors in the FDA-Approved & Pharmacopeia Drug Library. In conjunction with the results from molecular dynamics simulations, crizotinib is identified as a candidate for subsequent structural optimization. This study can be useful for the discovery of hDHODH inhibitors and the development of scoring functions for additional targets.
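The Pearson correlation coefficients quoted in this abstract (0.86 and 0.74) compare predicted with experimental binding affinities. As a minimal sketch of that metric only, and not of the authors' TSSF-hDHODH pipeline, it can be computed as below. The function name `pearson_r` and the affinity values are hypothetical and chosen purely for illustration.

```python
import math


def pearson_r(predicted, experimental):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    # Covariance and variances are left unnormalised; the 1/n factors cancel.
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_e)


# Hypothetical affinity values (e.g. pIC50), made up for illustration only.
experimental = [5.1, 6.3, 7.0, 7.8, 8.4]
predicted = [5.4, 6.0, 7.2, 7.5, 8.9]
r = pearson_r(predicted, experimental)
print(f"Pearson r = {r:.2f}")
```

A value near 1 means the scoring function ranks and scales compounds much like the experiment does. The metric says nothing about absolute error, so it is usually reported alongside an error measure.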
Affiliation(s)
- Jinhui Meng
  - School of Life Science, Liaoning University, Shenyang, Liaoning, China
- Li Zhang
  - School of Life Science, Liaoning University, Shenyang, Liaoning, China
  - Liaoning Provincial Key Laboratory of Computational Simulation and Information Processing of Biomacromolecules, Liaoning University, Shenyang, Liaoning, China
  - Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Liaoning University, Shenyang, Liaoning, China
- Zhe He
  - School of Life Science, Liaoning University, Shenyang, Liaoning, China
- Mengfeng Hu
  - School of Life Science, Liaoning University, Shenyang, Liaoning, China
- Jinhan Liu
  - School of Life Science, Liaoning University, Shenyang, Liaoning, China
- Wenzhuo Bao
  - School of Life Science, Liaoning University, Shenyang, Liaoning, China
- Qifeng Tian
  - School of Life Science, Liaoning University, Shenyang, Liaoning, China
- Huawei Feng
  - School of Pharmacy, Liaoning University, Shenyang, Liaoning, China
- Hongsheng Liu
  - Liaoning Provincial Key Laboratory of Computational Simulation and Information Processing of Biomacromolecules, Liaoning University, Shenyang, Liaoning, China
  - Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Liaoning University, Shenyang, Liaoning, China
  - School of Pharmacy, Liaoning University, Shenyang, Liaoning, China
3
Schifferstein J, Bernatavicius A, Janssen APA. Docking-Informed Machine Learning for Kinome-wide Affinity Prediction. J Chem Inf Model 2024; 64:9196-9204. [PMID: 39657274 DOI: 10.1021/acs.jcim.4c01260] [Indexed: 12/17/2024]
Abstract
Kinase inhibitors are an important class of anticancer drugs, with 80 inhibitors clinically approved and >100 in active clinical testing. Most bind competitively in the ATP-binding site, leading to challenges with selectivity for a specific kinase, resulting in risks for toxicity and general off-target effects. Assessing the binding of an inhibitor for the entire kinome is experimentally possible but expensive. A reliable and interpretable computational prediction of kinase selectivity would greatly benefit the inhibitor discovery and optimization process. Here, we use machine learning on docked poses to address this need. To this end, we aggregated all known inhibitor-kinase affinities and generated the complete accompanying 3D interactome by docking all inhibitors to the respective high-quality X-ray structures. We then used this resource to train a neural network as a kinase-specific scoring function, which achieved an overall performance (R²) of 0.63-0.74 on unseen inhibitors across the kinome. The entire pipeline from molecule to 3D-based affinity prediction has been fully automated and wrapped in a freely available package. This has a graphical user interface that is tightly integrated with PyMOL to allow immediate adoption in the medicinal chemistry practice.
Affiliation(s)
- Jordy Schifferstein
  - Department of Molecular Physiology, Leiden Institute of Chemistry, Leiden University, Leiden 2333CC, The Netherlands
  - Oncode Institute, Utrecht 3521AL, The Netherlands
- Andrius Bernatavicius
  - Leiden Institute of Advanced Computer Science, Leiden University, Leiden 2333CC, The Netherlands
- Antonius P A Janssen
  - Department of Molecular Physiology, Leiden Institute of Chemistry, Leiden University, Leiden 2333CC, The Netherlands
  - Oncode Institute, Utrecht 3521AL, The Netherlands
4
Jiang YY, Yan ST, Zhang SZ, Wang M, Diao WM, Li J, Fang XM, Yin H. Discovery of pyrazolo[1,5-a]pyrimidine derivatives targeting TLR4-TLR4∗ homodimerization via AI-powered next-generation screening. Eur J Med Chem 2024; 280:116945. [PMID: 39388907 DOI: 10.1016/j.ejmech.2024.116945] [Received: 08/20/2024] [Revised: 09/28/2024] [Accepted: 10/04/2024] [Indexed: 10/12/2024]
Abstract
TLR4 signaling is instrumental in orchestrating multiple aspects of innate immunity. Developing small molecule inhibitors targeting the TLR4 pathway holds potential therapeutic promise for TLR4-related disorders. Herein, an artificial intelligence (AI)-powered next-generation screening approach, employing HelixVS and HelixDock, was utilized to focus on the TLR4-TLR4∗ (a second copy of TLR4) homodimerization surface, leading to the identification of a potent pyrazolo[1,5-a]pyrimidine derivative, designated as compound 1. An extensive structure-activity relationship (SAR) exploration culminated in the discovery of the lead compound TH023, which effectively blocked the LPS-stimulated NF-κB activation and nitric oxide overproduction in HEK-Blue hTLR4 and RAW264.7 cells, with IC50 values of 0.354 and 1.61 μM, respectively. Molecular dynamic (MD) simulations indicated that TH023 stabilized TLR4-MD-2 and disrupted its association with TLR4∗. Moreover, TH023 alleviated the lung injury and decreased pro-inflammatory cytokine levels in LPS-induced septic mice. These findings not only illuminated the strategic advantage of HelixDock in advancing the frontiers of AI-driven drug discovery, but also provided valuable structural insights for the rational design of TLR4-TLR4∗ protein-protein interaction (PPI) inhibitors based on the pyrazolo[1,5-a]pyrimidine scaffold. Overall, this study validated a new strategy for TLR4 signaling regulation by targeting its dimerization, thereby underscoring the therapeutic promise of TH023 in treating TLR4-mediated inflammatory diseases.
Affiliation(s)
- Yao-Yao Jiang
  - State Key Laboratory of Membrane Biology, School of Pharmaceutical Sciences, Tsinghua-Peking Center for Life Sciences, Key Laboratory of Bioorganic Phosphorous Chemistry and Chemical Biology (Ministry of Education), Tsinghua University, Beijing, 100084, China
- Shuai-Ting Yan
  - State Key Laboratory of Membrane Biology, School of Pharmaceutical Sciences, Tsinghua-Peking Center for Life Sciences, Key Laboratory of Bioorganic Phosphorous Chemistry and Chemical Biology (Ministry of Education), Tsinghua University, Beijing, 100084, China
- Meng Wang
  - Toll Biotech Co., Ltd. (Beijing), Beijing, 102209, China
- Wei-Ming Diao
  - Toll Biotech Co., Ltd. (Beijing), Beijing, 102209, China
- Jun Li
  - PaddleHelix Team, Baidu Inc., Shenzhen, 518000, China
- Xiao-Min Fang
  - PaddleHelix Team, Baidu Inc., Shenzhen, 518000, China
- Hang Yin
  - State Key Laboratory of Membrane Biology, School of Pharmaceutical Sciences, Tsinghua-Peking Center for Life Sciences, Key Laboratory of Bioorganic Phosphorous Chemistry and Chemical Biology (Ministry of Education), Tsinghua University, Beijing, 100084, China
5
Yang Z, Zhong W, Lv Q, Dong T, Chen G, Chen CYC. Interaction-Based Inductive Bias in Graph Neural Networks: Enhancing Protein-Ligand Binding Affinity Predictions From 3D Structures. IEEE Trans Pattern Anal Mach Intell 2024; 46:8191-8208. [PMID: 38739515 DOI: 10.1109/tpami.2024.3400515] [Indexed: 05/16/2024]
Abstract
Inductive bias in machine learning (ML) is the set of assumptions describing how a model makes predictions. Different ML-based methods for protein-ligand binding affinity (PLA) prediction have different inductive biases, leading to different levels of generalization capability and interpretability. Intuitively, the inductive bias of an ML-based model for PLA prediction should fit in with biological mechanisms relevant for binding to achieve good predictions with meaningful reasons. To this end, we propose an interaction-based inductive bias to restrict neural networks to functions relevant for binding with two assumptions: 1) A protein-ligand complex can be naturally expressed as a heterogeneous graph with covalent and non-covalent interactions; 2) The predicted PLA is the sum of pairwise atom-atom affinities determined by non-covalent interactions. The interaction-based inductive bias is embodied by an explainable heterogeneous interaction graph neural network (EHIGN) for explicitly modeling pairwise atom-atom interactions to predict PLA from 3D structures. Extensive experiments demonstrate that EHIGN achieves better generalization capability than other state-of-the-art ML-based baselines in PLA prediction and structure-based virtual screening. More importantly, comprehensive analyses of distance-affinity, pose-affinity, and substructure-affinity relations suggest that the interaction-based inductive bias can guide the model to learn atomic interactions that are consistent with physical reality. As a case study to demonstrate practical usefulness, our method is tested for predicting the efficacy of Nirmatrelvir against SARS-CoV-2 variants. EHIGN successfully recognizes the changes in the efficacy of Nirmatrelvir for different SARS-CoV-2 variants with meaningful reasons.
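The second assumption in this abstract, that the predicted affinity is a sum of pairwise atom-atom terms, can be written compactly. The notation below is an illustrative sketch with symbols chosen here. It is not taken from the paper, whose exact formulation may differ.

```latex
% Assumed notation: \hat{y} is the predicted protein-ligand binding affinity,
% \mathcal{E}_{\mathrm{nc}} is the set of non-covalent protein-ligand atom pairs (i, j),
% and a_{ij} is the affinity contribution the network assigns to the pair (i, j).
\hat{y} \;=\; \sum_{(i,j)\,\in\,\mathcal{E}_{\mathrm{nc}}} a_{ij}
```

Because the prediction decomposes over atom pairs, each term can be inspected on its own, which is what makes per-interaction analyses such as the distance-affinity relations mentioned in the abstract possible.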
6
Vittorio S, Lunghini F, Morerio P, Gadioli D, Orlandini S, Silva P, Martinovic J, Pedretti A, Bonanni D, Del Bue A, Palermo G, Vistoli G, Beccari AR. Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities. Comput Struct Biotechnol J 2024; 23:2141-2151. [PMID: 38827235 PMCID: PMC11141151 DOI: 10.1016/j.csbj.2024.05.024] [Received: 01/23/2024] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/04/2024]
Abstract
Molecular docking is a widely used technique in drug discovery to predict the binding mode of a given ligand to its target. However, the identification of the near-native binding pose in docking experiments still represents a challenging task as the scoring functions currently employed by docking programs are parametrized to predict the binding affinity, and, therefore, they often fail to correctly identify the ligand native binding conformation. Selecting the correct binding mode is crucial to obtaining meaningful results and to conveniently optimizing new hit compounds. Deep learning (DL) algorithms have been an area of a growing interest in this sense for their capability to extract the relevant information directly from the protein-ligand structure. Our review aims to present the recent advances regarding the development of DL-based pose selection approaches, discussing limitations and possible future directions. Moreover, a comparison between the performances of some classical scoring functions and DL-based methods concerning their ability to select the correct binding mode is reported. In this regard, two novel DL-based pose selectors developed by us are presented.
Affiliation(s)
- Serena Vittorio
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Filippo Lunghini
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
- Pietro Morerio
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Davide Gadioli
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Sergio Orlandini
  - SCAI, SuperComputing Applications and Innovation Department, CINECA, Via dei Tizii 6, Rome 00185, Italy
- Paulo Silva
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Jan Martinovic
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Alessandro Pedretti
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Domenico Bonanni
  - Department of Physical and Chemical Sciences, University of L'Aquila, via Vetoio, L'Aquila 67010, Italy
- Alessio Del Bue
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Gianluca Palermo
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Giulio Vistoli
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Andrea R. Beccari
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
7
Cao D, Chen M, Zhang R, Wang Z, Huang M, Yu J, Jiang X, Fan Z, Zhang W, Zhou H, Li X, Fu Z, Zhang S, Zheng M. SurfDock is a surface-informed diffusion generative model for reliable and accurate protein-ligand complex prediction. Nat Methods 2024:10.1038/s41592-024-02516-y. [PMID: 39604569 DOI: 10.1038/s41592-024-02516-y] [Received: 01/19/2024] [Accepted: 10/16/2024] [Indexed: 11/29/2024]
Abstract
Accurately predicting protein-ligand interactions is crucial for understanding cellular processes. We introduce SurfDock, a deep-learning method that addresses this challenge by integrating protein sequence, three-dimensional structural graphs and surface-level features into an equivariant architecture. SurfDock employs a generative diffusion model on a non-Euclidean manifold, optimizing molecular translations, rotations and torsions to generate reliable binding poses. Our extensive evaluations across various benchmarks demonstrate SurfDock's superiority over existing methods in docking success rates and adherence to physical constraints. It also exhibits remarkable generalizability to unseen proteins and predicted apo structures, while achieving state-of-the-art performance in virtual screening tasks. In a real-world application, SurfDock identified seven novel hit molecules in a virtual screening project targeting aldehyde dehydrogenase 1B1, a key enzyme in cellular metabolism. This showcases SurfDock's ability to elucidate molecular mechanisms underlying cellular processes. These results highlight SurfDock's potential as a transformative tool in structural biology, offering enhanced accuracy, physical plausibility and practical applicability in understanding protein-ligand interactions.
Affiliation(s)
- Duanhua Cao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Mingan Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Physical Science and Technology, ShanghaiTech University, Shanghai, China
  - Lingang Laboratory, Shanghai, China
- Runze Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhaokun Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Manlin Huang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - Nanchang University, Nanchang, China
- Jie Yu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - Lingang Laboratory, Shanghai, China
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, China
- Xinyu Jiang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhehuan Fan
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Wei Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Hao Zhou
  - Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Zunyun Fu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Sulin Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
8
Hong Y, Ha J, Sim J, Lim CJ, Oh KS, Chandrasekaran R, Kim B, Choi J, Ko J, Shin WH, Lee J. Accurate prediction of protein-ligand interactions by combining physical energy functions and graph-neural networks. J Cheminform 2024; 16:121. [PMID: 39497201 PMCID: PMC11536843 DOI: 10.1186/s13321-024-00912-2] [Received: 01/22/2024] [Accepted: 10/07/2024] [Indexed: 11/07/2024]
Abstract
We introduce an advanced model for predicting protein-ligand interactions. Our approach combines the strengths of graph neural networks with physics-based scoring methods. Existing structure-based machine-learning models for protein-ligand binding prediction often fall short in practical virtual screening scenarios, hindered by the intricacies of binding poses, the chemical diversity of drug-like molecules, and the scarcity of crystallographic data for protein-ligand complexes. To overcome the limitations of existing machine learning-based prediction models, we propose a novel approach that fuses three independent neural network models. One classification model is designed to perform binary prediction of a given protein-ligand complex pose. The other two regression models are trained to predict the binding affinity and root-mean-square deviation of a ligand conformation from an input complex structure. We trained the model to account for both deviations in experimental and predicted binding affinities and pose prediction uncertainties. By effectively integrating the outputs of the triplet neural networks with a physics-based scoring function, our model showed a significantly improved performance in hit identification. The benchmark results with three independent decoy sets demonstrate that our model outperformed existing models in forward screening. Our model achieved top 1% enrichment factors of 32.7 and 23.1 with the CASF2016 and DUD-E benchmark sets, respectively. The benchmark results using the LIT-PCBA set further confirmed its higher average enrichment factors, emphasizing the model's efficiency and generalizability. 
The model's efficiency was further validated by identifying 23 active compounds from 63 candidates in experimental screening for autotaxin inhibitors, demonstrating its practical applicability in hit discovery. Scientific contribution: Our work introduces a novel training strategy for a protein-ligand binding affinity prediction model by integrating the outputs of three independent sub-models and utilizing expertly crafted decoy sets. The model showcases exceptional performance across multiple benchmarks. The high enrichment factors in the LIT-PCBA benchmark demonstrate its potential to accelerate hit discovery.
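The "top 1% enrichment factor" reported in this abstract is a standard virtual-screening metric: the hit rate among the best-scored fraction of a library divided by the hit rate of the whole library. The sketch below illustrates that definition only. The function name `enrichment_factor`, the convention that a higher score is better, and the toy library are assumptions made here, not details from the paper.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor of the top `fraction` of a ranked library.

    Assumes a higher score means a more promising compound.
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("the library contains no actives")
    # Size of the top slice; at least one compound is always selected.
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(1 for _, active in ranked[:n_top] if active)
    hit_rate_top = top_actives / n_top
    hit_rate_all = total_actives / len(scores)
    return hit_rate_top / hit_rate_all


# Toy library of 200 compounds (made up): index 0 has the best score.
scores = list(range(200, 0, -1))
active_positions = {0, 2, 4, 6, 8, 100, 101, 102, 103, 104}
is_active = [i in active_positions for i in range(200)]

# Top 5% = 10 compounds, 5 of which are active; the library hit rate is 10/200.
ef_5pct = enrichment_factor(scores, is_active, fraction=0.05)
print(f"EF(5%) = {ef_5pct:.1f}")
```

An enrichment factor of 1 corresponds to random selection. The maximum achievable value depends on the fraction and on the proportion of actives, so values are only comparable within the same benchmark.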
Affiliation(s)
- Yiyu Hong
  - Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea
- Junsu Ha
  - Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea
- Jaemin Sim
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea
- Chae Jo Lim
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon, 34114, Republic of Korea
- Kwang-Seok Oh
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon, 34114, Republic of Korea
- Bomin Kim
  - College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
- Jieun Choi
  - College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
- Junsu Ko
  - Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea
- Woong-Hee Shin
  - Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea
  - Department of Medicine, Korea University College of Medicine, Seoul, 02841, Republic of Korea
- Juyong Lee
  - Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea
  - Research Institute of Pharmaceutical Science, College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
  - College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
9
Ghislat G, Hernandez-Hernandez S, Piyawajanusorn C, Ballester PJ. Data-centric challenges with the application and adoption of artificial intelligence for drug discovery. Expert Opin Drug Discov 2024; 19:1297-1307. [PMID: 39316009 DOI: 10.1080/17460441.2024.2403639] [Received: 07/09/2024] [Accepted: 09/09/2024] [Indexed: 09/25/2024]
Abstract
Introduction: Artificial intelligence (AI) is exhibiting tremendous potential to reduce the massive costs and long timescales of drug discovery. There are, however, important challenges currently limiting the impact and scope of AI models. Areas covered: In this perspective, the authors discuss a range of data issues (bias, inconsistency, skewness, irrelevance, small size, high dimensionality), how they challenge AI models, and which issue-specific mitigations have been effective. Next, they point out the challenges faced by uncertainty quantification techniques aimed at enhancing and trusting the predictions from these AI models. They also discuss how conceptual errors, unrealistic benchmarks and performance misestimation can confound the evaluation of models and thus their development. Lastly, the authors explain how human bias, whether from AI experts or drug discovery experts, constitutes another challenge that can be alleviated by gaining more prospective experience. Expert opinion: AI models are often developed to excel on retrospective benchmarks unlikely to anticipate their prospective performance. As a result, only a few of these models are ever reported to have prospective value (e.g. by discovering potent and innovative drug leads for a therapeutic target). The authors have discussed what can go wrong in practice with AI for drug discovery. The authors hope that this will help inform the decisions of editors, funders, investors, and researchers working in this area.
Affiliation(s)
- Ghita Ghislat
  - Department of Life Sciences, Imperial College London, London, UK
10
Nieto-Fabregat F, Lenza MP, Marseglia A, Di Carluccio C, Molinaro A, Silipo A, Marchetti R. Computational toolbox for the analysis of protein-glycan interactions. Beilstein J Org Chem 2024; 20:2084-2107. [PMID: 39189002 PMCID: PMC11346309 DOI: 10.3762/bjoc.20.180] [Received: 12/20/2023] [Accepted: 08/01/2024] [Indexed: 08/28/2024]
Abstract
Protein-glycan interactions play pivotal roles in numerous biological processes, ranging from cellular recognition to immune response modulation. Understanding the intricate details of these interactions is crucial for deciphering the molecular mechanisms underlying various physiological and pathological conditions. Computational techniques have emerged as powerful tools that can help in drawing, building and visualising complex biomolecules and provide insights into their dynamic behaviour at atomic and molecular levels. This review provides an overview of the main computational tools useful for studying biomolecular systems, particularly glycans, both in free state and in complex with proteins, also with reference to the principles, methodologies, and applications of all-atom molecular dynamics simulations. Herein, we focused on the programs that are generally employed for preparing protein and glycan input files to execute molecular dynamics simulations and analyse the corresponding results. The presented computational toolbox represents a valuable resource for researchers studying protein-glycan interactions and incorporates advanced computational methods for building, visualising and predicting protein/glycan structures, modelling protein-ligand complexes, and analysing MD outcomes. Moreover, selected case studies have been reported to highlight the importance of computational tools in studying protein-glycan systems, revealing the capability of these tools to provide valuable insights into the binding kinetics, energetics, and structural determinants that govern specific molecular interactions.
Collapse
Affiliation(s)
- Ferran Nieto-Fabregat
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 4, 80126, Italy
| | - Maria Pia Lenza
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 4, 80126, Italy
| | - Angela Marseglia
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 4, 80126, Italy
| | - Cristina Di Carluccio
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 4, 80126, Italy
| | - Antonio Molinaro
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 4, 80126, Italy
| | - Alba Silipo
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 4, 80126, Italy
| | - Roberta Marchetti
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 4, 80126, Italy
11
|
Zhang C, Zhai Y, Gong Z, Duan H, She YB, Yang YF, Su A. Transfer learning across different chemical domains: virtual screening of organic materials with deep learning models pretrained on small molecule and chemical reaction data. J Cheminform 2024; 16:89. [PMID: 39080777 PMCID: PMC11290278 DOI: 10.1186/s13321-024-00886-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 07/21/2024] [Indexed: 08/02/2024] Open
Abstract
Machine learning is becoming a preferred method for the virtual screening of organic materials due to its cost-effectiveness over traditional computationally demanding techniques. However, the scarcity of labeled data for organic materials poses a significant challenge for training advanced machine learning models. This study showcases the potential of utilizing databases of drug-like small molecules and chemical reactions to pretrain the BERT model, enhancing its performance in the virtual screening of organic materials. By fine-tuning the BERT models with data from five virtual screening tasks, the version pretrained with the USPTO-SMILES dataset achieved R² scores exceeding 0.94 for three tasks and over 0.81 for two others. This performance surpasses that of models pretrained on the small molecule or organic materials databases and outperforms three traditional machine learning models trained directly on virtual screening data. The success of the USPTO-SMILES pretrained BERT model can be attributed to the diverse array of organic building blocks in the USPTO database, offering a broader exploration of the chemical space. The study further suggests that accessing a reaction database with a wider range of reactions than the USPTO could further enhance model performance. Overall, this research validates the feasibility of applying transfer learning across different chemical domains for the efficient virtual screening of organic materials. Scientific contribution: This study verifies the feasibility of applying transfer learning to large language models across different chemical fields to support the virtual screening of organic materials. By comparing transfer learning from different chemical fields on a variety of organic material molecules, high-precision virtual screening of organic materials is realized.
Affiliation(s)
- Chengwei Zhang
- State Key Laboratory Breeding Base of Green Chemistry-Synthesis Technology, Key Laboratory of Green Chemistry-Synthesis Technology of Zhejiang Province, College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, 310014, Zhejiang, China
| | - Yushuang Zhai
- State Key Laboratory Breeding Base of Green Chemistry-Synthesis Technology, Key Laboratory of Green Chemistry-Synthesis Technology of Zhejiang Province, College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, 310014, Zhejiang, China
| | - Ziyang Gong
- Key Laboratory of Pharmaceutical Engineering of Zhejiang Province, Key Laboratory for Green Pharmaceutical Technologies and Related Equipment of Ministry of Education, Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
| | - Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
| | - Yuan-Bin She
- State Key Laboratory Breeding Base of Green Chemistry-Synthesis Technology, Key Laboratory of Green Chemistry-Synthesis Technology of Zhejiang Province, College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, 310014, Zhejiang, China
| | - Yun-Fang Yang
- State Key Laboratory Breeding Base of Green Chemistry-Synthesis Technology, Key Laboratory of Green Chemistry-Synthesis Technology of Zhejiang Province, College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, 310014, Zhejiang, China
| | - An Su
- State Key Laboratory Breeding Base of Green Chemistry-Synthesis Technology, Key Laboratory of Green Chemistry-Synthesis Technology of Zhejiang Province, College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, 310014, Zhejiang, China.
- Key Laboratory of Pharmaceutical Engineering of Zhejiang Province, Key Laboratory for Green Pharmaceutical Technologies and Related Equipment of Ministry of Education, Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China.
12
|
Schmidt B, Hildebrandt A. From GPUs to AI and quantum: three waves of acceleration in bioinformatics. Drug Discov Today 2024; 29:103990. [PMID: 38663581 DOI: 10.1016/j.drudis.2024.103990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 04/05/2024] [Accepted: 04/17/2024] [Indexed: 05/01/2024]
Abstract
The enormous growth in the amount of data generated by the life sciences is continuously shifting the field from model-driven science towards data-driven science. The need for efficient processing has led to the adoption of massively parallel accelerators such as graphics processing units (GPUs). Consequently, the development of bioinformatics methods nowadays often heavily depends on the effective use of these powerful technologies. Furthermore, progress in computational techniques and architectures continues to be highly dynamic, involving novel deep neural network models and artificial intelligence (AI) accelerators, and potentially quantum processing units in the future. These are expected to be disruptive for the life sciences as a whole and for drug discovery in particular. Here, we identify three waves of acceleration and their applications in a bioinformatics context: (i) GPU computing, (ii) AI and (iii) next-generation quantum computers.
Affiliation(s)
- Bertil Schmidt
- Institut für Informatik, Johannes Gutenberg University, Mainz, Germany.
13
|
Zhou Y, Chen SJ. Advances in machine-learning approaches to RNA-targeted drug design. ARTIFICIAL INTELLIGENCE CHEMISTRY 2024; 2:100053. [PMID: 38434217 PMCID: PMC10904028 DOI: 10.1016/j.aichem.2024.100053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2024]
Abstract
RNA molecules play multifaceted functional and regulatory roles within cells and have garnered significant attention in recent years as promising therapeutic targets. With remarkable successes achieved by artificial intelligence (AI) in different fields such as computer vision and natural language processing, there is a growing imperative to harness AI's potential in computer-aided drug design (CADD) to discover novel drug compounds that target RNA. Although machine-learning (ML) approaches have been widely adopted in the discovery of small molecules targeting proteins, the application of ML approaches to model interactions between RNA and small molecules is still in its infancy. Compared to protein-targeted drug discovery, the major challenges in ML-based RNA-targeted drug discovery stem from the scarcity of available data resources. With the growing interest and the development of curated databases focusing on interactions between RNA and small molecules, the field anticipates rapid growth and the opening of a new avenue for disease treatment. In this review, we aim to provide an overview of recent advancements in computationally modeling RNA-small molecule interactions within the context of RNA-targeted drug discovery, with a particular emphasis on methodologies employing ML techniques.
Affiliation(s)
- Yuanzhe Zhou
- Department of Physics and Astronomy, University of Missouri, Columbia, MO 65211-7010, USA
| | - Shi-Jie Chen
- Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
14
|
Caba K, Tran-Nguyen VK, Rahman T, Ballester PJ. Comprehensive machine learning boosts structure-based virtual screening for PARP1 inhibitors. J Cheminform 2024; 16:40. [PMID: 38582911 PMCID: PMC10999096 DOI: 10.1186/s13321-024-00832-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 03/23/2024] [Indexed: 04/08/2024] Open
Abstract
Poly ADP-ribose polymerase 1 (PARP1) is an attractive therapeutic target for cancer treatment. Machine-learning scoring functions constitute a promising approach to discovering novel PARP1 inhibitors. Cutting-edge PARP1-specific machine-learning scoring functions were investigated using semi-synthetic training data from docking activity-labelled molecules: known PARP1 inhibitors, hard-to-discriminate decoys property-matched to them with generative graph neural networks, and confirmed inactives. We further made test sets harder by including only molecules dissimilar to those in the training set. Comprehensive analysis of these datasets using five supervised learning algorithms, with protein-ligand fingerprints extracted from docking poses and ligand-only features, revealed one highly predictive scoring function. This is the PARP1-specific support vector machine-based regressor employing PLEC fingerprints, which achieved a high Normalized Enrichment Factor at the top 1% on the hardest test set (NEF1% = 0.588, median of 10 repetitions) and was more predictive than any other investigated scoring function, especially the classical scoring function employed as baseline.
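The NEF1% figure quoted in this abstract can be illustrated with a small calculation. The following is a minimal sketch, assuming the common definition in which the enrichment factor at the top fraction is divided by the maximum enrichment factor achievable for the same library; the abstract itself does not spell out the formula, and the function name `enrichment_at_fraction` is hypothetical.

```python
import math


def enrichment_at_fraction(scores, labels, fraction=0.01):
    """Return (EF, NEF) for the top `fraction` of a ranked library.

    Assumed definitions (not taken from the abstract):
      EF  = (actives in top / size of top) / (all actives / library size)
      NEF = EF / maximum achievable EF for this library
    `scores`: higher means more likely active; `labels`: 1 = active, 0 = inactive.
    """
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Size of the top-ranked slice (at least one molecule).
    n_top = max(1, math.ceil(fraction * n_total))

    # Rank molecules from best to worst score and count actives in the slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])

    base_rate = n_actives / n_total
    ef = (hits / n_top) / base_rate
    # Best case: the slice is filled with as many actives as exist.
    max_ef = (min(n_top, n_actives) / n_top) / base_rate
    return ef, ef / max_ef
```

Under this definition NEF ranges from 0 to 1, so values can be compared across test sets with different active-to-inactive ratios, which a raw enrichment factor does not allow.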
Affiliation(s)
- Klaudia Caba
- Department of Bioengineering, Imperial College London, London, SW7 2AZ, UK
| | - Viet-Khoa Tran-Nguyen
- Unité de Biologie Fonctionnelle et Adaptative (BFA), UFR Sciences du Vivant, Université Paris Cité, 75013, Paris, France
| | - Taufiq Rahman
- Department of Pharmacology, University of Cambridge, Cambridge, CB2 1PD, UK
| | - Pedro J Ballester
- Department of Bioengineering, Imperial College London, London, SW7 2AZ, UK.
15
|
Guo L, Wang J. GSScore: a novel Graphormer-based shell-like scoring method for protein-ligand docking. Brief Bioinform 2024; 25:bbae201. [PMID: 38706316 PMCID: PMC11070652 DOI: 10.1093/bib/bbae201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 02/05/2024] [Accepted: 04/16/2024] [Indexed: 05/07/2024] Open
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery. But due to the complexity and high cost of experimental methods, there is a great demand for computational approaches to recognize PLI patterns, such as protein-ligand docking. In recent years, more and more models based on machine learning have been developed to directly predict the root mean square deviation (RMSD) of a ligand docking pose with reference to its native binding pose. However, new scoring methods are pressingly needed in methodology for more accurate RMSD prediction. We present a new deep learning-based scoring method for RMSD prediction of protein-ligand docking poses based on a Graphormer method and Shell-like graph architecture, named GSScore. To recognize near-native conformations from a set of poses, GSScore takes atoms as nodes and then establishes the docking interface of protein-ligand into multiple bipartite graphs within different shell ranges. Benefiting from the Graphormer and Shell-like graph architecture, GSScore can effectively capture the subtle differences between energetically favorable near-native conformations and unfavorable non-native poses without extra information. GSScore was extensively evaluated on diverse test sets including a subset of PDBBind version 2019, CASF2016 as well as DUD-E, and obtained significant improvements over existing methods in terms of RMSE, R (Pearson correlation coefficient), Spearman correlation coefficient and docking power.
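Two of the evaluation metrics named in this abstract, RMSE and the Pearson correlation coefficient R, compare predicted RMSD values against true ones. The sketch below shows the standard textbook formulas only; it is not GSScore's evaluation code, and the function names are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)
```

The two metrics answer different questions: a predictor that is off by a constant offset still reaches R = 1 while its RMSE stays above zero, which is why such abstracts typically report both.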
Affiliation(s)
- Linyuan Guo
- School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
16
|
Mohebbinia Z, Firouzi R, Karimi-Jafari MH. Improving protein-ligand docking results using the Semiempirical quantum mechanics: testing on the PDBbind 2016 core set. J Biomol Struct Dyn 2024:1-11. [PMID: 38165642 DOI: 10.1080/07391102.2023.2299742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 12/20/2023] [Indexed: 01/04/2024]
Abstract
Molecular docking techniques are routinely employed for predicting ligand binding conformations and affinities in the in silico phase of the drug design and development process. In this study, a reliable semiempirical quantum mechanics (SQM) method, PM7, was employed for geometry optimization of top-ranked poses obtained from two widely used docking programs, AutoDock4 and AutoDock Vina. The PDBbind core set (version 2016), which contains high-quality crystal protein-ligand complexes with their corresponding experimental binding affinities, was used as an initial dataset in this research. It was shown that docking pose optimization improves the accuracy of pose predictions and is very useful for the refinement of docked complexes via removing clashes between ligands and proteins. It was also demonstrated that AutoDock Vina achieves a higher sampling power than AutoDock4 in generating accurate ligand poses (RMSD ≤ 2.0 Å), while AutoDock4 exhibits a better ranking power than AutoDock Vina. Finally, a new protocol based on a combination of the results obtained from the two docking programs was proposed for structure-based virtual screening studies, which benefits from the robust sampling abilities of AutoDock Vina and the reliable ranking performance of AutoDock4. Communicated by Ramaswamy H. Sarma.
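The RMSD ≤ 2.0 Å criterion mentioned in this abstract can be sketched as a short calculation. This is a minimal illustration, assuming that pose and reference coordinates are already in the same frame and list the same atoms in the same order; it applies no symmetry correction, and the function names are hypothetical rather than taken from the study.

```python
import math


def rmsd(pose, reference):
    """RMSD in the input units between two equally ordered coordinate lists.

    Each argument is a list of (x, y, z) tuples. No superposition and no
    symmetry correction are applied (a simplifying assumption).
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_pose, atom_ref in zip(pose, reference)
        for a, b in zip(atom_pose, atom_ref)
    )
    return math.sqrt(squared / len(pose))


def success_rate(pairs, threshold=2.0):
    """Fraction of (pose, reference) pairs whose RMSD is within `threshold`."""
    if not pairs:
        raise ValueError("need at least one pose/reference pair")
    hits = sum(1 for pose, reference in pairs if rmsd(pose, reference) <= threshold)
    return hits / len(pairs)
```

Applying `success_rate` to every pose a program generates for a complex reflects sampling power, while applying it to the top-ranked pose only reflects how well the scoring function ranks an accurate pose first.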
Affiliation(s)
- Zainab Mohebbinia
- Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran, Tehran, Iran
| | - Rohoullah Firouzi
- Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran, Tehran, Iran
17
|
Talevi A. Computer-Aided Drug Discovery and Design: Recent Advances and Future Prospects. Methods Mol Biol 2024; 2714:1-20. [PMID: 37676590 DOI: 10.1007/978-1-0716-3441-7_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Computer-aided drug discovery and design involve the use of information technologies to identify and develop, on a rational ground, chemical compounds that align with a set of desired physicochemical and biological properties. In its most common form, it involves the identification and/or modification of an active scaffold (or the combination of known active scaffolds), although de novo drug design from scratch is also possible. Traditionally, the drug discovery and design processes have focused on the molecular determinants of the interactions between drug candidates and their known or intended pharmacological target(s). Nevertheless, in modern times, drug discovery and design are conceived as a particularly complex multiparameter optimization task, due to the complicated, often conflicting, property requirements. This chapter provides an updated overview of in silico approaches for identifying active scaffolds and guiding the subsequent optimization process. Recent groundbreaking advances in the field are also analyzed, including the integration of state-of-the-art machine learning approaches in every step of the drug discovery process (from prediction of target structure to customized molecular docking scoring functions), the integration of multilevel omics data, and the use of a diversity of computational approaches to assist target validation and assess plausible binding pockets.
Affiliation(s)
- Alan Talevi
- Laboratory of Bioactive Compound Research and Development (LIDeB), Faculty of Exact Sciences, National University of La Plata (UNLP), La Plata, Argentina.
- Argentinean National Council of Scientific and Technical Research (CONICET), La Plata, Argentina.
18
|
Wu J, Lv J, Zhao L, Zhao R, Gao T, Xu Q, Liu D, Yu Q, Ma F. Exploring the role of microbial proteins in controlling environmental pollutants based on molecular simulation. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 905:167028. [PMID: 37704131 DOI: 10.1016/j.scitotenv.2023.167028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 09/03/2023] [Accepted: 09/10/2023] [Indexed: 09/15/2023]
Abstract
Molecular simulation has been widely used to study microbial proteins' structural composition and dynamic properties, such as volatility, flexibility, and stability at the microscopic scale. Herein, this review describes the key elements of molecular docking and molecular dynamics (MD) simulations in molecular simulation; reviews the techniques combined with molecular simulation, such as crystallography, spectroscopy, molecular biology, and machine learning, to validate simulation results and bridge information gaps in the structure, microenvironmental changes, expression mechanisms, and intensity quantification; and illustrates the application of molecular simulation in characterizing the molecular mechanisms of interaction of microbial proteins with four different types of contaminants, namely heavy metals (HMs), pesticides, dyes and emerging contaminants (ECs). Finally, the review outlines the important role of molecular simulations in the study of microbial proteins for controlling environmental contamination and provides ideas for the application of molecular simulation in screening microbial proteins and incorporating targeted mutagenesis to obtain more effective contaminant control proteins.
Affiliation(s)
- Jieting Wu
- School of Environmental Science, Liaoning University, Shenyang 110036, China
| | - Jin Lv
- School of Environmental Science, Liaoning University, Shenyang 110036, China
| | - Lei Zhao
- State Key Laboratory of Urban Water Resources & Environment, Harbin Institute of Technology, Harbin 150090, China
| | - Ruofan Zhao
- School of Environment, Beijing Normal University, Beijing 100875, China
| | - Tian Gao
- Key Laboratory of Integrated Regulation and Resource Development of Shallow Lakes, Ministry of Education, College of Environment, Hohai University, Xikang Road #1, Nanjing 210098, China
| | - Qi Xu
- PetroChina Fushun Petrochemical Company, Fushun 113000, China
| | - Dongbo Liu
- School of Environmental Science, Liaoning University, Shenyang 110036, China
| | - Qiqi Yu
- School of Environmental Science, Liaoning University, Shenyang 110036, China
| | - Fang Ma
- State Key Laboratory of Urban Water Resources & Environment, Harbin Institute of Technology, Harbin 150090, China.
19
|
Agarwal R, T RR, Smith JC. Comparative Assessment of Pose Prediction Accuracy in RNA-Ligand Docking. J Chem Inf Model 2023; 63:7444-7452. [PMID: 37972310 DOI: 10.1021/acs.jcim.3c01533] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/19/2023]
Abstract
Structure-based virtual high-throughput screening is used in early-stage drug discovery. Over the years, docking protocols and scoring functions for protein-ligand complexes have evolved to improve the accuracy in the computation of binding strengths and poses. In the past decade, RNA has also emerged as a target class for new small-molecule drugs. However, most ligand docking programs have been validated and tested for proteins and not RNA. Here, we test the docking power (pose prediction accuracy) of three state-of-the-art docking protocols on 173 RNA-small molecule crystal structures. The programs are AutoDock4 (AD4) and AutoDock Vina (Vina), which were designed for protein targets, and rDock, which was designed for both protein and nucleic acid targets. AD4 performed relatively poorly. For RNA targets for which a crystal structure of a bound ligand used to limit the docking search space is available and for which the goal is to identify new molecules for the same pocket, rDock performs slightly better than Vina, with success rates of 48% and 63%, respectively. However, in the more common type of early-stage drug discovery setting, in which no structure of a ligand-target complex is known and for which a larger search space is defined, rDock performed similarly to Vina, with a low success rate of ∼27%. Vina was found to have bias for ligands with certain physicochemical properties, whereas rDock performs similarly for all ligand properties. Thus, for projects where no ligand-protein structure already exists, Vina and rDock are both applicable. However, the relatively poor performance of all methods relative to protein-target docking illustrates a need for further methods refinement.
Affiliation(s)
- Rupesh Agarwal
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6309, United States
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996-1939, United States
| | - Rajitha Rajeshwar T
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6309, United States
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996-1939, United States
| | - Jeremy C Smith
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6309, United States
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996-1939, United States
20
|
Li Y, Fan Z, Rao J, Chen Z, Chu Q, Zheng M, Li X. An overview of recent advances and challenges in predicting compound-protein interaction (CPI). MEDICAL REVIEW (2021) 2023; 3:465-486. [PMID: 38282802 PMCID: PMC10808869 DOI: 10.1515/mr-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 08/30/2023] [Indexed: 01/30/2024]
Abstract
Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets, drug side effects, and repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based CPI prediction ML models, highlighting their performance and achievements. It also offers insights into CPI prediction-related datasets and evaluation benchmarks. Lastly, the article presents a comprehensive assessment of the current landscape of CPI prediction, elucidating the challenges faced and outlining emerging trends to advance the field.
Affiliation(s)
- Yanbei Li
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhehuan Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jingxin Rao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhiyi Chen
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Qinyu Chu
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Mingyue Zheng
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
21
|
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several stages that are interconnected, and multiple iterations are usually required to bring a new drug to the market. Computational approaches have increasingly become an indispensable part of helping reduce the time and cost of the research and development of new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design, including a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking as well as virtual screening, and state-of-the-art deep learning models to predict calculated and experimental molecular properties based on molecular mechanics optimized geometries. Meanwhile, we discuss remaining challenges and promising directions for further development and use a retrospective example of FDA approved kinase inhibitor Erlotinib to demonstrate the use of these newly developed computational tools.
Affiliation(s)
- Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Eric Chen
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
22
|
Tran-Nguyen VK, Junaid M, Simeon S, Ballester PJ. A practical guide to machine-learning scoring for structure-based virtual screening. Nat Protoc 2023; 18:3460-3511. [PMID: 37845361 DOI: 10.1038/s41596-023-00885-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/08/2022] [Accepted: 07/03/2023] [Indexed: 10/18/2023]
Abstract
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
Affiliation(s)
| | - Muhammad Junaid
- Centre de Recherche en Cancérologie de Marseille, Marseille, France
| | - Saw Simeon
- Centre de Recherche en Cancérologie de Marseille, Marseille, France
| | | |
|
23
|
Dragan P, Joshi K, Atzei A, Latek D. Keras/TensorFlow in Drug Design for Immunity Disorders. Int J Mol Sci 2023; 24:15009. [PMID: 37834457 PMCID: PMC10573944 DOI: 10.3390/ijms241915009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 09/21/2023] [Accepted: 09/29/2023] [Indexed: 10/15/2023] Open
Abstract
Homeostasis of the host immune system is regulated by white blood cells with a variety of cell surface receptors for cytokines. Chemotactic cytokines (chemokines) activate their receptors to evoke the chemotaxis of immune cells in homeostatic migrations or inflammatory conditions towards inflamed tissue or pathogens. Dysregulation of the immune system leading to disorders such as allergies, autoimmune diseases, or cancer requires efficient, fast-acting drugs to minimize the long-term effects of chronic inflammation. Here, we performed structure-based virtual screening (SBVS) assisted by the Keras/TensorFlow neural network (NN) to find novel compound scaffolds acting on three chemokine receptors: CCR2, CCR3, and one CXC receptor, CXCR3. Keras/TensorFlow NN was used here not as a typically used binary classifier but as an efficient multi-class classifier that can discard not only inactive compounds but also low- or medium-activity compounds. Several compounds proposed by SBVS and NN were tested in 100 ns all-atom molecular dynamics simulations to confirm their binding affinity. To improve the basic binding affinity of the compounds, new chemical modifications were proposed. The modified compounds were compared with known antagonists of these three chemokine receptors. Known CXCR3 compounds were among the top predicted compounds; thus, the benefits of using Keras/TensorFlow in drug discovery have been shown in addition to structure-based approaches. Furthermore, we showed that Keras/TensorFlow NN can accurately predict the receptor subtype selectivity of compounds, for which SBVS often fails. We cross-tested chemokine receptor datasets retrieved from ChEMBL and curated datasets for cannabinoid receptors. The NN model trained on the cannabinoid receptor datasets retrieved from ChEMBL was the most accurate in the receptor subtype selectivity prediction. 
Among NN models trained on the chemokine receptor datasets, the CXCR3 model showed the highest accuracy in differentiating the receptor subtype for a given compound dataset.
Affiliation(s)
- Paulina Dragan
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland; (P.D.); (A.A.)
| | - Kavita Joshi
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland; (P.D.); (A.A.)
| | - Alessandro Atzei
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland; (P.D.); (A.A.)
- Department of Life and Environmental Science, Food Toxicology Unit, University of Cagliari, University Campus of Monserrato, SS 554, 09042 Cagliari, Italy
| | - Dorota Latek
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland; (P.D.); (A.A.)
| |
|
24
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| | - Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| |
|
25
|
Abstract
Drug development is a wide scientific field that faces many challenges these days. Among them are extremely high development costs, long development times, and a small number of new drugs that are approved each year. New and innovative technologies are needed to solve these problems that make the drug discovery process of small molecules more time and cost efficient, and that allow previously undruggable receptor classes to be targeted, such as protein-protein interactions. Structure-based virtual screenings (SBVSs) have become a leading contender in this context. In this review, we give an introduction to the foundations of SBVSs and survey their progress in the past few years with a focus on ultralarge virtual screenings (ULVSs). We outline key principles of SBVSs, recent success stories, new screening techniques, available deep learning-based docking methods, and promising future research directions. ULVSs have an enormous potential for the development of new small-molecule drugs and are already starting to transform early-stage drug discovery.
Affiliation(s)
- Christoph Gorgulla
- Harvard Medical School and Physics Department, Harvard University, Boston, Massachusetts, USA;
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Current affiliation: Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
| |
|
26
|
Qureshi R, Irfan M, Gondal TM, Khan S, Wu J, Hadi MU, Heymach J, Le X, Yan H, Alam T. AI in drug discovery and its clinical relevance. Heliyon 2023; 9:e17575. [PMID: 37396052 PMCID: PMC10302550 DOI: 10.1016/j.heliyon.2023.e17575] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 01/10/2023] [Revised: 06/17/2023] [Accepted: 06/21/2023] [Indexed: 07/04/2023] Open
Abstract
The COVID-19 pandemic has emphasized the need for novel drug discovery processes. However, the journey from conceptualizing a drug to its eventual implementation in clinical settings is a long, complex, and expensive process, with many potential points of failure. Over the past decade, a vast growth in medical information has coincided with advances in computational hardware (cloud computing, GPUs, and TPUs) and the rise of deep learning. Medical data generated from large molecular screening profiles, personal health or pathology records, and public health organizations could benefit from analysis by Artificial Intelligence (AI) approaches to speed up the drug discovery pipeline and prevent failures in it. We present applications of AI at various stages of drug discovery pipelines, including the inherently computational approaches of de novo design and prediction of a drug's likely properties. Open-source databases and AI-based software tools that facilitate drug design are discussed along with their associated problems of molecule representation, data collection, complexity, labeling, and disparities among labels. How contemporary AI methods, such as graph neural networks, reinforcement learning, and generative models, along with structure-based methods (i.e., molecular dynamics simulations and molecular docking), can contribute to drug discovery applications and analysis of drug responses is also explored. Finally, recent developments and investments in AI-based start-up companies for biotechnology and drug design, and their current progress, hopes and promotions, are discussed in this article.
Affiliation(s)
- Rizwan Qureshi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Department of Imaging Physics, MD Anderson Cancer Center, The University of Texas, Houston, USA
| | - Muhammad Irfan
- Faculty of Electrical Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Swabi, Pakistan
| | | | - Sheheryar Khan
- School of Professional Education & Executive Development, The Hong Kong Polytechnic University, Hong Kong
| | - Jia Wu
- Department of Imaging Physics, MD Anderson Cancer Center, The University of Texas, Houston, USA
| | | | - John Heymach
- Department of Thoracic Head and Neck Medical Oncology, Division of Cancer Medicine, The University of Texas, MD Anderson Cancer Center, Houston, USA
| | - Xiuning Le
- Department of Thoracic Head and Neck Medical Oncology, Division of Cancer Medicine, The University of Texas, MD Anderson Cancer Center, Houston, USA
| | - Hong Yan
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong
| | - Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| |
|
27
|
Purisima EO, Corbeil CR, Gaudreault F, Wei W, Deprez C, Sulea T. Solvated interaction energy: from small-molecule to antibody drug design. Front Mol Biosci 2023; 10:1210576. [PMID: 37351549 PMCID: PMC10282643 DOI: 10.3389/fmolb.2023.1210576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 05/26/2023] [Indexed: 06/24/2023] Open
Abstract
Scoring functions are ubiquitous in structure-based drug design as an aid to predicting binding modes and estimating binding affinities. Ideally, a scoring function should be broadly applicable, obviating the need to recalibrate and refit its parameters for every new target and class of ligands. Traditionally, drugs have been small molecules, but in recent years biologics, particularly antibodies, have become an increasingly important if not dominant class of therapeutics. This makes the goal of having a transferable scoring function, i.e., one that spans the range of small-molecule to protein ligands, even more challenging. One such broadly applicable scoring function is the Solvated Interaction Energy (SIE), which has been developed and applied in our lab for the last 15 years, leading to several important applications. This physics-based method arose from efforts to understand the physics governing binding events, with particular care given to the role played by solvation. SIE has been used by us and many independent labs worldwide for virtual screening and discovery of novel small-molecule binders or optimization of known drugs. Moreover, without any retraining, it is found to be transferrable to predictions of antibody-antigen relative binding affinities and as accurate as functions trained on protein-protein binding affinities. SIE has been incorporated in conjunction with other scoring functions into ADAPT (Assisted Design of Antibody and Protein Therapeutics), our platform for affinity modulation of antibodies. Application of ADAPT resulted in the optimization of several antibodies with 10-to-100-fold improvements in binding affinity. Further applications included broadening the specificity of a single-domain antibody to be cross-reactive with virus variants of both SARS-CoV-1 and SARS-CoV-2, and the design of safer antibodies by engineering of a pH switch to make them more selective towards acidic tumors while sparing normal tissues at physiological pH.
|
28
|
Yan X, Yue T, Winkler DA, Yin Y, Zhu H, Jiang G, Yan B. Converting Nanotoxicity Data to Information Using Artificial Intelligence and Simulation. Chem Rev 2023. [PMID: 37262026 DOI: 10.1021/acs.chemrev.3c00070] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 06/03/2023]
Abstract
Decades of nanotoxicology research have generated extensive and diverse data sets. However, data is not equal to information. The question is how to extract critical information buried in vast data streams. Here we show that artificial intelligence (AI) and molecular simulation play key roles in transforming nanotoxicity data into critical information, i.e., constructing the quantitative nanostructure (physicochemical properties)-toxicity relationships, and elucidating the toxicity-related molecular mechanisms. For AI and molecular simulation to realize their full impacts in this mission, several obstacles must be overcome. These include the paucity of high-quality nanomaterials (NMs) and standardized nanotoxicity data, the lack of model-friendly databases, the scarcity of specific and universal nanodescriptors, and the inability to simulate NMs at realistic spatial and temporal scales. This review provides a comprehensive and representative, but not exhaustive, summary of the current capability gaps and tools required to fill these formidable gaps. Specifically, we discuss the applications of AI and molecular simulation, which can address the large-scale data challenge for nanotoxicology research. The need for model-friendly nanotoxicity databases, powerful nanodescriptors, new modeling approaches, molecular mechanism analysis, and design of the next-generation NMs are also critically discussed. Finally, we provide a perspective on future trends and challenges.
Affiliation(s)
- Xiliang Yan
- Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
| | - Tongtao Yue
- Key Laboratory of Marine Environment and Ecology, Ministry of Education, Institute of Coastal Environmental Pollution Control, Ocean University of China, Qingdao 266100, China
| | - David A Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
- School of Pharmacy, University of Nottingham, Nottingham NG7 2QL, U.K
- Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
| | - Yongguang Yin
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Hao Zhu
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
| | - Guibin Jiang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Bing Yan
- Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
| |
|
29
|
Pliushcheuskaya P, Künze G. Recent Advances in Computer-Aided Structure-Based Drug Design on Ion Channels. Int J Mol Sci 2023; 24:ijms24119226. [PMID: 37298178 DOI: 10.3390/ijms24119226] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/19/2023] [Revised: 05/16/2023] [Accepted: 05/22/2023] [Indexed: 06/12/2023] Open
Abstract
Ion channels play important roles in fundamental biological processes, such as electric signaling in cells, muscle contraction, hormone secretion, and regulation of the immune response. Targeting ion channels with drugs represents a treatment option for neurological and cardiovascular diseases, muscular degradation disorders, and pathologies related to disturbed pain sensation. While there are more than 300 different ion channels in the human organism, drugs have been developed only for some of them and currently available drugs lack selectivity. Computational approaches are an indispensable tool for drug discovery and can speed up, especially, the early development stages of lead identification and optimization. The number of molecular structures of ion channels has considerably increased over the last ten years, providing new opportunities for structure-based drug development. This review summarizes important knowledge about ion channel classification, structure, mechanisms, and pathology with the main focus on recent developments in the field of computer-aided, structure-based drug design on ion channels. We highlight studies that link structural data with modeling and chemoinformatic approaches for the identification and characterization of new molecules targeting ion channels. These approaches hold great potential to advance research on ion channel drugs in the future.
Affiliation(s)
- Palina Pliushcheuskaya
- Institute for Drug Discovery, Medical Faculty, University of Leipzig, Brüderstr. 34, D-04103 Leipzig, Germany
| | - Georg Künze
- Institute for Drug Discovery, Medical Faculty, University of Leipzig, Brüderstr. 34, D-04103 Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107 Leipzig, Germany
| |
|
30
|
Basu S, Gsponer J, Kurgan L. DEPICTER2: a comprehensive webserver for intrinsic disorder and disorder function prediction. Nucleic Acids Res 2023:7151337. [PMID: 37140058 DOI: 10.1093/nar/gkad330] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/15/2023] [Revised: 04/12/2023] [Accepted: 04/18/2023] [Indexed: 05/05/2023] Open
Abstract
Intrinsic disorder in proteins is relatively abundant in nature and essential for a broad spectrum of cellular functions. While disorder can be accurately predicted from protein sequences, as it was empirically demonstrated in recent community-organized assessments, it is rather challenging to collect and compile a comprehensive prediction that covers multiple disorder functions. To this end, we introduce the DEPICTER2 (DisorderEd PredictIon CenTER) webserver that offers convenient access to a curated collection of fast and accurate disorder and disorder function predictors. This server includes a state-of-the-art disorder predictor, flDPnn, and five modern methods that cover all currently predictable disorder functions: disordered linkers and protein, peptide, DNA, RNA and lipid binding. DEPICTER2 allows selection of any combination of the six methods, batch predictions of up to 25 proteins per request and provides interactive visualization of the resulting predictions. The webserver is freely available at http://biomine.cs.vcu.edu/servers/DEPICTER2/.
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
|
31
|
Cavasotto CN, Di Filippo JI. The Impact of Supervised Learning Methods in Ultralarge High-Throughput Docking. J Chem Inf Model 2023; 63:2267-2280. [PMID: 37036491 DOI: 10.1021/acs.jcim.2c01471] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 04/11/2023]
Abstract
Structure-based virtual screening methods are, nowadays, one of the key pillars of computational drug discovery. In recent years, a series of studies have reported docking-based virtual screening campaigns of large databases ranging from hundreds to thousands of millions of compounds, further identifying novel hits after experimental validation. As these large-scale efforts are not generally accessible, machine learning-based protocols have emerged to accelerate the identification of virtual hits within an ultralarge chemical space, reaching impressive reductions in computational time. Herein, we illustrate the motivation and the problem behind the screening of large databases, providing an overview of key concepts and essential applications of machine learning-accelerated protocols, specifically concerning supervised learning methods. We also discuss where the field stands with these novel developments, highlighting possible insights for future studies.
Affiliation(s)
- Claudio N Cavasotto
- Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Austral Institute for Applied Artificial Intelligence, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
| | - Juan I Di Filippo
- Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Austral Institute for Applied Artificial Intelligence, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
| |
|
32
|
Tran-Nguyen VK, Ballester PJ. Beware of Simple Methods for Structure-Based Virtual Screening: The Critical Importance of Broader Comparisons. J Chem Inf Model 2023; 63:1401-1405. [PMID: 36848585 PMCID: PMC10015451 DOI: 10.1021/acs.jcim.3c00218] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 03/01/2023]
Abstract
We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.
Affiliation(s)
| | - Pedro J Ballester
- Department of Bioengineering, Imperial College London, London SW7 2AZ, U.K
| |
|
33
|
Szwabowski GL, Baker DL, Parrill AL. Application of computational methods for class A GPCR Ligand discovery. J Mol Graph Model 2023; 121:108434. [PMID: 36841204 DOI: 10.1016/j.jmgm.2023.108434] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/03/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/22/2023]
Abstract
G protein-coupled receptors (GPCR) are integral membrane proteins of considerable interest as targets for drug development due to their role in transmitting cellular signals in a multitude of biological processes. Of the six classes categorizing GPCR (A, B, C, D, E, and F), class A contains the largest number of therapeutically relevant GPCR. Despite their importance as drug targets, many challenges exist for the discovery of novel class A GPCR ligands serving as drug precursors. Though knowledge of the structural and functional characteristics of GPCR has grown significantly over the past 20 years, a large portion of GPCR lack reported, experimentally determined structures. Furthermore, many GPCR have no known endogenous and/or synthetic ligands, limiting further exploration of their biochemical, cellular, and physiological roles. While many successes in GPCR ligand discovery have resulted from experimental high-throughput screening, computational methods have played an increasingly important role in GPCR ligand identification in the past decade. Here we discuss computational techniques applied to GPCR ligand discovery. This review summarizes class A GPCR structure/function and provides an overview of many obstacles currently faced in GPCR ligand discovery. Furthermore, we discuss applications and recent successes of computational techniques used to predict GPCR structure as well as present a summary of ligand- and structure-based methods used to identify potential GPCR ligands. Finally, we discuss computational hit list generation and refinement and provide comprehensive workflows for GPCR ligand identification.
Affiliation(s)
| | - Daniel L Baker
- Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA
| | - Abby L Parrill
- Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA.
| |
|
34
|
Predicting Potent Compounds Using a Conditional Variational Autoencoder Based upon a New Structure-Potency Fingerprint. Biomolecules 2023; 13:biom13020393. [PMID: 36830761 PMCID: PMC9953226 DOI: 10.3390/biom13020393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 02/07/2023] [Accepted: 02/16/2023] [Indexed: 02/22/2023] Open
Abstract
Prediction of the potency of bioactive compounds generally relies on linear or nonlinear quantitative structure-activity relationship (QSAR) models. Nonlinear models are generated using machine learning methods. We introduce a novel approach for potency prediction that depends on a newly designed molecular fingerprint (FP) representation. This structure-potency fingerprint (SPFP) combines different modules accounting for the structural features of active compounds and their potency values in a single bit string, hence unifying structure and potency representation. This encoding enables the derivation of a conditional variational autoencoder (CVAE) using SPFPs of training compounds and the application of the model to predict the SPFP potency module of test compounds using only their structure module as input. The SPFP-CVAE approach correctly predicts the potency values of compounds belonging to different activity classes with an accuracy comparable to support vector regression (SVR), representing the state-of-the-art in the field. In addition, highly potent compounds are predicted with an accuracy very similar to that of SVR and deep neural networks.
|
35
|
Machine Learning Scoring Functions for Drug Discovery from Experimental and Computer-Generated Protein-Ligand Structures: Towards Per-Target Scoring Functions. Molecules 2023; 28:molecules28041661. [PMID: 36838647 PMCID: PMC9966217 DOI: 10.3390/molecules28041661] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/07/2022] [Revised: 02/05/2023] [Accepted: 02/06/2023] [Indexed: 02/12/2023] Open
Abstract
In recent years, machine learning has been proposed as a promising strategy to build accurate scoring functions for computational docking aimed at numerically empowered drug discovery. However, the latest studies have suggested that over-optimistic results have been reported due to the correlations present in the experimental databases used for training and testing. Here, we investigate the performance of an artificial neural network in binding affinity predictions, comparing results obtained using experimental protein-ligand structures with those obtained using larger sets of computer-generated structures created with commercial software. Interestingly, similar performances are obtained on both databases. We find a noticeable performance suppression when moving from random horizontal tests to vertical tests performed on target proteins not included in the training data. The possibility to train the network on relatively easily created computer-generated databases leads us to explore per-target scoring functions, trained and tested ad hoc on complexes including only one target protein. Encouraging results are obtained, depending on the type of protein being addressed.
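The distinction between "horizontal" and "vertical" tests in the abstract above can be sketched as two ways of partitioning protein-ligand complexes. This is a minimal illustration and not the authors' code: the `(target_id, complex_id)` record layout, the function names, and the default test fraction are hypothetical choices made here.

```python
import random
from typing import List, Sequence, Set, Tuple

# Hypothetical record layout: (target_id, complex_id).
Complex = Tuple[str, str]


def horizontal_split(
    complexes: Sequence[Complex],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Complex], List[Complex]]:
    """Random split: the same target may appear in both training and test sets."""
    shuffled = list(complexes)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def vertical_split(
    complexes: Sequence[Complex],
    held_out_targets: Set[str],
) -> Tuple[List[Complex], List[Complex]]:
    """Target-wise split: every complex of a held-out target goes to the test set."""
    train = [c for c in complexes if c[0] not in held_out_targets]
    test = [c for c in complexes if c[0] in held_out_targets]
    return train, test
```

In the vertical split no target contributes to both sets, which is why it gives a stricter estimate of how a scoring function behaves on proteins it has not seen.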
|
36
|
McNair D. Artificial Intelligence and Machine Learning for Lead-to-Candidate Decision-Making and Beyond. Annu Rev Pharmacol Toxicol 2023; 63:77-97. [PMID: 35679624 DOI: 10.1146/annurev-pharmtox-051921-023255] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/25/2023]
Abstract
The use of artificial intelligence (AI) and machine learning (ML) in pharmaceutical research and development has to date focused on research: target identification; docking-, fragment-, and motif-based generation of compound libraries; modeling of synthesis feasibility; rank-ordering likely hits according to structural and chemometric similarity to compounds having known activity and affinity to the target(s); optimizing a smaller library for synthesis and high-throughput screening; and combining evidence from screening to support hit-to-lead decisions. Applying AI/ML methods to lead optimization and lead-to-candidate (L2C) decision-making has shown slower progress, especially regarding predicting absorption, distribution, metabolism, excretion, and toxicology properties. The present review surveys reasons why this is so, reports progress that has occurred in recent years, and summarizes some of the issues that remain. Effective AI/ML tools to derisk L2C and later phases of development are important to accelerate the pharmaceutical development process, ameliorate escalating development costs, and achieve greater success rates.
Affiliation(s)
- Douglas McNair
- Global Health, Integrated Development, Bill & Melinda Gates Foundation, Seattle, Washington, USA;
| |
|
37
|
Combining machine‐learning and molecular‐modeling methods for drug‐target affinity predictions. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]
|
38
|
Simple nearest-neighbour analysis meets the accuracy of compound potency predictions using complex machine learning models. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00581-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
|
39
|
Assessing How Residual Errors of Scoring Functions Correlate to Ligand Structural Features. Int J Mol Sci 2022; 23:15018. [PMID: 36499344 PMCID: PMC9739603 DOI: 10.3390/ijms232315018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/16/2022] [Revised: 11/08/2022] [Accepted: 11/10/2022] [Indexed: 12/02/2022] Open
Abstract
Scoring functions (SFs) are ubiquitous tools for early stage drug discovery. However, their accuracy currently remains quite moderate. Despite a number of successful target-specific SFs appearing recently, up until now, no ideas on how to systematically improve the general scope of SFs have been formulated. In this work, we hypothesized that the specific features of ligands, corresponding to interactions well appreciated by medicinal chemists (e.g., hydrogen bonds, hydrophobic and aromatic interactions), might be responsible, in part, for the remaining SF errors. The latter provides direction to efforts aimed at the rational and systematic improvement of SF accuracy. In this proof-of-concept work, we took a CASF-2016 coreset of 285 ligands as a basis for comparison and calculated the values of scores for a representative panel of SFs (including AutoDock 4.2, AutoDock Vina, X-Score, NNScore2.0, ΔVina RF20, and DSX). The residual error of linear correlation of each SF value, with the experimental values of affinity and activity, was then analyzed in terms of its correlation with the presence of the fragments responsible for certain medicinal chemistry defined interactions. We showed that, despite the fact that SFs generally perform reasonably, there is room for improvement in terms of better parameterization of interactions involving certain fragments in ligands. Thus, this approach opens a potential way for the systematic improvement of SFs without their significant complication. However, the straightforward application of the proposed approach is limited by the scarcity of reliable available data for ligand-receptor complexes, which is a common problem in the field.
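The residual analysis described in this abstract can be sketched roughly as follows: fit a straight line of experimental affinity against the scoring-function value, take the residuals, and check whether they correlate with the presence of a given fragment. This is only a minimal illustration under stated assumptions; the data, the `has_fragment` flags, and the helper names are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def residuals(scores, affinities):
    """Residual error of the linear correlation between score and affinity."""
    slope, intercept = linear_fit(scores, affinities)
    return [a - (slope * s + intercept) for s, a in zip(scores, affinities)]


# Made-up toy data: docking scores, experimental pKd values, and a
# hypothetical 0/1 flag marking ligands that contain some fragment.
scores = [-6.0, -7.5, -8.0, -9.0, -5.5, -10.0]
pkd = [5.0, 6.8, 6.0, 8.1, 4.2, 7.5]
has_fragment = [0, 1, 0, 1, 0, 1]

res = residuals(scores, pkd)
# A correlation far from zero would hint that the scoring function
# systematically mis-scores ligands carrying this fragment.
r = pearson(has_fragment, res)
print(f"fragment vs. residual correlation: {r:.3f}")
```

With a real data set, the same loop would be repeated per scoring function and per fragment class.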
|
40
|
García-Ortegón M, Simm GNC, Tripp AJ, Hernández-Lobato JM, Bender A, Bacallado S. DOCKSTRING: Easy Molecular Docking Yields Better Benchmarks for Ligand Design. J Chem Inf Model 2022; 62:3486-3502. [PMID: 35849793 PMCID: PMC9364321 DOI: 10.1021/acs.jcim.1c01334] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 11/03/2021] [Indexed: 01/05/2023]
Abstract
The field of machine learning for drug discovery is witnessing an explosion of novel methods. These methods are often benchmarked on simple physicochemical properties such as solubility or general druglikeness, which can be readily computed. However, these properties are poor representatives of objective functions in drug design, mainly because they do not depend on the candidate compound's interaction with the target. By contrast, molecular docking is a widely applied method in drug discovery to estimate binding affinities. However, docking studies require a significant amount of domain knowledge to set up correctly, which hampers adoption. Here, we present dockstring, a bundle for meaningful and robust comparison of ML models using docking scores. dockstring consists of three components: (1) an open-source Python package for straightforward computation of docking scores, (2) an extensive dataset of docking scores and poses of more than 260,000 molecules for 58 medically relevant targets, and (3) a set of pharmaceutically relevant benchmark tasks such as virtual screening or de novo design of selective kinase inhibitors. The Python package implements a robust ligand and target preparation protocol that allows nonexperts to obtain meaningful docking scores. Our dataset is the first to include docking poses, as well as the first of its size that is a full matrix, thus facilitating experiments in multiobjective optimization and transfer learning. Overall, our results indicate that docking scores are a more realistic evaluation objective than simple physicochemical properties, yielding benchmark tasks that are more challenging and more closely related to real problems in drug discovery.
Affiliation(s)
- Miguel García-Ortegón
- Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom
| | - Gregor N. C. Simm
- Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom
| | - Austin J. Tripp
- Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom
| | | | - Andreas Bender
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd., Cambridge CB2 1EW, United Kingdom
| | - Sergio Bacallado
- Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom
| |
|
41
|
Shen C, Zhang X, Deng Y, Gao J, Wang D, Xu L, Pan P, Hou T, Kang Y. Boosting Protein-Ligand Binding Pose Prediction and Virtual Screening Based on Residue-Atom Distance Likelihood Potential and Graph Transformer. J Med Chem 2022; 65:10691-10706. [PMID: 35917397 DOI: 10.1021/acs.jmedchem.2c00991] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Indexed: 02/08/2023]
Abstract
The past few years have witnessed enormous progress toward applying machine learning approaches to the development of protein-ligand scoring functions. However, the robust performance and wide applicability of scoring functions remain a big challenge for increasing the success rate of docking-based virtual screening. Herein, a novel scoring function named RTMScore was developed by introducing a tailored residue-based graph representation strategy and several graph transformer layers for the learning of protein and ligand representations, followed by a mixture density network to obtain residue-atom distance likelihood potential. Our approach was resolutely validated on the CASF-2016 benchmark, and the results indicate that RTMScore can outperform almost all of the other state-of-the-art methods in terms of both the docking and screening powers. Further evaluation confirms the robustness of our approach that can not only retain its docking power on cross-docked poses but also achieve improved performance as a rescoring tool in larger-scale virtual screening.
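One plausible reading of a "residue-atom distance likelihood potential" is that a mixture density network predicts, for each residue-atom pair, a mixture of Gaussians over the pair distance, and a pose is scored by summing the log-likelihood of its observed distances. The sketch below illustrates only that scoring step. The function names and the hand-set mixture parameters are hypothetical; in a trained model they would come from the network, and the actual RTMScore implementation may differ in its details.

```python
from math import exp, log, pi, sqrt


def gaussian_pdf(d, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at distance d."""
    return exp(-0.5 * ((d - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def pair_log_likelihood(d, weights, means, sigmas):
    """Log-likelihood of one residue-atom distance under a Gaussian mixture."""
    p = sum(w * gaussian_pdf(d, m, s) for w, m, s in zip(weights, means, sigmas))
    return log(p + 1e-12)  # small constant guards against log(0)


def pose_score(pairs):
    """Sum of pair log-likelihoods; a higher value means a more plausible pose.

    `pairs` is a list of (distance, (weights, means, sigmas)) tuples.
    """
    return sum(pair_log_likelihood(d, *params) for d, params in pairs)


# Hand-set mixture parameters standing in for network outputs (assumption).
params = ([0.7, 0.3], [3.0, 5.0], [0.5, 1.0])

near_pose = [(3.0, params), (3.1, params)]  # distances close to the main mode
far_pose = [(8.0, params), (9.0, params)]   # distances far from both modes

print(pose_score(near_pose) > pose_score(far_pose))
```

Ranking candidate poses, or candidate ligands, by such a score is what the docking-power and screening-power tests then evaluate.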
Affiliation(s)
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Junbo Gao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| |
|
42
|
McGibbon M, Money-Kyrle S, Blay V, Houston DR. SCORCH: Improving structure-based virtual screening with machine learning classifiers, data augmentation, and uncertainty estimation. J Adv Res 2022; 46:135-147. [PMID: 35901959 PMCID: PMC10105235 DOI: 10.1016/j.jare.2022.07.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/05/2022] [Revised: 07/08/2022] [Accepted: 07/09/2022] [Indexed: 11/17/2022] Open
Abstract
INTRODUCTION: The discovery of a new drug is a costly and lengthy endeavour. The computational prediction of which small molecules can bind to a protein target can accelerate this process if the predictions are fast and accurate enough. Recent machine-learning scoring functions re-evaluate the output of molecular docking to achieve more accurate predictions. However, previous scoring functions were trained on crystalised protein-ligand complexes and datasets of decoys. The limited availability of crystal structures and biases in the decoy datasets can lower the performance of scoring functions. OBJECTIVES: To address key limitations of previous scoring functions and thus improve the predictive performance of structure-based virtual screening. METHODS: A novel machine-learning scoring function was created, named SCORCH (Scoring COnsensus for RMSD-based Classification of Hits). To develop SCORCH, training data is augmented by considering multiple ligand poses and labelling poses based on their RMSD from the native pose. Decoy bias is addressed by generating property-matched decoys for each ligand and using the same methodology for preparing and docking decoys and ligands. A consensus of 3 different machine learning approaches is also used to improve performance. RESULTS: We find that multi-pose augmentation in SCORCH improves its docking power and screening power on independent benchmark datasets. SCORCH outperforms an equivalent scoring function trained on single poses, with a 1% enrichment factor (EF) of 13.78 vs. 10.86 on 18 DEKOIS 2.0 targets and a mean native pose rank of 5.9 vs. 30.4 on CSAR 2014. Additionally, SCORCH outperforms widely used scoring functions in virtual screening and pose prediction on independent benchmark datasets. CONCLUSION: By rationally addressing key limitations of previous scoring functions, SCORCH improves the performance of virtual screening. SCORCH also provides an estimate of its uncertainty, which can help reduce the cost and time required for drug discovery.
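Three ideas from this abstract lend themselves to a short sketch: labelling docked poses by their RMSD from the native pose, averaging several models into a consensus, and computing the enrichment factor at a given fraction of the ranked library. The code below is an illustration under assumptions, not the SCORCH implementation: the 2.0 Å cut-off, the plain mean used as consensus, the convention that a higher score means "more likely active", and all function names are choices made here for the example.

```python
def label_pose(rmsd, threshold=2.0):
    """Label a pose 1 (near-native) or 0 by RMSD in Å; threshold is assumed."""
    return 1 if rmsd <= threshold else 0


def consensus(probabilities):
    """Combine per-model probabilities by a simple mean (an assumption)."""
    return sum(probabilities) / len(probabilities)


def enrichment_factor(scores, is_active, fraction=0.01):
    """EF = (hit rate in the top fraction) / (hit rate in the whole library)."""
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    # Rank compounds from the highest to the lowest score.
    ranked = sorted(zip(scores, is_active), key=lambda t: t[0], reverse=True)
    hits_top = sum(active for _, active in ranked[:n_top])
    total_hits = sum(is_active)
    return (hits_top / n_top) / (total_hits / n)


# Toy library: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
toy_scores = [1000 - i for i in range(1000)]
toy_active = [1 if (i < 5 or 500 <= i < 505) else 0 for i in range(1000)]

ef_1pct = enrichment_factor(toy_scores, toy_active, fraction=0.01)
print(f"EF at 1%: {ef_1pct:.1f}")
```

An EF of 1 corresponds to random selection, so the reported values of 13.78 and 10.86 both indicate substantial early enrichment.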
Affiliation(s)
- Miles McGibbon
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
| | - Sam Money-Kyrle
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
| | - Vincent Blay
- Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA; Institute for Integrative Systems Biology (I2SysBio), Universitat de València and Spanish Research Council (CSIC), 46980 Valencia, Spain.
| | - Douglas R Houston
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK.
| |
|
43
|
Yang C, Chen EA, Zhang Y. Protein-Ligand Docking in the Machine-Learning Era. Molecules 2022; 27:4568. [PMID: 35889440 PMCID: PMC9323102 DOI: 10.3390/molecules27144568] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 07/03/2022] [Accepted: 07/14/2022] [Indexed: 11/16/2022] Open
Abstract
Molecular docking plays a significant role in early-stage drug discovery, from structure-based virtual screening (VS) to hit-to-lead optimization, and its capability and predictive power is critically dependent on the protein-ligand scoring function. In this review, we give a broad overview of recent scoring function development, as well as the docking-based applications in drug discovery. We outline the strategies and resources available for structure-based VS and discuss the assessment and development of classical and machine learning protein-ligand scoring functions. In particular, we highlight the recent progress of machine learning scoring function ranging from descriptor-based models to deep learning approaches. We also discuss the general workflow and docking protocols of structure-based VS, such as structure preparation, binding site detection, docking strategies, and post-docking filter/re-scoring, as well as a case study on the large-scale docking-based VS test on the LIT-PCBA data set.
Affiliation(s)
- Chao Yang
- Department of Chemistry, New York University, New York, NY 10003, USA
| | - Eric Anthony Chen
- Department of Chemistry, New York University, New York, NY 10003, USA
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, NY 10003, USA
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
44
|
Gao K, Wang R, Chen J, Cheng L, Frishcosy J, Huzumi Y, Qiu Y, Schluckbier T, Wei X, Wei GW. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2. Chem Rev 2022; 122:11287-11368. [PMID: 35594413 PMCID: PMC9159519 DOI: 10.1021/acs.chemrev.1c00965] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Indexed: 02/06/2023]
Abstract
Despite tremendous efforts in the past two years, our understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), virus-host interactions, immune response, virulence, transmission, and evolution is still very limited. This limitation calls for further in-depth investigation. Computational studies have become an indispensable component in combating coronavirus disease 2019 (COVID-19) due to their low cost, their efficiency, and the fact that they are free from safety and ethical constraints. Additionally, the mechanism that governs the global evolution and transmission of SARS-CoV-2 cannot be revealed from individual experiments and was discovered by integrating genotyping of massive viral sequences, biophysical modeling of protein-protein interactions, deep mutational data, deep learning, and advanced mathematics. There exists a tsunami of literature on the molecular modeling, simulations, and predictions of SARS-CoV-2 and related developments of drugs, vaccines, antibodies, and diagnostics. To provide readers with a quick update about this literature, we present a comprehensive and systematic methodology-centered review. Aspects such as molecular biophysics, bioinformatics, cheminformatics, machine learning, and mathematics are discussed. This review will be beneficial to researchers who are looking for ways to contribute to SARS-CoV-2 studies and those who are interested in the status of the field.
Affiliation(s)
- Kaifu Gao
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Jiahui Chen
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Limei Cheng
- Clinical Pharmacology and Pharmacometrics, Bristol Myers Squibb, Princeton, New Jersey 08536, United States
| | - Jaclyn Frishcosy
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yuta Huzumi
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Tom Schluckbier
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Xiaoqi Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
45
|
Blay V, Gailiunaite S, Lee CY, Chang HY, Hupp T, Houston DR, Chi P. Comparison of ATP-binding pockets and discovery of homologous recombination inhibitors. Bioorg Med Chem 2022; 70:116923. [PMID: 35841829 DOI: 10.1016/j.bmc.2022.116923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 06/16/2022] [Accepted: 07/06/2022] [Indexed: 11/02/2022]
Abstract
The ATP binding sites of many enzymes are structurally related, which complicates their development as therapeutic targets. In this work, we explore a diverse set of ATPases and compare their ATP binding pockets using different strategies, including direct and indirect structural methods, in search of pockets attractive for drug discovery. We pursue different direct and indirect structural strategies, as well as ligandability assessments to help guide target selection. The analyses indicate human RAD51, an enzyme crucial in homologous recombination, as a promising, tractable target. Inhibition of RAD51 has shown promise in the treatment of certain cancers but more potent inhibitors are needed. Thus, we design compounds computationally against the ATP binding pocket of RAD51 with consideration of multiple criteria, including predicted specificity, drug-likeness, and toxicity. The molecules designed are evaluated experimentally using molecular and cell-based assays. Our results provide two novel hit compounds against RAD51 and illustrate a computational pipeline to design new inhibitors against ATPases.
Affiliation(s)
- Vincent Blay
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK; Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA; Institute for Integrative Systems Biology (I2Sysbio), Universitat de València and Spanish Research Council (CSIC), 46980 Valencia, Spain.
| | - Saule Gailiunaite
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
| | - Chih-Ying Lee
- Institute of Biochemical Sciences, National Taiwan University, Taipei 10617, Taiwan
| | - Hao-Yen Chang
- Institute of Biochemical Sciences, National Taiwan University, Taipei 10617, Taiwan
| | - Ted Hupp
- MRC Institute of Genetics & Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK
| | - Douglas R Houston
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK.
| | - Peter Chi
- Institute of Biochemical Sciences, National Taiwan University, Taipei 10617, Taiwan; Institute of Biological Chemistry, Academia Sinica, Taipei 11529, Taiwan
| |
|
46
|
Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022; 10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open
Abstract
Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit-explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring "the state of the art" in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI-PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI-PS? How do protein folding pathways evolve? What are the rules that dictate the folds? 
What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI-PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. 
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the "state of the art" on research in the AI-PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.
Affiliation(s)
- Jalil Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Luis Ochoa-Toledo
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Mario Javier Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Atocha Aliseda
- Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Fernando Pérez-Escamirosa
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | | | - Francine Ochoa-Fernández
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Ricardo Zamora-Solís
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Sebastián Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Cristina Revilla-Monsalve
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Nicolás Kemper-Valverde
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Myriam M. Altamirano-Bustamante
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| |
|
47
|
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
| | - Garrett M. Morris
- Department of Statistics, University of Oxford, Oxford, United Kingdom
| | - Philip C. Biggin
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
| |
|
48
|
Yang C, Zhang Y. Delta Machine Learning to Improve Scoring-Ranking-Screening Performances of Protein-Ligand Scoring Functions. J Chem Inf Model 2022; 62:2696-2712. [PMID: 35579568 DOI: 10.1021/acs.jcim.2c00485] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 01/01/2023]
Abstract
Protein-ligand scoring functions are widely used in structure-based drug design for fast evaluation of protein-ligand interactions, and it is of strong interest to develop scoring functions with machine-learning approaches. In this work, by expanding the training set, developing physically meaningful features, employing our recently developed linear empirical scoring function Lin_F9 (Yang, C. J. Chem. Inf. Model. 2021, 61, 4630-4644) as the baseline, and applying extreme gradient boosting (XGBoost) with Δ-machine learning, we have further improved the robustness and applicability of machine-learning scoring functions. Besides the top performances for scoring-ranking-screening power tests of the CASF-2016 benchmark, the new scoring function ΔLin_F9XGB also achieves superior scoring and ranking performances in different structure types that mimic real docking applications. The scoring powers of ΔLin_F9XGB for locally optimized poses, flexible redocked poses, and ensemble docked poses of the CASF-2016 core set achieve Pearson's correlation coefficient (R) values of 0.853, 0.839, and 0.813, respectively. In addition, the large-scale docking-based virtual screening test on the LIT-PCBA data set demonstrates the reliability and robustness of ΔLin_F9XGB in virtual screening application. The ΔLin_F9XGB scoring function and its code are freely available on the web at (https://yzhang.hpc.nyu.edu/Delta_LinF9_XGB).
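The Δ-machine-learning idea mentioned here is that a model learns only the correction to a baseline scoring function, and the final prediction is baseline plus correction. The sketch below shows that structure with a deliberately simple stand-in: a one-feature least-squares line replaces XGBoost so the example needs only the standard library. The baseline values, the single descriptor, and all names are made up for illustration and do not reproduce Lin_F9 or ΔLin_F9XGB.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def pearson(x, y):
    """Pearson correlation coefficient (the R used for scoring power)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )


def sse(pred, truth):
    """Sum of squared errors between predictions and reference values."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth))


# Made-up values: baseline scoring-function predictions (pKd units),
# experimental pKd, and one hypothetical descriptor per complex.
baseline = [5.1, 6.0, 6.4, 7.2, 8.0, 5.5]
experimental = [4.8, 6.5, 6.0, 7.9, 8.4, 5.0]
feature = [0.2, 0.9, 0.1, 1.0, 0.8, 0.0]

# Step 1: the learning target is the residual, not the affinity itself.
delta = [e - b for e, b in zip(experimental, baseline)]
# Step 2: fit the correction model (a line here, XGBoost in the paper).
slope, intercept = fit_line(feature, delta)
# Step 3: final prediction = baseline + learned correction.
corrected = [b + slope * f + intercept for b, f in zip(baseline, feature)]

print(f"R baseline:  {pearson(baseline, experimental):.3f}")
print(f"R corrected: {pearson(corrected, experimental):.3f}")
```

Because the correction is fitted and evaluated on the same toy data, the improvement shown is optimistic; a held-out set such as the CASF-2016 core set is what the abstract reports on.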
Affiliation(s)
- Chao Yang
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
49
|
Shulga DA, Ivanov NN, Palyulin VA. In Silico Structure-Based Approach for Group Efficiency Estimation in Fragment-Based Drug Design Using Evaluation of Fragment Contributions. Molecules 2022; 27:1985. [PMID: 35335347 PMCID: PMC8951103 DOI: 10.3390/molecules27061985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 03/10/2022] [Accepted: 03/15/2022] [Indexed: 12/10/2022] Open
Abstract
The notion of a contribution of a specific group in an organic molecule's property and/or activity is both common in our thinking and is still not strictly correct due to the inherent non-additivity of free energy with respect to molecular fragments composing a molecule. The fragment-based drug discovery (FBDD) approach has proven to be fruitful in addressing the above notions. The main difficulty of the FBDD, however, is in its reliance on the low throughput and expensive experimental means of determining the fragment-sized molecules binding. In this article we propose a way to enhance the throughput and availability of the FBDD methods by judiciously using an in silico means of assessing the contribution to ligand-receptor binding energy of fragments of a molecule under question using a previously developed in silico Reverse Fragment Based Drug Discovery (R-FBDD) approach. It has been shown that the proposed structure-based drug discovery (SBDD) type of approach fills in the vacant niche among the existing in silico approaches, which mainly stem from the ligand-based drug discovery (LBDD) counterparts. In order to illustrate the applicability of the approach, our work retrospectively repeats the findings of the use case of an FBDD hit-to-lead project devoted to the experimentally based determination of additive group efficiency (GE), an analog of ligand efficiency (LE) for a group in the molecule, using the Free-Wilson (FW) decomposition. It is shown that in using our in silico approach to evaluate fragment contributions of a ligand and to estimate GE one can arrive at similar decisions as those made using the experimentally determined activity-based FW decomposition. It is also shown that the approach is rather robust to the choice of the scoring function, provided the latter demonstrates a decent scoring power. 
We argue that the proposed approach of in silico assessment of GE has a wider applicability domain and expect that it will be widely applicable to enhance the net throughput of drug discovery based on the FBDD paradigm.
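To make the LE and GE quantities named in this abstract concrete, the following is a minimal sketch using the commonly cited definitions (LE as binding free energy per heavy atom, GE as the change in binding free energy per heavy atom of an added group). These definitions, the function names, and the example numbers are assumptions for illustration and are not taken from the abstract. The sketch uses a simple matched-pair difference rather than the Free-Wilson decomposition or the R-FBDD scoring described by the authors.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Return dG = RT ln(Kd) in kcal/mol (negative when Kd < 1 M)."""
    return R_KCAL * temperature * math.log(kd_molar)


def ligand_efficiency(delta_g, heavy_atoms):
    """LE = -dG / N_heavy, in kcal/mol per heavy atom."""
    return -delta_g / heavy_atoms


def group_efficiency(delta_g_with, delta_g_without, added_heavy_atoms):
    """GE = -(dG_with - dG_without) / N_added, from a matched pair."""
    return -(delta_g_with - delta_g_without) / added_heavy_atoms


# Hypothetical matched pair: a 15-heavy-atom parent (Kd = 10 uM) and an
# analog with a 5-heavy-atom group added (Kd = 100 nM).
dg_parent = binding_free_energy(1e-5)
dg_analog = binding_free_energy(1e-7)
le_parent = ligand_efficiency(dg_parent, 15)
ge_group = group_efficiency(dg_analog, dg_parent, 5)
print(f"LE(parent) = {le_parent:.2f}, GE(group) = {ge_group:.2f}")
```

In this hypothetical case the added group is more efficient per heavy atom than the parent as a whole. That comparison is the kind of per-group judgment the abstract says can be reached with either experimental or in silico contributions.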
Affiliation(s)
- Dmitry A. Shulga
- Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia;
- Vladimir A. Palyulin
- Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia;
50
Wang J, Dokholyan NV. Yuel: Improving the Generalizability of Structure-Free Compound-Protein Interaction Prediction. J Chem Inf Model 2022; 62:463-471. [PMID: 35103472 PMCID: PMC9203246 DOI: 10.1021/acs.jcim.1c01531] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/11/2022]
Abstract
Predicting binding affinities between small molecules and the protein target is at the core of computational drug screening and drug target identification. Deep learning-based approaches have recently been adapted to predict binding affinities and they claim to achieve high prediction accuracy in their tests; we show that these approaches do not generalize, that is, they fail to predict interactions between unknown proteins and unknown small molecules. To address these shortcomings, we develop a new compound-protein interaction predictor, Yuel, which predicts compound-protein interactions with a higher generalizability than the existing methods. Upon comprehensive tests on various data sets, we find that out of all the deep-learning approaches surveyed, Yuel manifests the best ability to predict interactions between unknown compounds and unknown proteins.
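The generalization setting described here, predicting interactions between unknown proteins and unknown compounds, is often evaluated with a "cold" split in which no test compound or test protein appears in training. The sketch below shows one simple way to build such a split. It is an illustrative assumption, not the evaluation protocol used by the Yuel authors, and the function name `cold_split` and the toy data are hypothetical.

```python
import random


def cold_split(pairs, test_fraction=0.2, seed=0):
    """Split (compound, protein, label) triples so that the test set holds
    only compounds and proteins that never appear in the training set.
    Pairs with exactly one held-out entity are discarded."""
    rng = random.Random(seed)
    compounds = sorted({c for c, _, _ in pairs})
    proteins = sorted({p for _, p, _ in pairs})
    # Hold out at least one compound and one protein.
    n_c = max(1, int(len(compounds) * test_fraction))
    n_p = max(1, int(len(proteins) * test_fraction))
    test_c = set(rng.sample(compounds, n_c))
    test_p = set(rng.sample(proteins, n_p))
    train = [t for t in pairs if t[0] not in test_c and t[1] not in test_p]
    test = [t for t in pairs if t[0] in test_c and t[1] in test_p]
    return train, test


# Toy data: every combination of 5 compounds and 5 proteins, dummy labels.
pairs = [(f"cpd{i}", f"prot{j}", (i + j) % 2)
         for i in range(5) for j in range(5)]
train, test = cold_split(pairs)
print(len(train), "training pairs,", len(test), "test pairs")
```

Discarding the mixed pairs costs data, but it keeps the test set strictly limited to unseen compounds paired with unseen proteins.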
Affiliation(s)
- Jian Wang
- Department of Pharmacology, Penn State College of Medicine, Hershey, PA 17033, USA
- Nikolay V. Dokholyan
- Department of Pharmacology, Penn State College of Medicine, Hershey, PA 17033, USA
- Department of Biochemistry & Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA
- Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA
- Department of Biomedical Engineering, Pennsylvania State University, University Park, PA 16802, USA