1
|
Saifi I, Bhat BA, Hamdani SS, Bhat UY, Lobato-Tapia CA, Mir MA, Dar TUH, Ganie SA. Artificial intelligence and cheminformatics tools: a contribution to the drug development and chemical science. J Biomol Struct Dyn 2024; 42:6523-6541. [PMID: 37434311 DOI: 10.1080/07391102.2023.2234039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2023] [Accepted: 07/03/2023] [Indexed: 07/13/2023]
Abstract
In the ever-evolving field of drug discovery, the integration of Artificial Intelligence (AI) and Machine Learning (ML) with cheminformatics has proven to be a powerful combination. Cheminformatics, which combines the principles of computer science and chemistry, is used to extract chemical information and search compound databases, while the application of AI and ML allows for the identification of potential hit compounds, optimization of synthesis routes, and prediction of drug efficacy and toxicity. This collaborative approach has led to the discovery, preclinical evaluations and approval of over 70 drugs in recent years. To aid researchers in the pursuit of new drugs, this article presents a comprehensive list of databases, datasets, predictive and generative models, scoring functions and web platforms that have been launched between 2021 and 2022. These resources provide a wealth of information and tools for computer-assisted drug development, and are a valuable asset for those working in the field of cheminformatics. Overall, the integration of AI, ML and cheminformatics has greatly advanced the drug discovery process and continues to hold great potential for the future. As new resources and technologies become available, we can expect to see even more groundbreaking discoveries and advancements in these fields.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Ifra Saifi
- Chaudhary Charan Singh University, Meerut, Uttar Pradesh, India
| | - Basharat Ahmad Bhat
- Department of Bioresources, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
| | - Syed Suhail Hamdani
- Department of Bioresources, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
| | - Umar Yousuf Bhat
- Department of Zoology, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
| | | | - Mushtaq Ahmad Mir
- Department of Clinical Laboratory Sciences, College of Applied Medical Science, King Khalid University, KSA, Saudi Arabia
| | - Tanvir Ul Hasan Dar
- Department of Biotechnology, School of Biosciences and Biotechnology, BGSB University, Rajouri, India
| | - Showkat Ahmad Ganie
- Department of Clinical Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
| |
Collapse
|
2
|
Kairys V, Baranauskiene L, Kazlauskiene M, Zubrienė A, Petrauskas V, Matulis D, Kazlauskas E. Recent advances in computational and experimental protein-ligand affinity determination techniques. Expert Opin Drug Discov 2024; 19:649-670. [PMID: 38715415 DOI: 10.1080/17460441.2024.2349169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
INTRODUCTION Modern drug discovery revolves around designing ligands that target the chosen biomolecule, typically proteins. For this, the evaluation of affinities of putative ligands is crucial. This has given rise to a multitude of dedicated computational and experimental methods that are constantly being developed and improved. AREAS COVERED In this review, the authors reassess both the industry mainstays and the newest trends among the methods for protein - small-molecule affinity determination. They discuss both computational affinity predictions and experimental techniques, describing their basic principles, main limitations, and advantages. Together, this serves as initial guide to the currently most popular and cutting-edge ligand-binding assays employed in rational drug design. EXPERT OPINION The affinity determination methods continue to develop toward miniaturization, high-throughput, and in-cell application. Moreover, the availability of data analysis tools has been constantly increasing. Nevertheless, cross-verification of data using at least two different techniques and careful result interpretation remain of utmost importance.
Collapse
Affiliation(s)
- Visvaldas Kairys
- Department of Bioinformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Lina Baranauskiene
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | | | - Asta Zubrienė
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Vytautas Petrauskas
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Daumantas Matulis
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Egidijus Kazlauskas
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| |
Collapse
|
3
|
Qu X, Dong L, Luo D, Si Y, Wang B. Water Network-Augmented Two-State Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2263-2274. [PMID: 37433009 DOI: 10.1021/acs.jcim.3c00567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/13/2023]
Abstract
Water network rearrangement from the ligand-unbound state to the ligand-bound state is known to have significant effects on the protein-ligand binding interactions, but most of the current machine learning-based scoring functions overlook these effects. In this study, we endeavor to construct a comprehensive and realistic deep learning model by incorporating water network information into both ligand-unbound and -bound states. In particular, extended connectivity interaction features were integrated into graph representation, and graph transformer operator was employed to extract features of the ligand-unbound and -bound states. Through these efforts, we developed a water network-augmented two-state model called ECIFGraph::HM-Holo-Apo. Our new model exhibits satisfactory performance in terms of scoring, ranking, docking, screening, and reverse screening power tests on the CASF-2016 benchmark. In addition, it can achieve superior performance in large-scale docking-based virtual screening tests on the DEKOIS2.0 data set. Our study highlights that the use of a water network-augmented two-state model can be an effective strategy to bolster the robustness and applicability of machine learning-based scoring functions, particularly for targets with hydrophilic or solvent-exposed binding pockets.
Collapse
Affiliation(s)
- Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Yubing Si
- College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
| |
Collapse
|
4
|
Rayka M, Mirzaei M, Mohammad Latifi A. An ensemble-based approach to estimate confidence of predicted protein-ligand binding affinity values. Mol Inform 2024; 43:e202300292. [PMID: 38358080 DOI: 10.1002/minf.202300292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2023] [Revised: 01/22/2024] [Accepted: 02/02/2024] [Indexed: 02/16/2024]
Abstract
When designing a machine learning-based scoring function, we access a limited number of protein-ligand complexes with experimentally determined binding affinity values, representing only a fraction of all possible protein-ligand complexes. Consequently, it is crucial to report a measure of confidence and quantify the uncertainty in the model's predictions during test time. Here, we adopt the conformal prediction technique to evaluate the confidence of a prediction for each member of the core set of the CASF 2016 benchmark. The conformal prediction technique requires a diverse ensemble of predictors for uncertainty estimation. To this end, we introduce ENS-Score as an ensemble predictor, which includes 30 models with different protein-ligand representation approaches and achieves Pearson's correlation of 0.842 on the core set of the CASF 2016 benchmark. Also, we comprehensively investigate the residual error of each data point to assess the normality behavior of the distribution of the residual errors and their correlation to the structural features of the ligands, such as hydrophobic interactions and halogen bonding. In the end, we provide a local host web application to facilitate the usage of ENS-Score. All codes to repeat results are provided at https://github.com/miladrayka/ENS_Score.
Collapse
Affiliation(s)
- Milad Rayka
- Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Morteza Mirzaei
- Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Ali Mohammad Latifi
- Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
| |
Collapse
|
5
|
Jaeger-Honz S, Klein K, Schreiber F. Systematic analysis, aggregation and visualisation of interaction fingerprints for molecular dynamics simulation data. J Cheminform 2024; 16:28. [PMID: 38475907 DOI: 10.1186/s13321-024-00822-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2023] [Accepted: 03/02/2024] [Indexed: 03/14/2024] Open
Abstract
Computational methods such as molecular docking or molecular dynamics (MD) simulations have been developed to simulate and explore the interactions between biomolecules. However, the interactions obtained using these methods are difficult to analyse and evaluate. Interaction fingerprints (IFPs) have been proposed to derive interactions from static 3D coordinates and transform them into 1D bit vectors. More recently, the concept has been applied to derive IFPs from MD simulations, which adds a layer of complexity by adding the temporal motion and dynamics of a system. As a result, many IFPs are obtained from one MD simulation, resulting in a large number of individual IFPs that are difficult to analyse compared to IFPs derived from static 3D structures. Scientific contribution: We introduce a new method to systematically aggregate IFPs derived from MD simulation data. In addition, we propose visualisations to effectively analyse and compare IFPs derived from MD simulation data to account for the temporal evolution of interactions and to compare IFPs across different MD simulations. This has been implemented as a freely available Python library and can therefore be easily adopted by other researchers and to different MD simulation datasets.
Collapse
Affiliation(s)
- Sabrina Jaeger-Honz
- Department of Computer and Information Science, University of Konstanz, Universitätsstrasse 10, 78464, Constance, Germany.
| | - Karsten Klein
- Department of Computer and Information Science, University of Konstanz, Universitätsstrasse 10, 78464, Constance, Germany
| | - Falk Schreiber
- Department of Computer and Information Science, University of Konstanz, Universitätsstrasse 10, 78464, Constance, Germany
- Faculty of Information Technology, Monash University, Clayton, VIC, 3800, Australia
| |
Collapse
|
6
|
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time -consuming. Accurately predicting the interaction between drugs and targets will likely change how the drug is discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods, focusing on sequence and structure to study protein-ligand interactions, are examined. Therefore, this paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are subsequently utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are proposed. Furthermore, delving into the diverse applications of protein-ligand interaction models in drug research is presented. Lastly, the current challenges and future directions in this field are addressed.
Collapse
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| |
Collapse
|
7
|
Visan AI, Negut I. Integrating Artificial Intelligence for Drug Discovery in the Context of Revolutionizing Drug Delivery. Life (Basel) 2024; 14:233. [PMID: 38398742 PMCID: PMC10890405 DOI: 10.3390/life14020233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2024] [Revised: 02/03/2024] [Accepted: 02/06/2024] [Indexed: 02/25/2024] Open
Abstract
Drug development is expensive, time-consuming, and has a high failure rate. In recent years, artificial intelligence (AI) has emerged as a transformative tool in drug discovery, offering innovative solutions to complex challenges in the pharmaceutical industry. This manuscript covers the multifaceted role of AI in drug discovery, encompassing AI-assisted drug delivery design, the discovery of new drugs, and the development of novel AI techniques. We explore various AI methodologies, including machine learning and deep learning, and their applications in target identification, virtual screening, and drug design. This paper also discusses the historical development of AI in medicine, emphasizing its profound impact on healthcare. Furthermore, it addresses AI's role in the repositioning of existing drugs and the identification of drug combinations, underscoring its potential in revolutionizing drug delivery systems. The manuscript provides a comprehensive overview of the AI programs and platforms currently used in drug discovery, illustrating the technological advancements and future directions of this field. This study not only presents the current state of AI in drug discovery but also anticipates its future trajectory, highlighting the challenges and opportunities that lie ahead.
Collapse
Affiliation(s)
| | - Irina Negut
- National Institute for Lasers, Plasma and Radiation Physics, 409 Atomistilor Street, 077125 Magurele, Ilfov, Romania;
| |
Collapse
|
8
|
Liu J, Wan J, Ren Y, Shao X, Xu X, Rao L. DOX_BDW: Incorporating Solvation and Desolvation Effects of Cavity Water into Nonfitting Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2023; 63:4850-4863. [PMID: 37539963 DOI: 10.1021/acs.jcim.3c00776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/05/2023]
Abstract
Accurate prediction of the protein-ligand binding affinity (PLBA) with an affordable cost is one of the ultimate goals in the field of structure-based drug design (SBDD), as well as a great challenge in the computational and theoretical chemistry. Herein, we have systematically addressed the complicated solvation and desolvation effects on the PLBA brought by the difference of the explicit water in the protein cavity before and after ligands bind to the protein-binding site. Based on the new solvation model, a nonfitting method at the first-principles level for the PLBA prediction was developed by taking the bridging and displaced water (BDW) molecules into account simultaneously. The newly developed method, DOX_BDW, was validated against a total of 765 noncovalent and covalent protein-ligand binding pairs, including the CASF2016 core set, Cov_2022 covalent binding testing set, and six testing sets for the hit and lead compound optimization (HLO) simulation. In all of the testing sets, the DOX_BDW method was able to produce PLBA predictions that were strongly correlated with the corresponding experimental data (R = 0.66-0.85). The overall performance of DOX_BDW is better than the current empirical scoring functions that are heavily parameterized. DOX_BDW is particularly outstanding for the covalent binding situation, implying the need for considering an electronic structure in covalent drug design. Furthermore, the method is especially recommended to be used in the HLO scenario of SBDD, where hundreds of similar derivatives need to be screened and refined. The computational cost of DOX_BDW is affordable, and its accuracy is remarkable.
Collapse
Affiliation(s)
- Jiaqi Liu
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, Hubei International Scientific and Technological Cooperation Base of Pesticide and Green Synthesis, College of Chemistry, Central China Normal University, Wuhan 43009, People's Republic of China
| | - Jian Wan
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, Hubei International Scientific and Technological Cooperation Base of Pesticide and Green Synthesis, College of Chemistry, Central China Normal University, Wuhan 43009, People's Republic of China
| | - Yanliang Ren
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, Hubei International Scientific and Technological Cooperation Base of Pesticide and Green Synthesis, College of Chemistry, Central China Normal University, Wuhan 43009, People's Republic of China
| | - Xubo Shao
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, Hubei International Scientific and Technological Cooperation Base of Pesticide and Green Synthesis, College of Chemistry, Central China Normal University, Wuhan 43009, People's Republic of China
| | - Xin Xu
- Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Ministry of Education (MOE) Laboratory for Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, People's Republic of China
| | - Li Rao
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, Hubei International Scientific and Technological Cooperation Base of Pesticide and Green Synthesis, College of Chemistry, Central China Normal University, Wuhan 43009, People's Republic of China
| |
Collapse
|
9
|
Nangunuri BG, Shirke RP, Kim MH. Metal-free synthesis of dihydrofuran derivatives as anti-vicinal amino alcohol isosteres. Org Biomol Chem 2023; 21:960-965. [PMID: 36625241 DOI: 10.1039/d2ob02077g] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
Abstract
Dihydrofuran cores are commonly incorporated into synthetically and pharmacologically significant scaffolds in natural product and drug discovery chemistry. Herein, we report a concise and practical strategy to construct spiro-dihydrofuran and amino dihydrofuran scaffolds as anti-vicinal amino alcohol isosteres. Hypervalent iodine (PhI(OAc)(NTs2))-mediated C-H activation of alkynes resulted in two-bond formations with one pi bond cleavage: (i) C(sp2)-N(sp3) and O(sp3)-C(sp2); (ii) C(sp2)-N(sp3) and C(sp3)-C(sp2). The metal-free 5-endo-dig oxidative cyclization provided versatile amino 2,3- and 2,5-dihydrofurans bearing the C5 quaternary carbon. The non-toxicity of all synthesised dihydrofurans was verified via in vitro cell viability assay.
Collapse
Affiliation(s)
- Bhargav Gupta Nangunuri
- Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea.
| | - Rajendra P Shirke
- Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea.
| | - Mi-Hyun Kim
- Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea.
| |
Collapse
|
10
|
Boyles F, Deane CM, Morris GM. Learning from Docked Ligands: Ligand-Based Features Rescue Structure-Based Scoring Functions When Trained on Docked Poses. J Chem Inf Model 2022; 62:5329-5341. [PMID: 34469150 DOI: 10.1021/acs.jcim.1c00096] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes. We explore how the use of docked rather than crystallographic poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. We also present a new, freely available validation set─the Updated DUD-E Diverse Subset─for binding affinity prediction using data from DUD-E and ChEMBL. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function sometimes generalizes poorly to a protein target not represented in the training set, demonstrating the need for improved scoring functions and additional validation benchmarks.
Collapse
Affiliation(s)
- Fergus Boyles
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
| | - Charlotte M Deane
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
| | - Garrett M Morris
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
| |
Collapse
|
11
|
Prediction of chemical warfare agents based on cholinergic array type meta-predictors. Sci Rep 2022; 12:16709. [PMID: 36203081 PMCID: PMC9537167 DOI: 10.1038/s41598-022-21150-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2022] [Accepted: 09/23/2022] [Indexed: 11/08/2022] Open
Abstract
Molecular insights into chemical safety are very important for sustainable development as well as risk assessment. This study considers how to manage future upcoming harmful agents, especially potentially cholinergic chemical warfare agents (CWAs). For this purpose, the structures of known cholinergic agents were encoded by molecular descriptors. And then each drug target interaction (DTI) was learned from the encoded structures and their cholinergic activities to build DTI classification models for five cholinergic targets with reliable statistical validation (ensemble-AUC: up to 0.790, MCC: up to 0.991, accuracy: up to 0.995). The collected classifiers were transformed into 2D or 3D array type meta-predictors for multi-task: (1) cholinergic prediction and (2) CWA detection. The detection ability of the array classifiers was verified under the imbalanced dataset between CWAs and none CWAs (area under the precision-recall curve: up to 0.997, MCC: up to 0.638, F1-score of none CWAs: up to 0.991, F1-score of CWAs: up to 0.585).
Collapse
|
12
|
Ahn S, Lee SE, Kim MH. Random-forest model for drug-target interaction prediction via Kullbeck-Leibler divergence. J Cheminform 2022; 14:67. [PMID: 36192818 PMCID: PMC9531514 DOI: 10.1186/s13321-022-00644-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2022] [Accepted: 09/11/2022] [Indexed: 12/04/2022] Open
Abstract
Virtual screening has significantly improved the success rate of early stage drug discovery. Recent virtual screening methods have improved owing to advances in machine learning and chemical information. Among these advances, the creative extraction of drug features is important for predicting drug–target interaction (DTI), which is a large-scale virtual screening of known drugs. Herein, we report Kullbeck–Leibler divergence (KLD) as a DTI feature and the feature-driven classification model applicable to DTI prediction. For the purpose, E3FP three-dimensional (3D) molecular fingerprints of drugs as a molecular representation allow the computation of 3D similarities between ligands within each target (Q–Q matrix) to identify the uniqueness of pharmacological targets and those between a query and a ligand (Q–L vector) in DTIs. The 3D similarity matrices are transformed into probability density functions via kernel density estimation as a nonparametric estimation. Each density model can exploit the characteristics of each pharmacological target and measure the quasi-distance between the ligands. Furthermore, we developed a random forest model from the KLD feature vectors to successfully predict DTIs for representative 17 targets (mean accuracy: 0.882, out-of-bag score estimate: 0.876, ROC AUC: 0.990). The method is applicable for 2D chemical similarity.
Collapse
Affiliation(s)
- Sangjin Ahn
- Gachon Institute of Pharmaceutical Science and Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea.,Department of Artificial Intelligence, Ajou University, Suwon, 16499, Republic of Korea
| | - Si Eun Lee
- Gachon Institute of Pharmaceutical Science and Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea
| | - Mi-Hyun Kim
- Gachon Institute of Pharmaceutical Science and Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea.
| |
Collapse
|
13
|
Korlepara DB, Vasavi CS, Jeurkar S, Pal PK, Roy S, Mehta S, Sharma S, Kumar V, Muvva C, Sridharan B, Garg A, Modee R, Bhati AP, Nayar D, Priyakumar UD. PLAS-5k: Dataset of Protein-Ligand Affinities from Molecular Dynamics for Machine Learning Applications. Sci Data 2022; 9:548. [PMID: 36071074 PMCID: PMC9451116 DOI: 10.1038/s41597-022-01631-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2022] [Accepted: 08/15/2022] [Indexed: 11/08/2022] Open
Abstract
Computational methods and recently modern machine learning methods have played a key role in structure-based drug design. Though several benchmarking datasets are available for machine learning applications in virtual screening, accurate prediction of binding affinity for a protein-ligand complex remains a major challenge. New datasets that allow for the development of models for predicting binding affinities better than the state-of-the-art scoring functions are important. For the first time, we have developed a dataset, PLAS-5k comprised of 5000 protein-ligand complexes chosen from PDB database. The dataset consists of binding affinities along with energy components like electrostatic, van der Waals, polar and non-polar solvation energy calculated from molecular dynamics simulations using MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and showed a good correlation with the available experimental values. The availability of energy components may enable optimization of desired components during machine learning-based drug design. Further, OnionNet model has been retrained on PLAS-5k dataset and is provided as a baseline for the prediction of binding affinities.
Collapse
Affiliation(s)
- Divya B Korlepara
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - C S Vasavi
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Shruti Jeurkar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Pradeep Kumar Pal
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Subhajit Roy
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- UM-DAE-Centre For Excellence In Basic Sciences, University of Mumbai, Vidyanagari, Mumbai, India
| | - Sarvesh Mehta
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Shubham Sharma
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Vishal Kumar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Charuvaka Muvva
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Bhuvanesh Sridharan
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Akshit Garg
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Rohit Modee
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Agastya P Bhati
- Centre for Computational Science, Department of Chemistry, University College London, London, WC1H 0AJ, United Kingdom
| | - Divya Nayar
- Department of Materials Science and Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India.
| | - U Deva Priyakumar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India.
| |
Collapse
|
14
|
Monteiro NR, Oliveira JL, Arrais JP. DTITR: End-to-end drug–target binding affinity prediction with transformers. Comput Biol Med 2022; 147:105772. [DOI: 10.1016/j.compbiomed.2022.105772] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2022] [Revised: 06/07/2022] [Accepted: 06/19/2022] [Indexed: 11/03/2022]
|
15
|
Monteiro NRC, Simões CJV, Ávila HV, Abbasi M, Oliveira JL, Arrais JP. Explainable deep drug-target representations for binding affinity prediction. BMC Bioinformatics 2022; 23:237. [PMID: 35715734 PMCID: PMC9204982 DOI: 10.1186/s12859-022-04767-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2022] [Accepted: 05/25/2022] [Indexed: 11/10/2022] Open
Abstract
Background Several computational advances have been achieved in the drug discovery field, promoting the identification of novel drug–target interactions and new leads. However, most of these methodologies have been overlooking the importance of providing explanations to the decision-making process of deep learning architectures. In this research study, we explore the reliability of convolutional neural networks (CNNs) at identifying relevant regions for binding, specifically binding sites and motifs, and the significance of the deep representations extracted by providing explanations to the model’s decisions based on the identification of the input regions that contributed the most to the prediction. We make use of an end-to-end deep learning architecture to predict binding affinity, where CNNs are exploited in their capacity to automatically identify and extract discriminating deep representations from 1D sequential and structural data. Results The results demonstrate the effectiveness of the deep representations extracted from CNNs in the prediction of drug–target interactions. CNNs were found to identify and extract features from regions relevant for the interaction, where the weight associated with these spots was in the range of those with the highest positive influence given by the CNNs in the prediction. The end-to-end deep learning model achieved the highest performance both in the prediction of the binding affinity and on the ability to correctly distinguish the interaction strength rank order when compared to baseline approaches. Conclusions This research study validates the potential applicability of an end-to-end deep learning architecture in the context of drug discovery beyond the confined space of proteins and ligands with determined 3D structure. Furthermore, it shows the reliability of the deep representations extracted from the CNNs by providing explainability to the decision-making process. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04767-y.
Collapse
Affiliation(s)
- Nelson R C Monteiro
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal.
| | | | - Henrique V Ávila
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
| | - Maryam Abbasi
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
| | - José L Oliveira
- IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
| | - Joel P Arrais
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
| |
Collapse
|
16
|
fingeRNAt—A novel tool for high-throughput analysis of nucleic acid-ligand interactions. PLoS Comput Biol 2022; 18:e1009783. [PMID: 35653385 PMCID: PMC9197077 DOI: 10.1371/journal.pcbi.1009783] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2021] [Revised: 06/14/2022] [Accepted: 05/06/2022] [Indexed: 11/19/2022] Open
Abstract
Computational methods play a pivotal role in drug discovery and are widely applied in virtual screening, structure optimization, and compound activity profiling. Over the last decades, almost all the attention in medicinal chemistry has been directed to protein-ligand binding, and computational tools have been created with this target in mind. With novel discoveries of functional RNAs and their possible applications, RNAs have gained considerable attention as potential drug targets. However, the availability of bioinformatics tools for nucleic acids is limited. Here, we introduce fingeRNAt—a software tool for detecting non-covalent interactions formed in complexes of nucleic acids with ligands. The program detects nine types of interactions: (i) hydrogen and (ii) halogen bonds, (iii) cation-anion, (iv) pi-cation, (v) pi-anion, (vi) pi-stacking, (vii) inorganic ion-mediated, (viii) water-mediated, and (ix) lipophilic interactions. However, the scope of detected interactions can be easily expanded using a simple plugin system. In addition, detected interactions can be visualized using the associated PyMOL plugin, which facilitates the analysis of medium-throughput molecular complexes. Interactions are also encoded and stored as a bioinformatics-friendly Structural Interaction Fingerprint (SIFt)—a binary string where the respective bit in the fingerprint is set to 1 if a particular interaction is present and to 0 otherwise. This output format, in turn, enables high-throughput analysis of interaction data using data analysis techniques. We present applications of fingeRNAt-generated interaction fingerprints for visual and computational analysis of RNA-ligand complexes, including analysis of interactions formed in experimentally determined RNA-small molecule ligand complexes deposited in the Protein Data Bank. We propose interaction fingerprint-based similarity as an alternative measure to RMSD to recapitulate complexes with similar interactions but different folding. We present an application of interaction fingerprints for the clustering of molecular complexes. This approach can be used to group ligands that form similar binding networks and thus have similar biological properties. The fingeRNAt software is freely available at https://github.com/n-szulc/fingeRNAt.
Collapse
|
17
|
Volkov M, Turk JA, Drizard N, Martin N, Hoffmann B, Gaston-Mathé Y, Rognan D. On the Frustration to Predict Binding Affinities from Protein-Ligand Structures with Deep Neural Networks. J Med Chem 2022; 65:7946-7958. [PMID: 35608179 DOI: 10.1021/acs.jmedchem.2c00487] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Abstract
Accurate prediction of binding affinities from protein-ligand atomic coordinates remains a major challenge in early stages of drug discovery. Using modular message passing graph neural networks describing both the ligand and the protein in their free and bound states, we unambiguously evidence that an explicit description of protein-ligand noncovalent interactions does not provide any advantage with respect to ligand or protein descriptors. Simple models, inferring binding affinities of test samples from that of the closest ligands or proteins in the training set, already exhibit good performances, suggesting that memorization largely dominates true learning in the deep neural networks. The current study suggests considering only noncovalent interactions while omitting their protein and ligand atomic environments. Removing all hidden biases probably requires much denser protein-ligand training matrices and a coordinated effort of the drug design community to solve the necessary protein-ligand structures.
Collapse
Affiliation(s)
- Mikhail Volkov
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, 74 route du Rhin, Illkirch 67400, France
| | | | | | | | | | | | - Didier Rognan
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, 74 route du Rhin, Illkirch 67400, France
| |
Collapse
|
18
|
Selvaraj C, Chandra I, Singh SK. Artificial intelligence and machine learning approaches for drug design: challenges and opportunities for the pharmaceutical industries. Mol Divers 2021; 26:1893-1913. [PMID: 34686947 PMCID: PMC8536481 DOI: 10.1007/s11030-021-10326-z] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2021] [Accepted: 09/24/2021] [Indexed: 12/27/2022]
Abstract
The global spread of COVID-19 has raised the importance of pharmaceutical drug development as intractable and hot research. Developing new drug molecules to overcome any disease is a costly and lengthy process, but the process continues uninterrupted. The critical point to consider the drug design is to use the available data resources and to find new and novel leads. Once the drug target is identified, several interdisciplinary areas work together with artificial intelligence (AI) and machine learning (ML) methods to get enriched drugs. These AI and ML methods are applied in every step of the computer-aided drug design, and integrating these AI and ML methods results in a high success rate of hit compounds. In addition, this AI and ML integration with high-dimension data and its powerful capacity have taken a step forward. Clinical trials output prediction through the AI/ML integrated models could further decrease the clinical trials cost by also improving the success rate. Through this review, we discuss the backend of AI and ML methods in supporting the computer-aided drug design, along with its challenge and opportunity for the pharmaceutical industry. From the available information or data, the AI and ML based prediction for the high throughput virtual screening. After this integration of AI and ML, the success rate of hit identification has gained a momentum with huge success by providing novel drugs.
Collapse
Affiliation(s)
- Chandrabose Selvaraj
- CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India.
| | - Ishwar Chandra
- CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India
| | - Sanjeev Kumar Singh
- CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India.
| |
Collapse
|
19
|
Xiong G, Shen C, Yang Z, Jiang D, Liu S, Lu A, Chen X, Hou T, Cao D. Featurization strategies for protein–ligand interactions and their applications in scoring function development. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2021. [DOI: 10.1002/wcms.1567] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Affiliation(s)
- Guoli Xiong
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
| | - Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
| | - Ziyi Yang
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
| | - Dejun Jiang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
- College of Computer Science and Technology Zhejiang University Hangzhou China
| | - Shao Liu
- Department of Pharmacy Xiangya Hospital, Central South University Changsha China
| | - Aiping Lu
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
| | - Xiang Chen
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis Xiangya Hospital, Central South University Changsha China
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
| |
Collapse
|