1
|
Vittorio S, Lunghini F, Morerio P, Gadioli D, Orlandini S, Silva P, Jan Martinovic, Pedretti A, Bonanni D, Del Bue A, Palermo G, Vistoli G, Beccari AR. Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities. Comput Struct Biotechnol J 2024; 23:2141-2151. [PMID: 38827235 PMCID: PMC11141151 DOI: 10.1016/j.csbj.2024.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2024] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/04/2024] Open
Abstract
Molecular docking is a widely used technique in drug discovery to predict the binding mode of a given ligand to its target. However, the identification of the near-native binding pose in docking experiments still represents a challenging task as the scoring functions currently employed by docking programs are parametrized to predict the binding affinity, and, therefore, they often fail to correctly identify the ligand native binding conformation. Selecting the correct binding mode is crucial to obtaining meaningful results and to conveniently optimizing new hit compounds. Deep learning (DL) algorithms have been an area of a growing interest in this sense for their capability to extract the relevant information directly from the protein-ligand structure. Our review aims to present the recent advances regarding the development of DL-based pose selection approaches, discussing limitations and possible future directions. Moreover, a comparison between the performances of some classical scoring functions and DL-based methods concerning their ability to select the correct binding mode is reported. In this regard, two novel DL-based pose selectors developed by us are presented.
Collapse
Affiliation(s)
- Serena Vittorio
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
| | - Filippo Lunghini
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
| | - Pietro Morerio
- Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
| | - Davide Gadioli
- Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
| | - Sergio Orlandini
- SCAI, SuperComputing Applications and Innovation Department, CINECA, Via dei Tizii 6, Rome 00185, Italy
| | - Paulo Silva
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
| | - Jan Martinovic
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
| | - Alessandro Pedretti
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
| | - Domenico Bonanni
- Department of Physical and Chemical Sciences, University of L′Aquila, via Vetoio, L′Aquila 67010, Italy
| | - Alessio Del Bue
- Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
| | - Gianluca Palermo
- Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
| | - Giulio Vistoli
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
| | - Andrea R. Beccari
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
| |
Collapse
|
2
|
Schmidt B, Hildebrandt A. From GPUs to AI and quantum: three waves of acceleration in bioinformatics. Drug Discov Today 2024; 29:103990. [PMID: 38663581 DOI: 10.1016/j.drudis.2024.103990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2023] [Revised: 04/05/2024] [Accepted: 04/17/2024] [Indexed: 05/01/2024]
Abstract
The enormous growth in the amount of data generated by the life sciences is continuously shifting the field from model-driven science towards data-driven science. The need for efficient processing has led to the adoption of massively parallel accelerators such as graphics processing units (GPUs). Consequently, the development of bioinformatics methods nowadays often heavily depends on the effective use of these powerful technologies. Furthermore, progress in computational techniques and architectures continues to be highly dynamic, involving novel deep neural network models and artificial intelligence (AI) accelerators, and potentially quantum processing units in the future. These are expected to be disruptive for the life sciences as a whole and for drug discovery in particular. Here, we identify three waves of acceleration and their applications in a bioinformatics context: (i) GPU computing, (ii) AI and (iii) next-generation quantum computers.
Collapse
Affiliation(s)
- Bertil Schmidt
- Institut für Informatik, Johannes Gutenberg University, Mainz, Germany.
| | | |
Collapse
|
3
|
Nasser Binjawhar D, Abu Ali OA, Alqahtani AS, Fayad E, Abo-Bakr AM, Mekhael AM, Sadek FM. Powerful Approach for New Drugs as Antibacterial Agents via Molecular Docking and In Vitro Studies of Some New Cyclic Imides and Quinazoline-2,5-diones. ACS OMEGA 2024; 9:18566-18575. [PMID: 38680340 PMCID: PMC11044208 DOI: 10.1021/acsomega.4c01176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/05/2024] [Revised: 03/19/2024] [Accepted: 03/27/2024] [Indexed: 05/01/2024]
Abstract
We generated novel elven 1,2,3,6-tetrahydrophthalimides and tetrahydroquinazoline derivatives from 1,2,3,6-tetrahydrophthalic anhydride (1) in response to our interest in using the anhydrides to produce heterocyclic nitrogen compounds. The elemental and spectral analyses of the produced compounds validated the recommended configurations and MOE 2014.09 (Molecular Operating Environment) computations were used to perform their in silico analysis. The synthesized compounds have been analyzed and put through various experiments, including in vitro and in silico methods to assess their biological activity against Escherichia coli Penicillin-Binding Protein 3 (PBP3) and Staphylococcus aureus Penicillin-Binding Protein 2 (PBP2), among these compounds showing promising data as antibacterial drugs.
Collapse
Affiliation(s)
- Dalal Nasser Binjawhar
- Department
of Chemistry, College of Science, Princess
Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
| | - Ola A. Abu Ali
- Department
of Chemistry, College of Science, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
| | - Arwa Sultan Alqahtani
- Department
of Chemistry, College of Science, Imam Mohammad
Ibn Saud Islamic University (IMSIU), P.O. Box 90950, Riyadh 11623, Saudi Arabia
| | - Eman Fayad
- Department
of Biotechnology, College of Sciences, Taif
University, P.O. Box 11099, Taif 21944, Saudi Arabia
| | - Ahmed M. Abo-Bakr
- Chemistry
Department, Faculty of Science, South Valley
University, P.O. Box 83523, Qena 83523, Egypt
| | - Antonous. M. Mekhael
- Cotton Leaf
Worm Department, Plant Protection Research Institute, Agriculture Research Center, P.O. Box 12619, Giza 12611, Egypt
| | - Fayza M. Sadek
- Radiation
Sciences Department, Medical Research Institution, Alexandria University, P.O. Box 21500, Alexandria 5424041, Egypt
| |
Collapse
|
4
|
Mqawass G, Popov P. graphLambda: Fusion Graph Neural Networks for Binding Affinity Prediction. J Chem Inf Model 2024; 64:2323-2330. [PMID: 38366974 DOI: 10.1021/acs.jcim.3c00771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/19/2024]
Abstract
Predicting the binding affinity of protein-ligand complexes is crucial for computer-aided drug discovery (CADD) and the identification of potential drug candidates. The deep learning-based scoring functions have emerged as promising predictors of binding constants. Building on recent advancements in graph neural networks, we present graphLambda for protein-ligand binding affinity prediction, which utilizes graph convolutional, attention, and isomorphism blocks to enhance the predictive capabilities. The graphLambda model exhibits superior performance across CASF16 and CSAR HiQ NRC benchmarks and demonstrates robustness with respect to different types of train-validation set partitions. The development of graphLambda underscores the potential of graph neural networks in advancing binding affinity prediction models, contributing to more effective CADD methodologies.
Collapse
Affiliation(s)
- Ghaith Mqawass
- Faculty of Computer Science, University of Vienna, Vienna A-1090, Austria
- UniVie Doctoral School Computer Science, University of Vienna, Vienna A-1090, Austria
| | - Petr Popov
- Tetra-d, Rheinweg 9, Schaffhausen 8200, Switzerland
- School of Science, Constructor University Bremen gGmbH, Bremen 28759, Germany
| |
Collapse
|
5
|
Nandi S, Bhaduri S, Das D, Ghosh P, Mandal M, Mitra P. Deciphering the Lexicon of Protein Targets: A Review on Multifaceted Drug Discovery in the Era of Artificial Intelligence. Mol Pharm 2024; 21:1563-1590. [PMID: 38466810 DOI: 10.1021/acs.molpharmaceut.3c01161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/13/2024]
Abstract
Understanding protein sequence and structure is essential for understanding protein-protein interactions (PPIs), which are essential for many biological processes and diseases. Targeting protein binding hot spots, which regulate signaling and growth, with rational drug design is promising. Rational drug design uses structural data and computational tools to study protein binding sites and protein interfaces to design inhibitors that can change these interactions, thereby potentially leading to therapeutic approaches. Artificial intelligence (AI), such as machine learning (ML) and deep learning (DL), has advanced drug discovery and design by providing computational resources and methods. Quantum chemistry is essential for drug reactivity, toxicology, drug screening, and quantitative structure-activity relationship (QSAR) properties. This review discusses the methodologies and challenges of identifying and characterizing hot spots and binding sites. It also explores the strategies and applications of artificial-intelligence-based rational drug design technologies that target proteins and protein-protein interaction (PPI) binding hot spots. It provides valuable insights for drug design with therapeutic implications. We have also demonstrated the pathological conditions of heat shock protein 27 (HSP27) and matrix metallopoproteinases (MMP2 and MMP9) and designed inhibitors of these proteins using the drug discovery paradigm in a case study on the discovery of drug molecules for cancer treatment. Additionally, the implications of benzothiazole derivatives for anticancer drug design and discovery are deliberated.
Collapse
Affiliation(s)
- Suvendu Nandi
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Soumyadeep Bhaduri
- Centre for Computational and Data Sciences, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Debraj Das
- Centre for Computational and Data Sciences, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Priya Ghosh
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Mahitosh Mandal
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| |
Collapse
|
6
|
Wang Z, Wang S, Li Y, Guo J, Wei Y, Mu Y, Zheng L, Li W. A new paradigm for applying deep learning to protein-ligand interaction prediction. Brief Bioinform 2024; 25:bbae145. [PMID: 38581420 PMCID: PMC10998640 DOI: 10.1093/bib/bbae145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 03/18/2024] [Indexed: 04/08/2024] Open
Abstract
Protein-ligand interaction prediction presents a significant challenge in drug design. Numerous machine learning and deep learning (DL) models have been developed to accurately identify docking poses of ligands and active compounds against specific targets. However, current models often suffer from inadequate accuracy or lack practical physical significance in their scoring systems. In this research paper, we introduce IGModel, a novel approach that utilizes the geometric information of protein-ligand complexes as input for predicting the root mean square deviation of docking poses and the binding strength (pKd, the negative value of the logarithm of binding affinity) within the same prediction framework. This ensures that the output scores carry intuitive meaning. We extensively evaluate the performance of IGModel on various docking power test sets, including the CASF-2016 benchmark, PDBbind-CrossDocked-Core and DISCO set, consistently achieving state-of-the-art accuracies. Furthermore, we assess IGModel's generalizability and robustness by evaluating it on unbiased test sets and sets containing target structures generated by AlphaFold2. The exceptional performance of IGModel on these sets demonstrates its efficacy. Additionally, we visualize the latent space of protein-ligand interactions encoded by IGModel and conduct interpretability analysis, providing valuable insights. This study presents a novel framework for DL-based prediction of protein-ligand interactions, contributing to the advancement of this field. The IGModel is available at GitHub repository https://github.com/zchwang/IGModel.
Collapse
Affiliation(s)
- Zechen Wang
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
| | - Sheng Wang
- Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
| | - Yangyang Li
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
| | - Jingjing Guo
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, Macao, China
| | - Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
| | - Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, Singapore
| | - Liangzhen Zheng
- Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
| | - Weifeng Li
- School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
| |
Collapse
|
7
|
Kravchenko A, de Vries SJ, Smaïl-Tabbone M, Chauvot de Beauchene I. HIPPO: HIstogram-based Pseudo-POtential for scoring protein-ssRNA fragment-based docking poses. BMC Bioinformatics 2024; 25:129. [PMID: 38532339 DOI: 10.1186/s12859-024-05733-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2023] [Accepted: 03/06/2024] [Indexed: 03/28/2024] Open
Abstract
BACKGROUND The RNA-Recognition motif (RRM) is a protein domain that binds single-stranded RNA (ssRNA) and is present in as much as 2% of the human genome. Despite this important role in biology, RRM-ssRNA interactions are very challenging to study on the structural level because of the remarkable flexibility of ssRNA. In the absence of atomic-level experimental data, the only method able to predict the 3D structure of protein-ssRNA complexes with any degree of accuracy is ssRNA'TTRACT, an ssRNA fragment-based docking approach using ATTRACT. However, since ATTRACT parameters are not ssRNA-specific and were determined in 2010, there is substantial opportunity for enhancement. RESULTS Here we present HIPPO, a composite RRM-ssRNA scoring potential derived analytically from contact frequencies in near-native versus non-native docking models. HIPPO consists of a consensus of four distinct potentials, each extracted from a distinct reference pool of protein-trinucleotide docking decoys. To score a docking pose with one potential, for each pair of RNA-protein coarse-grained bead types, each contact is awarded or penalised according to the relative frequencies of this contact distance range among the correct and incorrect poses of the reference pool. Validated on a fragment-based docking benchmark of 57 experimentally solved RRM-ssRNA complexes, HIPPO achieved a threefold or higher enrichment for half of the fragments, versus only a quarter with the ATTRACT scoring function. In particular, HIPPO drastically improved the chance of very high enrichment (12-fold or higher), a scenario where the incremental modelling of entire ssRNA chains from fragments becomes viable. However, for the latter result, more research is needed to make it directly practically applicable. Regardless, our approach already improves upon the state of the art in RRM-ssRNA modelling and is in principle extendable to other types of protein-nucleic acid interactions.
Collapse
Affiliation(s)
- Anna Kravchenko
- Université de Lorraine, CNRS, Inria, LORIA, 54000, Nancy, France
| | | | | | | |
Collapse
|
8
|
Metcalf DP, Glick ZL, Bortolato A, Jiang A, Cheney DL, Sherrill CD. Directional Δ G Neural Network (DrΔ G-Net): A Modular Neural Network Approach to Binding Free Energy Prediction. J Chem Inf Model 2024; 64:1907-1918. [PMID: 38470995 DOI: 10.1021/acs.jcim.3c02054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/14/2024]
Abstract
The protein-ligand binding free energy is a central quantity in structure-based computational drug discovery efforts. Although popular alchemical methods provide sound statistical means of computing the binding free energy of a large breadth of systems, they are generally too costly to be applied at the same frequency as end point or ligand-based methods. By contrast, these data-driven approaches are typically fast enough to address thousands of systems but with reduced transferability to unseen systems. We introduce DrΔG-Net (or simply Dragnet), an equivariant graph neural network that can blend ligand-based and protein-ligand data-driven approaches. It is based on a 3D fingerprint representation of the ligand alone and in complex with the protein target. Dragnet is a global scoring function to predict the binding affinity of arbitrary protein-ligand complexes, but can be easily tuned via transfer learning to specific systems or end points, performing similarly to common 2D ligand-based approaches in these tasks. Dragnet is evaluated on a total of 28 validation proteins with a set of congeneric ligands derived from the Binding DB and one custom set extracted from the ChEMBL Database. In general, a handful of experimental binding affinities are sufficient to optimize the scoring function for a particular protein and ligand scaffold. When not available, predictions from physics-based methods such as absolute free energy perturbation can be used for the transfer learning tuning of Dragnet. Furthermore, we use our data to illustrate the present limitations of data-driven modeling of binding free energy predictions.
Collapse
Affiliation(s)
- Derek P Metcalf
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Zachary L Glick
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Andrea Bortolato
- Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
| | - Andy Jiang
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Daniel L Cheney
- Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
| | - C David Sherrill
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| |
Collapse
|
9
|
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time -consuming. Accurately predicting the interaction between drugs and targets will likely change how the drug is discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods, focusing on sequence and structure to study protein-ligand interactions, are examined. Therefore, this paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are subsequently utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are proposed. Furthermore, delving into the diverse applications of protein-ligand interaction models in drug research is presented. Lastly, the current challenges and future directions in this field are addressed.
Collapse
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| |
Collapse
|
10
|
Chakrabarti M, Tan YS, Balius TE. Considerations Around Structure-Based Drug Discovery for KRAS Using DOCK. Methods Mol Biol 2024; 2797:67-90. [PMID: 38570453 DOI: 10.1007/978-1-0716-3822-4_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/05/2024]
Abstract
Molecular docking is a popular computational tool in drug discovery. Leveraging structural information, docking software predicts binding poses of small molecules to cavities on the surfaces of proteins. Virtual screening for ligand discovery is a useful application of docking software. In this chapter, using the enigmatic KRAS protein as an example system, we endeavor to teach the reader about best practices for performing molecular docking with UCSF DOCK. We discuss methods for virtual screening and docking molecules on KRAS. We present the following six points to optimize our docking setup for prosecuting a virtual screen: protein structure choice, pocket selection, optimization of the scoring function, modification of sampling spheres and sampling procedures, choosing an appropriate portion of chemical space to dock, and the choice of which top scoring molecules to pick for purchase.
Collapse
Affiliation(s)
- Mayukh Chakrabarti
- NCI RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Y Stanley Tan
- NCI RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Trent E Balius
- NCI RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
| |
Collapse
|
11
|
Baillif B, Cole J, Giangreco I, McCabe P, Bender A. Applying atomistic neural networks to bias conformer ensembles towards bioactive-like conformations. J Cheminform 2023; 15:124. [PMID: 38129933 PMCID: PMC10740246 DOI: 10.1186/s13321-023-00794-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Accepted: 12/10/2023] [Indexed: 12/23/2023] Open
Abstract
Identifying bioactive conformations of small molecules is an essential process for virtual screening applications relying on three-dimensional structure such as molecular docking. For most small molecules, conformer generators retrieve at least one bioactive-like conformation, with an atomic root-mean-square deviation (ARMSD) lower than 1 Å, among the set of low-energy conformers generated. However, there is currently no general method to prioritise these likely target-bound conformations in the ensemble. In this work, we trained atomistic neural networks (AtNNs) on 3D information of generated conformers of a curated subset of PDBbind ligands to predict the ARMSD to their closest bioactive conformation, and evaluated the early enrichment of bioactive-like conformations when ranking conformers by AtNN prediction. AtNN ranking was compared with bioactivity-unaware baselines such as ascending Sage force field energy ranking, and a slower bioactivity-based baseline ranking by ascending Torsion Fingerprint Deviation to the Maximum Common Substructure to the most similar molecule in the training set (TFD2SimRefMCS). On test sets from random ligand splits of PDBbind, ranking conformers using ComENet, the AtNN encoding the most 3D information, leads to early enrichment of bioactive-like conformations with a median BEDROC of 0.29 ± 0.02, outperforming the best bioactivity-unaware Sage energy ranking baseline (median BEDROC of 0.18 ± 0.02), and performing on a par with the bioactivity-based TFD2SimRefMCS baseline (median BEDROC of 0.31 ± 0.02). The improved performance of the AtNN and TFD2SimRefMCS baseline is mostly observed on test set ligands that bind proteins similar to proteins observed in the training set. On a more challenging subset of flexible molecules, the bioactivity-unaware baselines showed median BEDROCs up to 0.02, while AtNNs and TFD2SimRefMCS showed median BEDROCs between 0.09 and 0.13. When performing rigid ligand re-docking of PDBbind ligands with GOLD using the 1% top-ranked conformers, ComENet ranked conformers showed a higher successful docking rate than bioactivity-unaware baselines, with a rate of 0.48 ± 0.02 compared to CSD probability baseline with a rate of 0.39 ± 0.02. Similarly, on a pharmacophore searching experiment, selecting the 20% top-ranked conformers ranked by ComENet showed higher hit rate compared to baselines. Hence, the approach presented here uses AtNNs successfully to focus conformer ensembles towards bioactive-like conformations, representing an opportunity to reduce computational expense in virtual screening applications on known targets that require input conformations.
Collapse
Affiliation(s)
- Benoit Baillif
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge, CB2 1EW, UK
| | - Jason Cole
- Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, UK
| | - Ilenia Giangreco
- Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, UK
- Exscientia plc, The Schrödinger Building, Oxford Science Park, Oxford, OX4 4GE, UK
| | - Patrick McCabe
- Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, UK
| | - Andreas Bender
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge, CB2 1EW, UK.
| |
Collapse
|
12
|
Li Y, Fan Z, Rao J, Chen Z, Chu Q, Zheng M, Li X. An overview of recent advances and challenges in predicting compound-protein interaction (CPI). MEDICAL REVIEW (2021) 2023; 3:465-486. [PMID: 38282802 PMCID: PMC10808869 DOI: 10.1515/mr-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/18/2023] [Accepted: 08/30/2023] [Indexed: 01/30/2024]
Abstract
Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets, drug side effects, and repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based CPI prediction ML models, highlighting their performance and achievements. It also offers insights into CPI prediction-related datasets and evaluation benchmarks. Lastly, the article presents a comprehensive assessment of the current landscape of CPI prediction, elucidating the challenges faced and outlining emerging trends to advance the field.
Collapse
Affiliation(s)
- Yanbei Li
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhehuan Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jingxin Rao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhiyi Chen
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Qinyu Chu
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Mingyue Zheng
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| |
Collapse
|
13
|
Yan J, Ye Z, Yang Z, Lu C, Zhang S, Liu Q, Qiu J. Multi-task bioassay pre-training for protein-ligand binding affinity prediction. Brief Bioinform 2023; 25:bbad451. [PMID: 38084920 PMCID: PMC10783875 DOI: 10.1093/bib/bbad451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2023] [Revised: 10/27/2023] [Accepted: 11/15/2023] [Indexed: 12/18/2023] Open
Abstract
Protein-ligand binding affinity (PLBA) prediction is the fundamental task in drug discovery. Recently, various deep learning-based models predict binding affinity by incorporating the three-dimensional (3D) structure of protein-ligand complexes as input and achieving astounding progress. However, due to the scarcity of high-quality training data, the generalization ability of current models is still limited. Although there is a vast amount of affinity data available in large-scale databases such as ChEMBL, issues such as inconsistent affinity measurement labels (i.e. IC50, Ki, Kd), different experimental conditions, and the lack of available 3D binding structures complicate the development of high-precision affinity prediction models using these data. To address these issues, we (i) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction; (ii) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked 3D structures. By introducing multi-task pre-training to treat the prediction of different affinity labels as different tasks and classifying relative rankings between samples from the same bioassay, MBP learns robust and transferrable structural knowledge from our new ChEMBL-Dock dataset with varied and noisy labels. Experiments substantiate the capability of MBP on the structure-based PLBA prediction task. To the best of our knowledge, MBP is the first affinity pre-training model and shows great potential for future development. MBP web-server is now available for free at: https://huggingface.co/spaces/jiaxianustc/mbp.
Collapse
Affiliation(s)
- Jiaxian Yan
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
| | - Zhaofeng Ye
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| | - Ziyi Yang
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| | - Chengqiang Lu
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
| | - Shengyu Zhang
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| | - Qi Liu
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
| | - Jiezhong Qiu
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| |
Collapse
|
14
|
Libouban PY, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci 2023; 24:16120. [PMID: 38003312 PMCID: PMC10671244 DOI: 10.3390/ijms242216120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023] Open
Abstract
Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein-ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether it is truly solved or if the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we also confirm the bias in the typically used current test sets. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and accurately compare the performance of models.
Collapse
Affiliation(s)
- Pierre-Yves Libouban
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| | - Samia Aci-Sèche
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| | - Jose Carlos Gómez-Tamayo
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
| | - Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
| | - Pascal Bonnet
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| |
Collapse
|
15
|
Francoeur PG, Koes DR. Expanding Training Data for Structure-Based Receptor-Ligand Binding Affinity Regression through Imputation of Missing Labels. ACS OMEGA 2023; 8:41680-41688. [PMID: 37970017 PMCID: PMC10634251 DOI: 10.1021/acsomega.3c05931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/11/2023] [Revised: 10/10/2023] [Accepted: 10/17/2023] [Indexed: 11/17/2023]
Abstract
The success of machine learning is, in part, due to a large volume of data available to train models. However, the amount of training data for structure-based molecular property prediction remains limited. The previously described CrossDocked2020 data set expanded the available training data for binding pose classification in a molecular docking setting but did not address expanding the amount of receptor-ligand binding affinity data. We present experiments demonstrating that imputing binding affinity labels for complexes without experimentally determined binding affinities is a viable approach to expanding training data for structure-based models of receptor-ligand binding affinity. In particular, we demonstrate that utilizing imputed labels from a convolutional neural network trained only on the affinity data present in CrossDocked2020 results in a small improvement in the binding affinity regression performance, despite the additional sources of noise that such imputed labels add to the training data. The code, data splits, and imputation labels utilized in this paper are freely available at https://github.com/francoep/ImputationPaper.
Collapse
Affiliation(s)
- Paul G. Francoeur
- Department of Computational and Systems
Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - David R. Koes
- Department of Computational and Systems
Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| |
Collapse
|
16
|
Barsbey M, ÖZçelİk R, Bağ A, Atil B, ÖZgür A, Ozkirimli E. A Computational Software for Training Robust Drug-Target Affinity Prediction Models: pydebiaseddta. J Comput Biol 2023; 30:1240-1245. [PMID: 37988394 DOI: 10.1089/cmb.2023.0194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2023] Open
Abstract
Robust generalization of drug-target affinity (DTA) prediction models is a notoriously difficult problem in computational drug discovery. In this article, we present pydebiaseddta: a computational software for improving the generalizability of DTA prediction models to novel ligands and/or proteins. pydebiaseddta serves as the practical implementation of the DebiasedDTA training framework, which advocates modifying the training distribution to mitigate the effect of spurious correlations in the training data set that leads to substantially degraded performance for novel ligands and proteins. Written in Python programming language, pydebiaseddta combines a user-friendly streamlined interface with a feature-rich and highly modifiable architecture. With this article we introduce our software, showcase its main functionalities, and describe practical ways for new users to engage with it.
Collapse
Affiliation(s)
- Melİh Barsbey
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
| | - Riza ÖZçelİk
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
| | - Alperen Bağ
- Technical University of Munich, Munich, Germany
| | - Berk Atil
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
| | - Arzucan ÖZgür
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
| | - Elif Ozkirimli
- Roche Informatics, F. Hoffmann-La Roche AG, Basel, Switzerland
| |
Collapse
|
17
|
Kemp BA, Howell NL, Gildea JJ, Hinkle JD, Shabanowitz J, Hunt DF, Conaway MR, Keller SR, Carey RM. Evidence That Binding of Cyclic GMP to the Extracellular Domain of NKA (Sodium-Potassium ATPase) Mediates Natriuresis. Circ Res 2023; 132:1127-1140. [PMID: 36919600 PMCID: PMC10171454 DOI: 10.1161/circresaha.122.321693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Accepted: 03/07/2023] [Indexed: 03/16/2023]
Abstract
BACKGROUND Extracellular renal interstitial guanosine cyclic 3',5'-monophosphate (cGMP) inhibits renal proximal tubule (RPT) sodium (Na+) reabsorption via Src (Src family kinase) activation. Through which target extracellular cGMP acts to induce natriuresis is unknown. We hypothesized that cGMP binds to the extracellular α1-subunit of NKA (sodium-potassium ATPase) on RPT basolateral membranes to inhibit Na+ transport similar to ouabain-a cardiotonic steroid. METHODS Urine Na+ excretion was measured in uninephrectomized 12-week-old female Sprague-Dawley rats that received renal interstitial infusions of vehicle (5% dextrose in water), cGMP (18, 36, and 72 μg/kg per minute; 30 minutes each), or cGMP+rostafuroxin (12 ng/kg per minute) or were subjected to pressure-natriuresis±rostafuroxin infusion. Rostafuroxin is a digitoxigenin derivative that displaces ouabain from NKA. RESULTS Renal interstitial cGMP and raised renal perfusion pressure induced natriuresis and increased phosphorylated SrcTyr416 and Erk 1/2 (extracellular signal-regulated protein kinase 1/2)Thr202/Tyr204; these responses were abolished with rostafuroxin coinfusion. To assess cGMP binding to NKA, we performed competitive binding studies with isolated rat RPTs using bodipy-ouabain (2 μM)+cGMP (10 µM) or rostafuroxin (10 µM) and 8-biotin-11-cGMP (2 μM)+ouabain (10 μM) or rostafuroxin (10 µM). cGMP or rostafuroxin reduced bodipy-ouabain fluorescence intensity, and ouabain or rostafuroxin reduced 8-biotin-11-cGMP staining. We cross-linked isolated rat RPTs with 4-N3-PET-8-biotin-11-cGMP (2 μM); 8-N3-6-biotin-10-cAMP served as negative control. Precipitation with streptavidin beads followed by immunoblot analysis showed that RPTs after cross-linking with 4-N3-PET-8-biotin-11-cGMP exhibited a significantly stronger signal for NKA than non-cross-linked samples and cross-linked or non-cross-linked 8-N3-6-biotin-10-cAMP RPTs. Ouabain (10 μM) reduced NKA in cross-linked 4-N3-PET-8-biotin-11-cGMP RPTs confirming fluorescence staining. 4-N3-PET-8-biotin-11-cGMP cross-linked samples were separated by SDS gel electrophoresis and slices corresponding to NKA molecular weight excised and processed for mass spectrometry. NKA was the second most abundant protein with 50 unique NKA peptides covering 47% of amino acids in NKA. Molecular modeling demonstrated a potential cGMP docking site in the ouabain-binding pocket of NKA. CONCLUSIONS cGMP can bind to NKA and thereby mediate natriuresis.
Collapse
Affiliation(s)
- Brandon A Kemp
- Department of Medicine, Division of Endocrinology and Metabolism (B.A.K., N.L.H., S.R.K., R.M.C.), University of Virginia, Charlottesville
| | - Nancy L Howell
- Department of Medicine, Division of Endocrinology and Metabolism (B.A.K., N.L.H., S.R.K., R.M.C.), University of Virginia, Charlottesville
| | - John J Gildea
- Department of Pathology (J.J.G.), University of Virginia, Charlottesville
| | - Josh D Hinkle
- Department of Chemistry (J.D.H., J.S., D.F.H.), University of Virginia, Charlottesville
| | - Jeffrey Shabanowitz
- Department of Chemistry (J.D.H., J.S., D.F.H.), University of Virginia, Charlottesville
| | - Donald F Hunt
- Department of Chemistry (J.D.H., J.S., D.F.H.), University of Virginia, Charlottesville
| | - Mark R Conaway
- Division of Translational Research and Applied Statistics, Department of Public Health Sciences (M.R.C.), University of Virginia, Charlottesville
| | - Susanna R Keller
- Department of Medicine, Division of Endocrinology and Metabolism (B.A.K., N.L.H., S.R.K., R.M.C.), University of Virginia, Charlottesville
| | - Robert M Carey
- Department of Medicine, Division of Endocrinology and Metabolism (B.A.K., N.L.H., S.R.K., R.M.C.), University of Virginia, Charlottesville
| |
Collapse
|