1
Kumar GS, Dubey A, Panda SP, Alawi MM, Sindi AA, Azhar EI, Dwivedi VD, Agrawal S. Repurposing of antibacterial compounds for suppression of Mycobacterium tuberculosis dormancy reactivation by targeting resuscitation-promoting factors B. J Biomol Struct Dyn 2024; 42:6850-6862. [PMID: 37551014 DOI: 10.1080/07391102.2023.2245059] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/03/2023] [Accepted: 07/08/2023] [Indexed: 08/09/2023]
Abstract
Tuberculosis infection has always been a global concern for public health, and the mortality rate has increased tremendously every year. The ability of Mycobacterium tuberculosis (Mtb) to resuscitate from the dormant state is one of the major reasons for the epidemic spread of tuberculosis infection, especially latent tuberculosis infection (LTBI). Resuscitation-promoting factor B (RpfB) is largely responsible for bringing Mtb out of dormancy, which makes RpfB a promising target for developing drugs against latent tuberculosis. Therefore, this work was executed using a computational three-level screening of the Selleckchem antibiotics database consisting of 462 antibiotics against the ligand binding region of the RpfB protein, followed by an estimation of binding free energy for identification and confirmation of potential RpfB inhibitors. Subsequently, three antibiotic drug molecules, i.e., Amikacin hydrate (-66.87 kcal/mol), Isepamicin sulphate (-60.8 kcal/mol), and Bekanamycin (-46.89 kcal/mol), were selected on the basis of their binding free energy values for further computational studies in comparison to the reference ligand, 4-benzoyl-2-nitrophenyl thiocyanate (NPT7). Based on intermolecular interaction profiling, 200 ns molecular dynamics (MD) simulation, post-simulation analysis and principal component analysis (PCA), the selected antibiotics showed substantial stability with the RpfB protein compared to the NPT7 inhibitor. In conclusion, based on the computational results, the preferred drugs can be potent inhibitors of the RpfB protein, which can be further validated using in vivo research and in vitro enzyme inhibition to understand their therapeutic activity against tuberculosis infection. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Geethu S Kumar
  - Department of Life Science, School of Basic Science and Research, Sharda University, Greater Noida, India
- Amit Dubey
  - Computational Chemistry and Drug Discovery Division, Quanta Calculus, Greater Noida, India
- Siva Prasad Panda
  - Institute of Pharmaceutical Research, GLA University, Mathura, India
- Maha M Alawi
  - Special Infectious Agents Unit-BSL3, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  - Department of Medical Microbiology and Parasitology, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Anees A Sindi
  - Special Infectious Agents Unit-BSL3, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  - Department of Anesthesia and Critical Care, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
  - Pulmonary and Critical Care Department, International Medical Center Hospital, Jeddah, Saudi Arabia
- Esam I Azhar
  - Special Infectious Agents Unit-BSL3, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Sharad Agrawal
  - Department of Life Science, School of Basic Science and Research, Sharda University, Greater Noida, India
2
Venkatraman V, Gaiser J, Demekas D, Roy A, Xiong R, Wheeler TJ. Do Molecular Fingerprints Identify Diverse Active Drugs in Large-Scale Virtual Screening? (No). Pharmaceuticals (Basel) 2024; 17:992. [PMID: 39204097 PMCID: PMC11356940 DOI: 10.3390/ph17080992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Revised: 07/18/2024] [Accepted: 07/23/2024] [Indexed: 09/03/2024] [Open access]
Abstract
Computational approaches for small-molecule drug discovery now regularly scale to the consideration of libraries containing billions of candidate small molecules. One promising approach to increasing the speed of evaluating billion-molecule libraries is to develop succinct representations of each molecule that enable the rapid identification of molecules with similar properties. Molecular fingerprints are thought to provide a mechanism for producing such representations. Here, we explore the utility of commonly used fingerprints in the context of predicting similar molecular activity. We show that fingerprint similarity provides little discriminative power between active and inactive molecules for a target protein based on a known active: while they may sometimes provide some enrichment for active molecules in a drug screen, a screened data set will still be dominated by inactive molecules. We also demonstrate that high-similarity actives appear to share a scaffold with the query active, meaning that they could more easily be identified by structural enumeration. Furthermore, even when limited to only active molecules, fingerprint similarity values do not correlate with compound potency. In sum, these results highlight the need for a new wave of molecular representations that will improve the capacity to detect biologically active molecules based on their similarity to other such molecules.
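For readers unfamiliar with fingerprint similarity screening, the following is a minimal sketch of the general idea, not the authors' pipeline. It assumes fingerprints are represented as sets of "on" bit indices and uses the Tanimoto coefficient, a widely used similarity measure; the abstract does not state which coefficient or fingerprint type was evaluated, and the molecule names and bit sets below are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: set[int], library: dict[str, set[int]]) -> list[str]:
    """Return library molecule names ordered from most to least similar to the query."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)


# Hypothetical toy fingerprints (assumed data, for illustration only).
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9, 12},
    "mol_b": {2, 3, 5},
    "mol_c": {1, 4, 8},
}
ranking = rank_by_similarity(query, library)
```

A screen of this kind ranks a library against a known active; the abstract's point is that a high rank by such a score is only a weak indicator of activity.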
Affiliation(s)
- Vishwesh Venkatraman
  - Department of Chemistry, Norwegian University of Science and Technology, 7034 Trondheim, Norway
- Jeremiah Gaiser
  - School of Information, University of Arizona, Tucson, AZ 85721, USA
- Daphne Demekas
  - R. Ken Coit College Pharmacy, University of Arizona, Tucson, AZ 85721, USA
- Amitava Roy
  - Rocky Mountain Laboratories, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT 59840, USA
  - Department of Biomedical and Pharmaceutical Sciences, University of Montana, Missoula, MT 59812, USA
- Rui Xiong
  - Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ 85721, USA
- Travis J. Wheeler
  - R. Ken Coit College Pharmacy, University of Arizona, Tucson, AZ 85721, USA
3
Abbas MKG, Rassam A, Karamshahi F, Abunora R, Abouseada M. The Role of AI in Drug Discovery. Chembiochem 2024; 25:e202300816. [PMID: 38735845 DOI: 10.1002/cbic.202300816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/14/2024]
Abstract
The emergence of Artificial Intelligence (AI) in drug discovery marks a pivotal shift in pharmaceutical research, blending sophisticated computational techniques with conventional scientific exploration to break through enduring obstacles. This review paper elucidates the multifaceted applications of AI across various stages of drug development, highlighting significant advancements and methodologies. It delves into AI's instrumental role in drug design, polypharmacology, chemical synthesis, drug repurposing, and the prediction of drug properties such as toxicity, bioactivity, and physicochemical characteristics. Despite AI's promising advancements, the paper also addresses the challenges and limitations encountered in the field, including data quality, generalizability, computational demands, and ethical considerations. By offering a comprehensive overview of AI's role in drug discovery, this paper underscores the technology's potential to significantly enhance drug development, while also acknowledging the hurdles that must be overcome to fully realize its benefits.
Affiliation(s)
- M K G Abbas
  - Center for Advanced Materials, Qatar University, P.O. Box, 2713, Doha, Qatar
- Abrar Rassam
  - Secondary Education, Educational Sciences, Qatar University, P.O. Box, 2713, Doha, Qatar
- Fatima Karamshahi
  - Department of Chemistry and Earth Sciences, Qatar University, P.O. Box, 2713, Doha, Qatar
- Rehab Abunora
  - Faculty of Medicine, General Medicine and Surgery, Helwan University, Cairo, Egypt
- Maha Abouseada
  - Department of Chemistry and Earth Sciences, Qatar University, P.O. Box, 2713, Doha, Qatar
4
Schmidt B, Hildebrandt A. From GPUs to AI and quantum: three waves of acceleration in bioinformatics. Drug Discov Today 2024; 29:103990. [PMID: 38663581 DOI: 10.1016/j.drudis.2024.103990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 04/05/2024] [Accepted: 04/17/2024] [Indexed: 05/01/2024]
Abstract
The enormous growth in the amount of data generated by the life sciences is continuously shifting the field from model-driven science towards data-driven science. The need for efficient processing has led to the adoption of massively parallel accelerators such as graphics processing units (GPUs). Consequently, the development of bioinformatics methods nowadays often heavily depends on the effective use of these powerful technologies. Furthermore, progress in computational techniques and architectures continues to be highly dynamic, involving novel deep neural network models and artificial intelligence (AI) accelerators, and potentially quantum processing units in the future. These are expected to be disruptive for the life sciences as a whole and for drug discovery in particular. Here, we identify three waves of acceleration and their applications in a bioinformatics context: (i) GPU computing, (ii) AI and (iii) next-generation quantum computers.
Affiliation(s)
- Bertil Schmidt
  - Institut für Informatik, Johannes Gutenberg University, Mainz, Germany.
5
Zhou Y, Chen SJ. Advances in machine-learning approaches to RNA-targeted drug design. Artificial Intelligence Chemistry 2024; 2:100053. [PMID: 38434217 PMCID: PMC10904028 DOI: 10.1016/j.aichem.2024.100053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2024]
Abstract
RNA molecules play multifaceted functional and regulatory roles within cells and have garnered significant attention in recent years as promising therapeutic targets. With remarkable successes achieved by artificial intelligence (AI) in different fields such as computer vision and natural language processing, there is a growing imperative to harness AI's potential in computer-aided drug design (CADD) to discover novel drug compounds that target RNA. Although machine-learning (ML) approaches have been widely adopted in the discovery of small molecules targeting proteins, the application of ML approaches to model interactions between RNA and small molecules is still in its infancy. Compared to protein-targeted drug discovery, the major challenges in ML-based RNA-targeted drug discovery stem from the scarcity of available data resources. With the growing interest and the development of curated databases focusing on interactions between RNA and small molecules, the field anticipates rapid growth and the opening of a new avenue for disease treatment. In this review, we aim to provide an overview of recent advancements in computationally modeling RNA-small molecule interactions within the context of RNA-targeted drug discovery, with a particular emphasis on methodologies employing ML techniques.
Affiliation(s)
- Yuanzhe Zhou
  - Department of Physics and Astronomy, University of Missouri, Columbia, MO 65211-7010, USA
- Shi-Jie Chen
  - Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
6
Caba K, Tran-Nguyen VK, Rahman T, Ballester PJ. Comprehensive machine learning boosts structure-based virtual screening for PARP1 inhibitors. J Cheminform 2024; 16:40. [PMID: 38582911 PMCID: PMC10999096 DOI: 10.1186/s13321-024-00832-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 03/23/2024] [Indexed: 04/08/2024] [Open access]
Abstract
Poly ADP-ribose polymerase 1 (PARP1) is an attractive therapeutic target for cancer treatment. Machine-learning scoring functions constitute a promising approach to discovering novel PARP1 inhibitors. Cutting-edge PARP1-specific machine-learning scoring functions were investigated using semi-synthetic training data from docking activity-labelled molecules: known PARP1 inhibitors, hard-to-discriminate decoys property-matched to them with generative graph neural networks, and confirmed inactives. We further made test sets harder by including only molecules dissimilar to those in the training set. Comprehensive analysis of these datasets using five supervised learning algorithms, with protein-ligand fingerprints extracted from docking poses and ligand-only features, revealed one highly predictive scoring function. This is the PARP1-specific support vector machine-based regressor employing PLEC fingerprints, which achieved a high Normalized Enrichment Factor at the top 1% on the hardest test set (NEF1% = 0.588, median of 10 repetitions), and was more predictive than any other investigated scoring function, especially the classical scoring function employed as baseline.
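To make the NEF1% figure easier to interpret, here is a minimal sketch of how a normalized enrichment factor can be computed. It assumes one common definition, the enrichment factor at the top fraction divided by its maximum achievable value, which reduces to the number of actives found in the top-ranked subset divided by the largest number that could have been found there. The abstract does not spell out the formula, so this definition, the function name, and the toy data are assumptions. Higher scores are assumed to mean "more likely active".

```python
import math


def normalized_enrichment_factor(scores, labels, fraction=0.01):
    """NEF at a given top fraction: actives retrieved / maximum retrievable actives.

    scores: predicted scores, higher = more likely active (assumed convention).
    labels: 1 for active, 0 for inactive, aligned with scores.
    """
    n = len(scores)
    total_actives = sum(labels)
    if n == 0 or total_actives == 0:
        raise ValueError("need at least one molecule and one active")
    n_top = max(1, math.ceil(fraction * n))  # size of the top-ranked subset
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in order[:n_top])
    return hits / min(n_top, total_actives)


# Hypothetical screen: 200 molecules, 5 actives, already ordered by score.
scores = [200 - i for i in range(200)]
labels = [0] * 200
for idx in (0, 3, 50, 120, 199):
    labels[idx] = 1
nef_1pct = normalized_enrichment_factor(scores, labels, fraction=0.01)
```

Under this definition NEF ranges from 0 to 1, so a value such as 0.588 means that a bit more than half of the best achievable number of actives appeared in the top 1% of the ranked list.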
Affiliation(s)
- Klaudia Caba
  - Department of Bioengineering, Imperial College London, London, SW7 2AZ, UK
- Viet-Khoa Tran-Nguyen
  - Unité de Biologie Fonctionnelle et Adaptative (BFA), UFR Sciences du Vivant, Université Paris Cité, 75013, Paris, France
- Taufiq Rahman
  - Department of Pharmacology, University of Cambridge, Cambridge, CB2 1PD, UK
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London, SW7 2AZ, UK.
7
Metcalf D, Glick ZL, Bortolato A, Jiang A, Cheney DL, Sherrill CD. Directional ΔG Neural Network (DrΔG-Net): A Modular Neural Network Approach to Binding Free Energy Prediction. J Chem Inf Model 2024; 64:1907-1918. [PMID: 38470995 PMCID: PMC10966643 DOI: 10.1021/acs.jcim.3c02054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 02/23/2024] [Accepted: 02/26/2024] [Indexed: 03/14/2024]
Abstract
The protein-ligand binding free energy is a central quantity in structure-based computational drug discovery efforts. Although popular alchemical methods provide sound statistical means of computing the binding free energy of a large breadth of systems, they are generally too costly to be applied at the same frequency as end point or ligand-based methods. By contrast, these data-driven approaches are typically fast enough to address thousands of systems but with reduced transferability to unseen systems. We introduce DrΔG-Net (or simply Dragnet), an equivariant graph neural network that can blend ligand-based and protein-ligand data-driven approaches. It is based on a 3D fingerprint representation of the ligand alone and in complex with the protein target. Dragnet is a global scoring function to predict the binding affinity of arbitrary protein-ligand complexes, but can be easily tuned via transfer learning to specific systems or end points, performing similarly to common 2D ligand-based approaches in these tasks. Dragnet is evaluated on a total of 28 validation proteins with a set of congeneric ligands derived from the Binding DB and one custom set extracted from the ChEMBL Database. In general, a handful of experimental binding affinities are sufficient to optimize the scoring function for a particular protein and ligand scaffold. When not available, predictions from physics-based methods such as absolute free energy perturbation can be used for the transfer learning tuning of Dragnet. Furthermore, we use our data to illustrate the present limitations of data-driven modeling of binding free energy predictions.
Affiliation(s)
- Derek P. Metcalf
  - Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
- Zachary L. Glick
  - Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
- Andrea Bortolato
  - Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
- Andy Jiang
  - Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
- Daniel L. Cheney
  - Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
- C. David Sherrill
  - Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
8
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods that focus on sequence and structure to study protein-ligand interactions are examined. The paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are discussed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shuyuan Li
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Kong Meng
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shaorui Sun
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
9
Knight IS, Mailhot O, Tang KG, Irwin JJ. DockOpt: A Tool for Automatic Optimization of Docking Models. J Chem Inf Model 2024; 64:1004-1016. [PMID: 38206771 PMCID: PMC10865354 DOI: 10.1021/acs.jcim.3c01406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 12/17/2023] [Accepted: 12/26/2023] [Indexed: 01/13/2024]
Abstract
Molecular docking is a widely used technique for leveraging protein structure for ligand discovery, but it remains difficult to utilize due to limitations that have not been adequately addressed. Despite some progress toward automation, docking still requires expert guidance, hindering its adoption by a broader range of investigators. To make docking more accessible, we developed a new utility called DockOpt, which automates the creation, evaluation, and optimization of docking models prior to their deployment in large-scale prospective screens. DockOpt outperforms our previous automated pipeline across all 43 targets in the DUDE-Z benchmark data set, and the generated models for 84% of targets demonstrate sufficient enrichment to warrant their use in prospective screens, with normalized LogAUC values of at least 15%. DockOpt is available as part of the Python package Pydock3 included in the UCSF DOCK 3.8 distribution, which is available for free to academic researchers at https://dock.compbio.ucsf.edu and free for everyone upon registration at https://tldr.docking.org.
Affiliation(s)
- Ian S. Knight
  - Department of Pharmaceutical Chemistry, UCSF, 1700 Fourth Street, San Francisco, California 94158-2330, United States
- Olivier Mailhot
  - Department of Pharmaceutical Chemistry, UCSF, 1700 Fourth Street, San Francisco, California 94158-2330, United States
- Khanh G. Tang
  - Department of Pharmaceutical Chemistry, UCSF, 1700 Fourth Street, San Francisco, California 94158-2330, United States
- John J. Irwin
  - Department of Pharmaceutical Chemistry, UCSF, 1700 Fourth Street, San Francisco, California 94158-2330, United States
10
Gómez-Sacristán P, Simeon S, Tran-Nguyen VK, Patil S, Ballester PJ. Inactive-enriched machine-learning models exploiting patent data improve structure-based virtual screening for PDL1 dimerizers. J Adv Res 2024:S2090-1232(24)00037-7. [PMID: 38280715 DOI: 10.1016/j.jare.2024.01.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 12/01/2023] [Accepted: 01/21/2024] [Indexed: 01/29/2024] [Open access]
Abstract
INTRODUCTION: Small-molecule Programmed Cell Death Protein 1/Programmed Death-Ligand 1 (PD1/PDL1) inhibition via PDL1 dimerization has the potential to lead to inexpensive drugs with better cancer patient outcomes and milder side effects. However, this therapeutic approach has proven challenging, with only one PDL1 dimerizer reaching early clinical trials so far. There is hence a need for fast and accurate methods to develop alternative PDL1 dimerizers. OBJECTIVES: We aim to show that structure-based virtual screening (SBVS) based on PDL1-specific machine-learning (ML) scoring functions (SFs) is a powerful drug design tool for detecting PD1/PDL1 inhibitors via PDL1 dimerization. METHODS: By incorporating the latest MLSF advances, we generated and evaluated PDL1-specific MLSFs (classifiers and inactive-enriched regressors) on two demanding test sets. RESULTS: 60 PDL1-specific MLSFs (30 classifiers and 30 regressors) were generated. Our large-scale analysis provides highly predictive PDL1-specific MLSFs that benefitted from training with large volumes of docked inactives and enabling inactive-enriched regression. CONCLUSION: PDL1-specific MLSFs strongly outperformed generic SFs of various types on this target and are released here without restrictions.
Affiliation(s)
- Saw Simeon
  - Centre de Recherche en Cancérologie de Marseille, Marseille 13009, France
- Sachin Patil
  - NanoBio Laboratory, Widener University, Chester, PA 19013, USA
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK.
11
Sobhani N, Tardiel-Cyril DR, Chai D, Generali D, Li JR, Vazquez-Perez J, Lim JM, Morris R, Bullock ZN, Davtyan A, Cheng C, Decker WK, Li Y. Artificial intelligence-powered discovery of small molecules inhibiting CTLA-4 in cancer. BJC Reports 2024; 2:4. [PMID: 38312352 PMCID: PMC10838660 DOI: 10.1038/s44276-023-00035-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 12/14/2023] [Accepted: 12/28/2023] [Indexed: 02/06/2024]
Abstract
BACKGROUND/OBJECTIVES: Checkpoint inhibitors, which generate durable responses in many cancer patients, have revolutionized cancer immunotherapy. However, their therapeutic efficacy is limited, and immune-related adverse events are severe, especially for monoclonal antibody treatment directed against cytotoxic T-lymphocyte-associated protein 4 (CTLA-4), which plays a pivotal role in preventing autoimmunity and fostering anticancer immunity by interacting with the B7 proteins CD80 and CD86. Small molecules impairing the CTLA-4/CD80 interaction have been developed; however, they directly target CD80, not CTLA-4. SUBJECTS/METHODS: In this study, we performed artificial intelligence (AI)-powered virtual screening of approximately ten million compounds to identify those targeting CTLA-4. We validated the hit molecules with biochemical, biophysical, immunological, and experimental animal assays. RESULTS: The primary hits obtained from the virtual screening were successfully validated in vitro and in vivo. We then optimized lead compounds and obtained inhibitors (inhibitory concentration, 1 µM) that disrupted the CTLA-4/CD80 interaction without degrading CTLA-4. CONCLUSIONS: Several compounds inhibited tumor development prophylactically and therapeutically in syngeneic and CTLA-4-humanized mice. Our findings support using AI-based frameworks to design small molecules targeting immune checkpoints for cancer therapy.
Affiliation(s)
- Navid Sobhani
  - Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77054, USA
- Dafei Chai
  - Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- Daniele Generali
  - Department of Medical, Surgery and Health Sciences, University of Trieste, 34147 Trieste, Italy
- Jian-Rong Li
  - Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- Jonathan Vazquez-Perez
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
- Jing Ming Lim
  - Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- Rachel Morris
  - Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77054, USA
- Zaniqua N. Bullock
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
- Aram Davtyan
  - Atomwise Inc., 717 Market St, Suite 800, San Francisco, CA 94103, USA
- Chao Cheng
  - Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
  - Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
- William K. Decker
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
  - Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX 77030, USA
- Yong Li
  - Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
  - Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
12
Kuder KJ. Docking Foundations: From Rigid to Flexible Docking. Methods Mol Biol 2024; 2780:3-14. [PMID: 38987460 DOI: 10.1007/978-1-0716-3985-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Despite the development of methods for the experimental determination of protein structures, the dissonance between the number of known sequences and their solved structures is still enormous. This is particularly evident in protein-protein complexes. To fill this gap, diverse technologies have been developed to study protein-protein interactions (PPIs) in a cellular context including a range of biological and computational methods. The latter derive from techniques originally published and applied almost half a century ago and are based on interdisciplinary knowledge from the nexus of the fields of biology, chemistry, and physics about protein sequences, structures, and their folding. Protein-protein docking, the main protagonist of this chapter, is routinely treated as an integral part of protein research. Herein, we describe the basic foundations of the whole process in general terms, but step by step from protein representations through docking methods and evaluation of complexes to their final validation.
Affiliation(s)
- Kamil J Kuder
  - Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland.
13
Abou Hajal A, Al Meslamani AZ. Insights into artificial intelligence utilisation in drug discovery. J Med Econ 2024; 27:304-308. [PMID: 38385328 DOI: 10.1080/13696998.2024.2315864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Accepted: 02/05/2024] [Indexed: 02/23/2024]
Affiliation(s)
- Abdallah Abou Hajal
  - College of Pharmacy, Al Ain University, Abu Dhabi, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi, United Arab Emirates
- Ahmad Z Al Meslamani
  - College of Pharmacy, Al Ain University, Abu Dhabi, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi, United Arab Emirates
14
Li A, Bouhss A, Clément MJ, Bauvais C, Taylor JP, Bollot G, Pastré D. Using the structural diversity of RNA:protein interfaces to selectively target RNA with small molecules in cells: methods and perspectives. Front Mol Biosci 2023; 10:1298441. [PMID: 38033386 PMCID: PMC10687564 DOI: 10.3389/fmolb.2023.1298441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 10/24/2023] [Indexed: 12/02/2023] [Open access]
Abstract
In recent years, RNA has gained traction both as a therapeutic molecule and as a therapeutic target in several human pathologies. In this review, we consider the approach of targeting RNA using small molecules for both research and therapeutic purposes. Given the primary challenge presented by the low structural diversity of RNA, we discuss the potential for targeting RNA:protein interactions to enhance the structural and sequence specificity of drug candidates. We review available tools and inherent challenges in this approach, ranging from adapted bioinformatics tools to in vitro and cellular high-throughput screening and functional analysis. We further consider two critical steps in targeting RNA:protein interactions: first, the integration of in silico and structural analyses to improve the efficacy of molecules by identifying scaffolds with high affinity, and second, increasing the likelihood of identifying on-target compounds in cells through a combination of high-throughput approaches and functional assays. We anticipate that the development of a new class of molecules targeting RNA:protein interactions to prevent physio-pathological mechanisms could significantly expand the arsenal of effective therapeutic compounds.
Affiliation(s)
- Aixiao Li
- Synsight, Genopole Entreprises, Evry, France
- Ahmed Bouhss
- Université Paris-Saclay, INSERM U1204, Université d’Évry, Structure-Activité des Biomolécules Normales et Pathologiques (SABNP), Evry, France
- Marie-Jeanne Clément
- Université Paris-Saclay, INSERM U1204, Université d’Évry, Structure-Activité des Biomolécules Normales et Pathologiques (SABNP), Evry, France
- J. Paul Taylor
- Department of Cell and Molecular Biology, St. Jude Children’s Research Hospital, Memphis, TN, United States
- David Pastré
- Université Paris-Saclay, INSERM U1204, Université d’Évry, Structure-Activité des Biomolécules Normales et Pathologiques (SABNP), Evry, France

15
Hou Y, Bai Y, Lu C, Wang Q, Wang Z, Gao J, Xu H. Applying molecular docking to pesticides. Pest Manag Sci 2023; 79:4140-4152. [PMID: 37547967 DOI: 10.1002/ps.7700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 07/17/2023] [Accepted: 08/05/2023] [Indexed: 08/08/2023]
Abstract
Pesticide creation is related to the development of sustainable agriculture and ecological safety, and molecular docking technology can effectively help in pesticide innovation. This paper introduces the basic theory behind molecular docking, pesticide databases, and docking software. It also summarizes the application of molecular docking in the pesticide field, including the virtual screening of lead compounds, detection of pesticides and their metabolites in the environment, reverse screening of pesticide targets, and the study of resistance mechanisms. Finally, problems with the use of molecular docking technology in pesticide creation are discussed, and prospects for its future use in new pesticide development are outlined. © 2023 Society of Chemical Industry.
Affiliation(s)
- Yang Hou
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin, China
- Yuqian Bai
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin, China
- Chang Lu
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin, China
- Qiuchan Wang
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin, China
- Zishi Wang
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin, China
- Jinsheng Gao
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin, China
- Hongliang Xu
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin, China

16
Tran-Nguyen VK, Junaid M, Simeon S, Ballester PJ. A practical guide to machine-learning scoring for structure-based virtual screening. Nat Protoc 2023; 18:3460-3511. [PMID: 37845361 DOI: 10.1038/s41596-023-00885-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/08/2022] [Accepted: 07/03/2023] [Indexed: 10/18/2023]
Abstract
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
Affiliation(s)
- Muhammad Junaid
- Centre de Recherche en Cancérologie de Marseille, Marseille, France
- Saw Simeon
- Centre de Recherche en Cancérologie de Marseille, Marseille, France

17
Abdel-Rehim A, Orhobor O, Hang L, Ni H, King RD. Protein-ligand binding affinity prediction exploiting sequence constituent homology. Bioinformatics 2023; 39:btad502. [PMID: 37572302 PMCID: PMC10463547 DOI: 10.1093/bioinformatics/btad502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 07/10/2023] [Accepted: 08/11/2023] [Indexed: 08/14/2023] Open
Abstract
MOTIVATION: Molecular docking is a commonly used approach for estimating binding conformations and their resultant binding affinities. Machine learning has been successfully deployed to enhance such affinity estimations. Many methods of varying complexity have been developed making use of some or all the spatial and categorical information available in these structures. The evaluation of such methods has mainly been carried out using datasets from PDBbind, particularly the Comparative Assessment of Scoring Functions (CASF) 2007, 2013, and 2016 datasets with dedicated test sets. This work demonstrates that only a small number of simple descriptors are necessary to efficiently estimate binding affinity for these complexes without the need to know the exact binding conformation of a ligand. RESULTS: The developed approach of using a small number of ligand and protein descriptors in conjunction with gradient boosting trees demonstrates high performance on the CASF datasets. This includes the commonly used benchmark CASF2016, where it appears to perform better than any other approach. This methodology is also useful for datasets where the spatial relationship between the ligand and protein is unknown, as demonstrated using a large ChEMBL-derived dataset. AVAILABILITY AND IMPLEMENTATION: Code and data have been uploaded to https://github.com/abbiAR/PLBAffinity.
Affiliation(s)
- Abbi Abdel-Rehim
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- Lou Hang
- Department of Mathematics, University College London, London WC1H 0AY, United Kingdom
- Hao Ni
- Department of Mathematics, University College London, London WC1H 0AY, United Kingdom
- The Alan Turing Institute, London NW1 2DB, United Kingdom
- Ross D King
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- The Alan Turing Institute, London NW1 2DB, United Kingdom
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden

18
Boswell Z, Verga JU, Mackle J, Guerrero-Vazquez K, Thomas OP, Cray J, Wolf BJ, Choo YM, Croot P, Hamann MT, Hardiman G. In-Silico Approaches for the Screening and Discovery of Broad-Spectrum Marine Natural Product Antiviral Agents Against Coronaviruses. Infect Drug Resist 2023; 16:2321-2338. [PMID: 37155475 PMCID: PMC10122865 DOI: 10.2147/idr.s395203] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/18/2022] [Accepted: 03/16/2023] [Indexed: 05/10/2023] Open
Abstract
The urgent need for SARS-CoV-2 controls has led to a reassessment of approaches to identify and develop natural product inhibitors of zoonotic, highly virulent, and rapidly emerging viruses. There are as yet no clinically approved broad-spectrum antivirals available for betacoronaviruses. Discovery pipelines for pan-virus medications against a broad range of betacoronaviruses are therefore a priority. A variety of marine natural product (MNP) small molecules have shown inhibitory activity against viral species. Access to large data caches of small molecule structural information is vital to finding new pharmaceuticals. Increasingly, molecular docking simulations are being used to narrow the space of possibilities and generate drug leads. Combining in-silico methods, augmented by metaheuristic optimization and machine learning (ML), allows the generation of hits from within a virtual MNP library to narrow screens for novel targets against coronaviruses. In this review article, we explore current insights and techniques that can be leveraged to generate broad-spectrum antivirals against betacoronaviruses using in-silico optimization and ML. ML approaches are capable of simultaneously evaluating different features for predicting inhibitory activity. Many also provide a semi-quantitative measure of feature relevance and can guide the selection of a subset of features relevant for inhibition of SARS-CoV-2.
Affiliation(s)
- Zachary Boswell
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Jacopo Umberto Verga
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Genomic Data Science, University of Galway, Galway, Ireland
- James Mackle
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Olivier P Thomas
- School of Biological and Chemical Sciences, Ryan Institute, University of Galway, Galway, H91 TK33, Ireland
- James Cray
- Department of Biomedical Education and Anatomy, College of Medicine and Division of Biosciences, College of Dentistry, Ohio State University, Columbus, OH, USA
- Bethany J Wolf
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Yeun-Mun Choo
- Department of Chemistry, University of Malaya, Kuala Lumpur, Malaysia
- Peter Croot
- Irish Centre for Research in Applied Geoscience, Earth and Ocean Sciences and Ryan Institute, School of Natural Sciences, University of Galway, Galway, Ireland
- Mark T Hamann
- Departments of Drug Discovery and Biomedical Sciences and Public Health, Colleges of Pharmacy and Medicine, Medical University of South Carolina, Charleston, SC, USA
- Gary Hardiman
- School of Biological Sciences and Institute for Global Security, Queen's University, Belfast, Northern Ireland, UK
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Department of Medicine, Medical University of South Carolina, Charleston, SC, USA

19
Wang Y, Li Y, Chen X, Zhao L. HIV-1/HBV Coinfection Accurate Multitarget Prediction Using a Graph Neural Network-Based Ensemble Predicting Model. Int J Mol Sci 2023; 24:7139. [PMID: 37108305 PMCID: PMC10139236 DOI: 10.3390/ijms24087139] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/27/2023] [Revised: 04/07/2023] [Accepted: 04/11/2023] [Indexed: 04/29/2023] Open
Abstract
HIV and HBV infection are both serious public health challenges. There are more than approximately 4 million patients coinfected with HIV and HBV worldwide, and approximately 5% to 15% of those infected with HIV are coinfected with HBV. Disease progression is more rapid in patients with coinfection, which significantly increases the likelihood of patients progressing from chronic hepatitis to cirrhosis, end-stage liver disease, and hepatocellular carcinoma. HIV treatment is complicated by drug interactions, antiretroviral (ARV) hepatotoxicity, and HBV-related immune reconditioning and inflammatory syndromes. Drug development is a highly costly and time-consuming procedure with traditional experimental methods. With the development of computer-aided drug design techniques, both machine learning and deep learning have been successfully used to facilitate rapid innovations in the virtual screening of candidate drugs. In this study, we proposed a graph neural network-based molecular feature extraction model by integrating one optimal supervised learner to replace the output layer of the GNN to accurately predict the potential multitargets of HIV-1/HBV coinfections. The experimental results strongly suggested that DMPNN + GBDT may greatly improve the accuracy of binary-target predictions and efficiently identify the potential multiple targets of HIV-1 and HBV simultaneously.
Affiliation(s)
- Yishu Wang
- School of Mathematics and Statistics, University of Science and Technology Beijing, Beijing 100083, China
- Yue Li
- School of Mathematics and Statistics, University of Science and Technology Beijing, Beijing 100083, China
- Xiaomin Chen
- School of Mathematics and Statistics, University of Science and Technology Beijing, Beijing 100083, China
- Lutao Zhao
- School of Mathematics and Statistics, University of Science and Technology Beijing, Beijing 100083, China
- Center for Energy and Environmental Policy Research, Beijing Institute of Technology, Beijing 100081, China

20
Cavasotto CN, Di Filippo JI. The Impact of Supervised Learning Methods in Ultralarge High-Throughput Docking. J Chem Inf Model 2023; 63:2267-2280. [PMID: 37036491 DOI: 10.1021/acs.jcim.2c01471] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 04/11/2023]
Abstract
Structure-based virtual screening methods are, nowadays, one of the key pillars of computational drug discovery. In recent years, a series of studies have reported docking-based virtual screening campaigns of large databases ranging from hundreds to thousands of millions of compounds, further identifying novel hits after experimental validation. As these large-scale efforts are not generally accessible, machine learning-based protocols have emerged to accelerate the identification of virtual hits within an ultralarge chemical space, reaching impressive reductions in computational time. Herein, we illustrate the motivation and the problem behind the screening of large databases, providing an overview of key concepts and essential applications of machine learning-accelerated protocols, specifically concerning supervised learning methods. We also discuss where the field stands with these novel developments, highlighting possible insights for future studies.
Affiliation(s)
- Claudio N Cavasotto
- Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Austral Institute for Applied Artificial Intelligence, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Juan I Di Filippo
- Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Austral Institute for Applied Artificial Intelligence, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina

21
Tran-Nguyen VK, Ballester PJ. Beware of Simple Methods for Structure-Based Virtual Screening: The Critical Importance of Broader Comparisons. J Chem Inf Model 2023; 63:1401-1405. [PMID: 36848585 PMCID: PMC10015451 DOI: 10.1021/acs.jcim.3c00218] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 03/01/2023]
Abstract
We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.
Affiliation(s)
- Pedro J Ballester
- Department of Bioengineering, Imperial College London, London SW7 2AZ, U.K.

22
Kwon Y, Park S, Lee J, Kang J, Lee HJ, Kim W. BEAR: A Novel Virtual Screening Method Based on Large-Scale Bioactivity Data. J Chem Inf Model 2023; 63:1429-1437. [PMID: 36821004 DOI: 10.1021/acs.jcim.2c01300] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/24/2023]
Abstract
Data-driven drug discovery exploits a comprehensive set of big data to provide an efficient path for the development of new drugs. Currently, publicly available bioassay data sets provide extensive information regarding the bioactivity profiles of millions of compounds. Using these large-scale drug screening data sets, we developed a novel in silico method to virtually screen hit compounds against protein targets, named BEAR (Bioactive compound Enrichment by Assay Repositioning). The underlying idea of BEAR is to reuse bioassay data for predicting hit compounds for targets other than their originally intended purposes, i.e., "assay repositioning". The BEAR approach differs from conventional virtual screening methods in that (1) it relies solely on bioactivity data and requires no physicochemical features of either the target or ligand; (2) accordingly, structurally diverse candidates are predicted, allowing for scaffold hopping; and (3) it shows stable performance across diverse target classes, suggesting its general applicability. Large-scale cross-validation of more than a thousand targets showed that BEAR accurately predicted known ligands (median area under the curve = 0.87), proving that BEAR maintained a robust performance even in the validation set with additional constraints. In addition, a comparative analysis demonstrated that BEAR outperformed other machine learning models, including a recent deep learning model for ABC transporter family targets. We predicted P-gp and BCRP dual inhibitors using the BEAR approach and validated the predicted candidates using in vitro assays. The intracellular accumulation effects of mitoxantrone, a well-known P-gp/BCRP dual substrate for cancer treatment, confirmed nine out of 72 dual inhibitor candidates preselected by primary cytotoxicity screening. Consequently, these nine hits are novel and potent dual inhibitors for both P-gp and BCRP, solely predicted by bioactivity profiles without relying on any structural information of targets or ligands.
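The median area under the curve quoted in this abstract is a ranking metric. As a rough illustration only, and not the BEAR authors' code, the ROC AUC can be computed from predicted scores and binary known-ligand labels as the fraction of active/inactive pairs ranked correctly. The function name `roc_auc` and the toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random active outranks a random inactive.

    A tie between an active and an inactive counts as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for d in inactives:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical toy data: a higher score means "more likely a ligand".
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))
```

The pairwise loop is quadratic in the number of compounds; it is kept here for clarity, and a rank-based formulation would be preferable for large screening sets.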
Affiliation(s)
- Sera Park
- KaiPharm, Seoul 03760, Republic of Korea
- Jaeok Lee
- College of Pharmacy, Research Institute of Pharmaceutical Science, Ewha Womans University, Seoul 03760, Republic of Korea
- Jiyeon Kang
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Republic of Korea
- Hwa Jeong Lee
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Republic of Korea
- Wankyu Kim
- KaiPharm, Seoul 03760, Republic of Korea
- Department of Life Sciences, College of Natural Science, Ewha Womans University, Seoul 03760, Republic of Korea

23
Potlitz F, Link A, Schulig L. Advances in the discovery of new chemotypes through ultra-large library docking. Expert Opin Drug Discov 2023; 18:303-313. [PMID: 36714919 DOI: 10.1080/17460441.2023.2171984] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 01/31/2023]
Abstract
INTRODUCTION: The size and complexity of virtual screening libraries in drug discovery have skyrocketed in recent years, reaching up to multiple billions of accessible compounds. However, virtual screening of such ultra-large libraries poses several challenges associated with preparing the libraries, sampling, and pre-selection of suitable compounds. The utilization of artificial intelligence (AI)-assisted screening approaches, such as deep learning, poses a promising countermeasure to deal with this rapidly expanding chemical space. For example, various AI-driven methods were recently successfully used to identify novel small molecule inhibitors of the SARS-CoV-2 main protease (Mpro). AREAS COVERED: This review focuses on presenting various kinds of virtual screening methods suitable for dealing with ultra-large libraries. Challenges associated with these computational methodologies are discussed, and recent advances are highlighted using the example of the discovery of novel Mpro inhibitors targeting the SARS-CoV-2 virus. EXPERT OPINION: With the rapid expansion of the virtual chemical space, the methodologies for docking and screening such quantities of molecules need to keep pace. Employment of AI-driven screening compounds has already been shown to be effective in a range from a few thousand to multiple billion compounds, furthered by de novo generation of drug-like molecules without human interference.
Affiliation(s)
- Felix Potlitz
- Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmacy, University of Greifswald, Germany
- Andreas Link
- Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmacy, University of Greifswald, Germany
- Lukas Schulig
- Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmacy, University of Greifswald, Germany

24
Machine Learning Scoring Functions for Drug Discovery from Experimental and Computer-Generated Protein-Ligand Structures: Towards Per-Target Scoring Functions. Molecules 2023; 28:1661. [PMID: 36838647 PMCID: PMC9966217 DOI: 10.3390/molecules28041661] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/07/2022] [Revised: 02/05/2023] [Accepted: 02/06/2023] [Indexed: 02/12/2023] Open
Abstract
In recent years, machine learning has been proposed as a promising strategy to build accurate scoring functions for computational docking aimed at numerically empowered drug discovery. However, the latest studies have suggested that over-optimistic results had been reported due to the correlations present in the experimental databases used for training and testing. Here, we investigate the performance of an artificial neural network in binding affinity predictions, comparing results obtained using both experimental protein-ligand structures and larger sets of computer-generated structures created using commercial software. Interestingly, similar performances are obtained on both databases. We find a noticeable performance suppression when moving from random horizontal tests to vertical tests performed on target proteins not included in the training data. The possibility to train the network on relatively easily created computer-generated databases leads us to explore per-target scoring functions, trained and tested ad hoc on complexes including only one target protein. Encouraging results are obtained, depending on the type of protein being addressed.
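The distinction this abstract draws between horizontal and vertical tests can be made concrete with a small sketch. It assumes each complex is a record carrying a target identifier; the function names, record layout, and values are hypothetical and are not taken from the paper.

```python
import random


def horizontal_split(complexes, test_fraction=0.25, seed=0):
    """Random split: the same target may appear in both training and test sets."""
    shuffled = list(complexes)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def vertical_split(complexes, held_out_targets):
    """Target-wise split: every complex of a held-out target goes to the test set."""
    train = [c for c in complexes if c["target"] not in held_out_targets]
    test = [c for c in complexes if c["target"] in held_out_targets]
    return train, test


# Hypothetical records; target names and affinities are made up for illustration.
complexes = [
    {"target": "kinase_A", "ligand": "L1", "pK": 6.1},
    {"target": "kinase_A", "ligand": "L2", "pK": 7.4},
    {"target": "protease_B", "ligand": "L3", "pK": 5.2},
    {"target": "protease_B", "ligand": "L4", "pK": 8.0},
    {"target": "receptor_C", "ligand": "L5", "pK": 6.7},
    {"target": "receptor_C", "ligand": "L6", "pK": 4.9},
    {"target": "receptor_C", "ligand": "L7", "pK": 7.1},
    {"target": "kinase_A", "ligand": "L8", "pK": 5.8},
]

h_train, h_test = horizontal_split(complexes)
v_train, v_test = vertical_split(complexes, {"protease_B"})
print(len(h_train), len(h_test), len(v_train), len(v_test))
```

In the vertical split no test target is seen during training, which is why it is the harder and, for prospective screening, the more realistic evaluation.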

25
Recent advances in predicting lncRNA-disease associations based on computational methods. Drug Discov Today 2023; 28:103432. [PMID: 36370992 DOI: 10.1016/j.drudis.2022.103432] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/30/2022] [Revised: 10/19/2022] [Accepted: 11/03/2022] [Indexed: 11/11/2022]
Abstract
Mutations in and dysregulation of long non-coding RNAs (lncRNAs) are closely associated with the development of various human complex diseases, but only a few lncRNAs have been experimentally confirmed to be associated with human diseases. Predicting new potential lncRNA-disease associations (LDAs) will help us to understand the pathogenesis of human diseases and to detect disease markers, as well as in disease diagnosis, prevention and treatment. Computational methods can effectively narrow down the screening scope of biological experiments, thereby reducing the duration and cost of such experiments. In this review, we outline recent advances in computational methods for predicting LDAs, focusing on LDA databases, lncRNA/disease similarity calculations, and advanced computational models. In addition, we analyze the limitations of various computational models and discuss future challenges and directions for development.

26
Wang Z, Zheng L, Wang S, Lin M, Wang Z, Kong AWK, Mu Y, Wei Y, Li W. A fully differentiable ligand pose optimization framework guided by deep learning and a traditional scoring function. Brief Bioinform 2023; 24:6887112. [PMID: 36502369 DOI: 10.1093/bib/bbac520] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 08/09/2022] [Revised: 10/17/2022] [Accepted: 10/31/2022] [Indexed: 12/14/2022] Open
Abstract
The recently reported machine learning- or deep learning-based scoring functions (SFs) have shown exciting performance in predicting protein-ligand binding affinities with fruitful application prospects. However, differentiating between highly similar ligand conformations, including the native binding pose (the global energy minimum state), remains challenging, even though doing so could greatly enhance docking. In this work, we propose a fully differentiable, end-to-end framework for ligand pose optimization based on a hybrid SF called DeepRMSD+Vina, which combines a multi-layer perceptron (DeepRMSD) with the traditional AutoDock Vina SF. DeepRMSD+Vina, which combines (1) the root mean square deviation (RMSD) of the docking pose with respect to the native pose and (2) the AutoDock Vina score, is fully differentiable and is thus capable of optimizing the ligand binding pose toward the lowest-energy conformation. Evaluated on the CASF-2016 docking power dataset, DeepRMSD+Vina reaches a success rate of 94.4%, which outperforms most reported SFs to date. We evaluated the ligand conformation optimization framework in practical molecular docking scenarios (redocking and cross-docking tasks), revealing the high potential of this framework in drug design and discovery. Structural analysis shows that this framework has the ability to identify key physical interactions in protein-ligand binding, such as hydrogen bonding. Our work provides a paradigm for optimizing ligand conformations based on deep learning algorithms. The DeepRMSD+Vina model and the optimization framework are available in the GitHub repository https://github.com/zchwang/DeepRMSD-Vina_Optimization.
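The RMSD of a docking pose with respect to the native pose, which this abstract names as one component of the hybrid score, can be sketched as follows. This is a minimal illustration, assuming both poses list the same heavy atoms in the same order and already share a coordinate frame; it performs no superposition or symmetry correction. The name `pose_rmsd` and the coordinates are hypothetical and are not taken from the DeepRMSD+Vina repository.

```python
import math


def pose_rmsd(pose, native):
    """Root mean square deviation between two poses given as lists of (x, y, z)."""
    if not pose or len(pose) != len(native):
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = 0.0
    for (px, py, pz), (nx, ny, nz) in zip(pose, native):
        squared += (px - nx) ** 2 + (py - ny) ** 2 + (pz - nz) ** 2
    return math.sqrt(squared / len(pose))


# Hypothetical three-atom ligand; the docked pose is the native pose
# shifted rigidly by 1 angstrom along x.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x + 1.0, y, z) for x, y, z in native]
print(pose_rmsd(docked, native))
```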
Affiliation(s)
- Zechen Wang
- School of Physics, Shandong University, Jinan, Shandong 250100, China
- Liangzhen Zheng
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China
- Sheng Wang
- Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China
- Mingzhi Lin
- Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China
- Zhihao Wang
- School of Physics, Shandong University, Jinan, Shandong 250100, China
- Adams Wai-Kin Kong
- Rolls-Royce Corporate Lab, Nanyang Technological University, Singapore 637551, Singapore
- Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
- Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Weifeng Li
- School of Physics, Shandong University, Jinan, Shandong 250100, China

27
Assessing How Residual Errors of Scoring Functions Correlate to Ligand Structural Features. Int J Mol Sci 2022; 23:15018. [PMID: 36499344 PMCID: PMC9739603 DOI: 10.3390/ijms232315018] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/16/2022] [Revised: 11/08/2022] [Accepted: 11/10/2022] [Indexed: 12/02/2022] Open
Abstract
Scoring functions (SFs) are ubiquitous tools for early stage drug discovery. However, their accuracy currently remains quite moderate. Despite a number of successful target-specific SFs appearing recently, up until now, no ideas on how to systematically improve the general scope of SFs have been formulated. In this work, we hypothesized that the specific features of ligands, corresponding to interactions well appreciated by medicinal chemists (e.g., hydrogen bonds, hydrophobic and aromatic interactions), might be responsible, in part, for the remaining SF errors. The latter provides direction to efforts aimed at the rational and systematic improvement of SF accuracy. In this proof-of-concept work, we took a CASF-2016 coreset of 285 ligands as a basis for comparison and calculated the values of scores for a representative panel of SFs (including AutoDock 4.2, AutoDock Vina, X-Score, NNScore2.0, ΔVina RF20, and DSX). The residual error of linear correlation of each SF value, with the experimental values of affinity and activity, was then analyzed in terms of its correlation with the presence of the fragments responsible for certain medicinal chemistry defined interactions. We showed that, despite the fact that SFs generally perform reasonably, there is room for improvement in terms of better parameterization of interactions involving certain fragments in ligands. Thus, this approach opens a potential way for the systematic improvement of SFs without their significant complication. However, the straightforward application of the proposed approach is limited by the scarcity of reliable available data for ligand-receptor complexes, which is a common problem in the field.
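The analysis outlined in this abstract (fit a line between SF scores and experimental values, then check whether the leftover error tracks the presence of a ligand fragment) can be sketched with plain least squares. This is only an illustrative reading of the described procedure; the helper names and all numbers are hypothetical, and the toy data are constructed so that ligands carrying the fragment are systematically underpredicted.

```python
import math


def linear_residuals(x, y):
    """Residuals of the ordinary least-squares line y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical data: SF scores, a binary fragment flag, and "experimental"
# affinities that are one unit higher whenever the fragment is present.
sf_scores = [5.0, 6.0, 7.0, 8.0, 5.5, 6.5, 7.5, 8.5]
has_fragment = [0, 0, 0, 0, 1, 1, 1, 1]
experimental = [s + 1.0 * f for s, f in zip(sf_scores, has_fragment)]

residuals = linear_residuals(sf_scores, experimental)
print(pearson(residuals, has_fragment))
```

A strongly positive correlation here flags the fragment as a candidate for better parameterization; with real data the flag would come from substructure matching, which is outside this sketch.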
28
Boyles F, Deane CM, Morris GM. Learning from Docked Ligands: Ligand-Based Features Rescue Structure-Based Scoring Functions When Trained on Docked Poses. J Chem Inf Model 2022; 62:5329-5341. [PMID: 34469150 DOI: 10.1021/acs.jcim.1c00096] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes. We explore how the use of docked rather than crystallographic poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. We also present a new, freely available validation set, the Updated DUD-E Diverse Subset, for binding affinity prediction using data from DUD-E and ChEMBL. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function sometimes generalizes poorly to a protein target not represented in the training set, demonstrating the need for improved scoring functions and additional validation benchmarks.
Affiliation(s)
- Fergus Boyles
  - Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
- Charlotte M Deane
  - Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
- Garrett M Morris
  - Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom
29
Qu X, Dong L, Zhang J, Si Y, Wang B. Systematic Improvement of the Performance of Machine Learning Scoring Functions by Incorporating Features of Protein-Bound Water Molecules. J Chem Inf Model 2022; 62:4369-4379. [PMID: 36083808 DOI: 10.1021/acs.jcim.2c00916] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 11/30/2022]
Abstract
Water molecules at the ligand-protein interfaces play crucial roles in the binding of the ligands, but the behavior of protein-bound water is largely ignored in many currently used machine learning (ML)-based scoring functions (SFs). In an attempt to improve the prediction performance of existing ML-based SFs, we estimated the water distribution with a HydraMap (HM) method and then incorporated the features extracted from protein-bound waters obtained in this way into three ML-based SFs: RF-Score, ECIF, and PLEC. It was found that a combination of HM-based features can consistently improve the performance of all three SFs, including their scoring, ranking, and docking power. HydraMap-based features show consistently good performance with both crystal structures and docked structures, demonstrating their robustness for SFs. Overall, HM-based features, which are a statistical representation of hydration sites at protein-ligand interfaces, are expected to improve the prediction performance for diverse SFs.
Affiliation(s)
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Jinyan Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Yubing Si
  - College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
30
Shen C, Zhang X, Deng Y, Gao J, Wang D, Xu L, Pan P, Hou T, Kang Y. Boosting Protein-Ligand Binding Pose Prediction and Virtual Screening Based on Residue-Atom Distance Likelihood Potential and Graph Transformer. J Med Chem 2022; 65:10691-10706. [PMID: 35917397 DOI: 10.1021/acs.jmedchem.2c00991] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Indexed: 02/08/2023]
Abstract
The past few years have witnessed enormous progress toward applying machine learning approaches to the development of protein-ligand scoring functions. However, the robust performance and wide applicability of scoring functions remain a big challenge for increasing the success rate of docking-based virtual screening. Herein, a novel scoring function named RTMScore was developed by introducing a tailored residue-based graph representation strategy and several graph transformer layers for the learning of protein and ligand representations, followed by a mixture density network to obtain residue-atom distance likelihood potential. Our approach was resolutely validated on the CASF-2016 benchmark, and the results indicate that RTMScore can outperform almost all of the other state-of-the-art methods in terms of both the docking and screening powers. Further evaluation confirms the robustness of our approach that can not only retain its docking power on cross-docked poses but also achieve improved performance as a rescoring tool in larger-scale virtual screening.
Affiliation(s)
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Junbo Gao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Dong Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
31
McGibbon M, Money-Kyrle S, Blay V, Houston DR. SCORCH: Improving structure-based virtual screening with machine learning classifiers, data augmentation, and uncertainty estimation. J Adv Res 2022; 46:135-147. [PMID: 35901959 PMCID: PMC10105235 DOI: 10.1016/j.jare.2022.07.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/05/2022] [Revised: 07/08/2022] [Accepted: 07/09/2022] [Indexed: 11/17/2022] Open
Abstract
INTRODUCTION: The discovery of a new drug is a costly and lengthy endeavour. The computational prediction of which small molecules can bind to a protein target can accelerate this process if the predictions are fast and accurate enough. Recent machine-learning scoring functions re-evaluate the output of molecular docking to achieve more accurate predictions. However, previous scoring functions were trained on crystalised protein-ligand complexes and datasets of decoys. The limited availability of crystal structures and biases in the decoy datasets can lower the performance of scoring functions.
OBJECTIVES: To address key limitations of previous scoring functions and thus improve the predictive performance of structure-based virtual screening.
METHODS: A novel machine-learning scoring function was created, named SCORCH (Scoring COnsensus for RMSD-based Classification of Hits). To develop SCORCH, training data is augmented by considering multiple ligand poses and labelling poses based on their RMSD from the native pose. Decoy bias is addressed by generating property-matched decoys for each ligand and using the same methodology for preparing and docking decoys and ligands. A consensus of 3 different machine learning approaches is also used to improve performance.
RESULTS: We find that multi-pose augmentation in SCORCH improves its docking power and screening power on independent benchmark datasets. SCORCH outperforms an equivalent scoring function trained on single poses, with a 1% enrichment factor (EF) of 13.78 vs. 10.86 on 18 DEKOIS 2.0 targets and a mean native pose rank of 5.9 vs. 30.4 on CSAR 2014. Additionally, SCORCH outperforms widely used scoring functions in virtual screening and pose prediction on independent benchmark datasets.
CONCLUSION: By rationally addressing key limitations of previous scoring functions, SCORCH improves the performance of virtual screening. SCORCH also provides an estimate of its uncertainty, which can help reduce the cost and time required for drug discovery.
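The 1% enrichment factor quoted in the results can be made concrete with a short Python sketch. This is the commonly used definition of EF (hit rate in the top-ranked fraction divided by the hit rate in the whole library), not code from SCORCH; the function name `enrichment_factor`, the rounding of the top-fraction size, and the toy library are assumptions for illustration.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """EF = (actives in top fraction / size of top fraction) / (all actives / library size).

    Assumes higher scores mean 'more likely active'.
    """
    n = len(scores)
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("library contains no actives")
    n_top = max(1, int(round(n * fraction)))  # assumption: round, at least one compound
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(active for _, active in ranked[:n_top])
    return (hits_in_top / n_top) / (total_actives / n)

# Toy library (hypothetical): 200 compounds, 10 actives, scores strictly decreasing
# with index, and the two best-scored compounds are both active.
active_indices = {0, 1, 20, 40, 60, 80, 100, 120, 140, 160}
scores = [200.0 - i for i in range(200)]
is_active = [1 if i in active_indices else 0 for i in range(200)]

ef_1pct = enrichment_factor(scores, is_active, fraction=0.01)
print(f"EF at 1%: {ef_1pct:.1f}")
```

In this toy case the top 1% holds 2 compounds, both active, against a background active rate of 5%, so the ranking is 20 times better than random selection.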
Affiliation(s)
- Miles McGibbon
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
- Sam Money-Kyrle
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
- Vincent Blay
  - Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA
  - Institute for Integrative Systems Biology (I²SysBio), Universitat de València and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Douglas R Houston
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
32
Yang C, Chen EA, Zhang Y. Protein-Ligand Docking in the Machine-Learning Era. Molecules 2022; 27:4568. [PMID: 35889440 PMCID: PMC9323102 DOI: 10.3390/molecules27144568] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 07/03/2022] [Accepted: 07/14/2022] [Indexed: 11/16/2022] Open
Abstract
Molecular docking plays a significant role in early-stage drug discovery, from structure-based virtual screening (VS) to hit-to-lead optimization, and its capability and predictive power are critically dependent on the protein-ligand scoring function. In this review, we give a broad overview of recent scoring function development, as well as the docking-based applications in drug discovery. We outline the strategies and resources available for structure-based VS and discuss the assessment and development of classical and machine learning protein-ligand scoring functions. In particular, we highlight the recent progress of machine learning scoring functions, ranging from descriptor-based models to deep learning approaches. We also discuss the general workflow and docking protocols of structure-based VS, such as structure preparation, binding site detection, docking strategies, and post-docking filter/re-scoring, as well as a case study on the large-scale docking-based VS test on the LIT-PCBA data set.
Affiliation(s)
- Chao Yang
  - Department of Chemistry, New York University, New York, NY 10003, USA
- Eric Anthony Chen
  - Department of Chemistry, New York University, New York, NY 10003, USA
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, NY 10003, USA
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
33
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
34
Tran-Nguyen VK, Simeon S, Junaid M, Ballester PJ. Structure-based virtual screening for PDL1 dimerizers: Evaluating generic scoring functions. Curr Res Struct Biol 2022; 4:206-210. [PMID: 35769111 PMCID: PMC9234010 DOI: 10.1016/j.crstbi.2022.06.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/14/2022] [Revised: 05/14/2022] [Accepted: 06/02/2022] [Indexed: 10/31/2022] Open
Abstract
The interaction between PD1 and its ligand PDL1 has been shown to render tumor cells resistant to apoptosis and promote tumor progression. An innovative mechanism to inhibit the PD1/PDL1 interaction is PDL1 dimerization induced by small-molecule PDL1 binders. Structure-based virtual screening is a promising approach to discovering such small-molecule PD1/PDL1 inhibitors. Here we investigate which type of generic scoring functions is most suitable to tackle this problem. We consider CNN-Score, an ensemble of convolutional neural networks, as the representative of machine-learning scoring functions. We also evaluate Smina, a commonly used classical scoring function, and IFP, a top structural fingerprint similarity scoring function. These three types of scoring functions were evaluated on two test sets sharing the same set of small-molecule PD1/PDL1 inhibitors, but using different types of inactives: either true inactives (molecules with no in vitro PD1/PDL1 inhibition activity) or assumed inactives (property-matched decoy molecules generated from each active). On both test sets, CNN-Score performed much better than Smina, which in turn strongly outperformed IFP. The fact that the latter was the case, despite precluding any possibility of exploiting decoy bias, demonstrates the predictive value of CNN-Score for PDL1. These results suggest that re-scoring Smina-docked molecules with CNN-Score is a promising structure-based virtual screening method to discover new small-molecule inhibitors of this therapeutic target.
Affiliation(s)
- Viet-Khoa Tran-Nguyen
  - Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
  - CNRS, UMR7258, Marseille, F-13009, France
  - Institut Paoli-Calmettes, Marseille, F-13009, France
  - Aix-Marseille University, UM 105, F-13284, Marseille, France
- Saw Simeon
  - Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
  - CNRS, UMR7258, Marseille, F-13009, France
  - Institut Paoli-Calmettes, Marseille, F-13009, France
  - Aix-Marseille University, UM 105, F-13284, Marseille, France
- Muhammad Junaid
  - Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
  - CNRS, UMR7258, Marseille, F-13009, France
  - Institut Paoli-Calmettes, Marseille, F-13009, France
  - Aix-Marseille University, UM 105, F-13284, Marseille, France
- Pedro J. Ballester
  - Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
  - CNRS, UMR7258, Marseille, F-13009, France
  - Institut Paoli-Calmettes, Marseille, F-13009, France
  - Aix-Marseille University, UM 105, F-13284, Marseille, France
35
Fujimoto KJ, Minami S, Yanai T. Machine-Learning- and Knowledge-Based Scoring Functions Incorporating Ligand and Protein Fingerprints. ACS Omega 2022; 7:19030-19039. [PMID: 35694525 PMCID: PMC9178954 DOI: 10.1021/acsomega.2c02822] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/06/2022] [Accepted: 05/12/2022] [Indexed: 06/15/2023]
Abstract
We propose a novel machine-learning-based scoring function for drug discovery that incorporates ligand and protein structural information into a knowledge-based PMF score. Molecular docking, a simulation method for structure-based drug design (SBDD), is expected to reduce the enormous costs associated with conventional experimental methods in terms of rational drug discovery. Molecular docking has two main purposes: to predict ligand-binding structures for target proteins and to predict protein-ligand binding affinity. Currently available programs of molecular docking offer an accurate prediction of ligand binding structures for many systems. However, the accurate prediction of binding affinity remains challenging. In this study, we developed a new scoring function that incorporates fingerprints representing ligand and protein structures as descriptors in the PMF score. Here, regression analysis of the scoring function was performed using the following machine learning techniques: least absolute shrinkage and selection operator (LASSO) and light gradient boosting machine (LightGBM). The results on a test data set showed that the binding affinity delivered by the newly developed scoring function has a Pearson correlation coefficient of 0.79 with the experimental value, which surpasses that of the conventional scoring functions. Further analysis provided a chemical understanding of the descriptors that contributed significantly to the improvement in prediction accuracy. Our approach and findings are useful for rational drug discovery.
Affiliation(s)
- Kazuhiro J. Fujimoto
  - Institute of Transformative Bio-Molecules (WPI-ITbM), Nagoya University, Furocho, Chikusa, Nagoya 464-8601, Japan
  - Department of Chemistry, Graduate School of Science, Nagoya University, Furocho, Chikusa, Nagoya 464-8601, Japan
- Shota Minami
  - Department of Chemistry, Graduate School of Science, Nagoya University, Furocho, Chikusa, Nagoya 464-8601, Japan
- Takeshi Yanai
  - Institute of Transformative Bio-Molecules (WPI-ITbM), Nagoya University, Furocho, Chikusa, Nagoya 464-8601, Japan
  - Department of Chemistry, Graduate School of Science, Nagoya University, Furocho, Chikusa, Nagoya 464-8601, Japan
36
Zhang X, Shen C, Liao B, Jiang D, Wang J, Wu Z, Du H, Wang T, Huo W, Xu L, Cao D, Hsieh CY, Hou T. TocoDecoy: A New Approach to Design Unbiased Datasets for Training and Benchmarking Machine-Learning Scoring Functions. J Med Chem 2022; 65:7918-7932. [PMID: 35642777 DOI: 10.1021/acs.jmedchem.2c00460] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/28/2022]
Abstract
Development of accurate machine-learning-based scoring functions (MLSFs) for structure-based virtual screening against a given target requires a large unbiased dataset with structurally diverse actives and decoys. However, most datasets for the development of MLSFs were designed for traditional SFs and may suffer from hidden biases and data insufficiency. Hereby, we developed a new approach named Topology-based and Conformation-based decoys generation (TocoDecoy), which integrates two strategies to generate decoys by tweaking the actives for a specific target, to generate unbiased and expandable datasets for training and benchmarking MLSFs. For hidden bias evaluation, the performance of InteractionGraphNet (IGN) trained on the TocoDecoy, LIT-PCBA, and DUD-E-like datasets was assessed. The results illustrate that the IGN model trained on the TocoDecoy dataset is competitive with that trained on the LIT-PCBA dataset but remarkably outperforms that trained on the DUD-E dataset, suggesting that the decoys in TocoDecoy are unbiased for training and benchmarking MLSFs.
Affiliation(s)
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Ben Liao
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hongyan Du
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tianyue Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Wenbo Huo
  - Tsinghua AI Drug Discovery group, Research Institute of Tsinghua University in Shenzhen, Shenzhen 518057, Guangdong, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, China
- Chang-Yu Hsieh
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
37
Yang C, Zhang Y. Delta Machine Learning to Improve Scoring-Ranking-Screening Performances of Protein-Ligand Scoring Functions. J Chem Inf Model 2022; 62:2696-2712. [PMID: 35579568 DOI: 10.1021/acs.jcim.2c00485] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Indexed: 01/01/2023]
Abstract
Protein-ligand scoring functions are widely used in structure-based drug design for fast evaluation of protein-ligand interactions, and it is of strong interest to develop scoring functions with machine-learning approaches. In this work, by expanding the training set, developing physically meaningful features, employing our recently developed linear empirical scoring function Lin_F9 (Yang, C. J. Chem. Inf. Model. 2021, 61, 4630-4644) as the baseline, and applying extreme gradient boosting (XGBoost) with Δ-machine learning, we have further improved the robustness and applicability of machine-learning scoring functions. Besides the top performances for scoring-ranking-screening power tests of the CASF-2016 benchmark, the new scoring function ΔLin_F9XGB also achieves superior scoring and ranking performances in different structure types that mimic real docking applications. The scoring powers of ΔLin_F9XGB for locally optimized poses, flexible redocked poses, and ensemble docked poses of the CASF-2016 core set achieve Pearson's correlation coefficient (R) values of 0.853, 0.839, and 0.813, respectively. In addition, the large-scale docking-based virtual screening test on the LIT-PCBA data set demonstrates the reliability and robustness of ΔLin_F9XGB in virtual screening application. The ΔLin_F9XGB scoring function and its code are freely available on the web at (https://yzhang.hpc.nyu.edu/Delta_LinF9_XGB).
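The Δ-machine-learning idea named in this abstract, learning a correction on top of a baseline score rather than the affinity itself, can be sketched in a few lines of Python. This is an illustration only: the abstract states that the authors use XGBoost with Lin_F9 as the baseline, whereas the sketch below swaps in a one-variable least-squares fit as the correction model, and the names `fit_line`, `fit_delta_model`, and the toy numbers are all hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept (single feature)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def fit_delta_model(baseline_scores, features, targets):
    """Train a correction on delta = target - baseline, return a corrected predictor."""
    deltas = [t - b for t, b in zip(targets, baseline_scores)]
    slope, intercept = fit_line(features, deltas)  # stand-in for a stronger learner

    def predict(baseline_score, feature):
        # Final prediction = baseline score + learned correction
        return baseline_score + slope * feature + intercept

    return predict

# Toy data (hypothetical): baseline scores, one descriptor, experimental affinities.
baseline = [5.0, 6.0, 7.0, 8.0]
feature = [1.0, 2.0, 3.0, 4.0]
target = [5.5, 7.0, 8.5, 10.0]

predict = fit_delta_model(baseline, feature, target)
corrected = [predict(b, f) for b, f in zip(baseline, feature)]
print(corrected)
```

One motivation usually given for this design is that the baseline carries the physically interpretable part of the score, so the learned model only has to capture what the baseline misses.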
Affiliation(s)
- Chao Yang
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
38
Orhobor OI, Rehim AA, Lou H, Ni H, King RD. A simple spatial extension to the extended connectivity interaction features for binding affinity prediction. Royal Society Open Science 2022; 9:211745. [PMID: 35573039 PMCID: PMC9066299 DOI: 10.1098/rsos.211745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Accepted: 04/13/2022] [Indexed: 05/03/2023]
Abstract
The representation of the protein-ligand complexes used in building machine learning models plays an important role in the accuracy of binding affinity prediction. The Extended Connectivity Interaction Features (ECIF) is one such representation. We report that (i) including the discretized distances between protein-ligand atom pairs in the ECIF scheme improves predictive accuracy, and (ii) in an evaluation using gradient boosted trees, we found that the resampling method used in selecting the best hyperparameters has a strong effect on predictive performance, especially for benchmarking purposes.
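The notion of adding discretized distances to pair-count features, point (i) above, can be sketched in Python as follows. This is not the ECIF implementation: real ECIF atom types are much richer than the element symbols used here, and the 6 Å cutoff, the 2 Å bin width, and the function name `binned_pair_counts` are assumptions chosen for illustration.

```python
from collections import Counter
from math import dist

def binned_pair_counts(protein_atoms, ligand_atoms, cutoff=6.0, bin_width=2.0):
    """Count protein-ligand atom-type pairs within `cutoff`, keyed also by distance bin.

    Each atom is a (type_label, (x, y, z)) tuple; coordinates in angstroms.
    """
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            d = dist(p_xyz, l_xyz)
            if d < cutoff:
                # The distance bin index is the 'spatial' part of the feature key
                counts[(p_type, l_type, int(d // bin_width))] += 1
    return counts

# Toy complex (hypothetical coordinates).
protein = [("C", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0))]
ligand = [("N", (1.0, 0.0, 0.0)), ("C", (10.0, 0.0, 0.0))]

features = binned_pair_counts(protein, ligand)
print(dict(features))
```

Without the bin index in the key, the two counted pairs would only be distinguished by their atom types; with it, contacts of the same type at different distances become separate features.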
Affiliation(s)
- Abbi Abdel Rehim
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Hang Lou
  - Department of Mathematics, University College London, London, UK
- Hao Ni
  - Department of Mathematics, University College London, London, UK
  - The Alan Turing Institute, London, UK
- Ross D. King
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
  - The Alan Turing Institute, London, UK
39
Zhou Y, Jiang Y, Chen SJ. RNA-ligand molecular docking: advances and challenges. Wiley Interdisciplinary Reviews. Computational Molecular Science 2022; 12:e1571. [PMID: 37293430 PMCID: PMC10250017 DOI: 10.1002/wcms.1571] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 03/26/2021] [Accepted: 07/20/2021] [Indexed: 12/16/2022]
Abstract
With rapid advances in computer algorithms and hardware, fast and accurate virtual screening has led to a drastic acceleration in selecting potent small molecules as drug candidates. Computational modeling of RNA-small molecule interactions has become an indispensable tool for RNA-targeted drug discovery. The current models for RNA-ligand binding have mainly focused on the docking-and-scoring method. Accurate docking and scoring should tackle four crucial problems: (1) conformational flexibility of ligand, (2) conformational flexibility of RNA, (3) efficient sampling of binding sites and binding poses, and (4) accurate scoring of different binding modes. Moreover, compared with the problem of protein-ligand docking, predicting ligand binding to RNA, a negatively charged polymer, is further complicated by additional effects such as metal ion effects. Thermodynamic models based on physics-based and knowledge-based scoring functions have shown highly encouraging success in predicting ligand binding poses and binding affinities. Recently, kinetic models for ligand binding have further suggested that including dissociation kinetics (residence time) in ligand docking would result in improved performance in estimating in vivo drug efficacy. More recently, the rise of deep-learning approaches has led to new tools for predicting RNA-small molecule binding. In this review, we present an overview of the recently developed computational methods for RNA-ligand docking and their advantages and disadvantages.
Affiliation(s)
- Yuanzhe Zhou
- Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
- Yangwei Jiang
- Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
- Shi-Jie Chen
- Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
40
Kumar GS, Moustafa M, Sahoo AK, Malý P, Bharadwaj S. Computational Investigations on the Natural Small Molecule as an Inhibitor of Programmed Death Ligand 1 for Cancer Immunotherapy. Life (Basel) 2022; 12:659. [PMID: 35629327 PMCID: PMC9145275 DOI: 10.3390/life12050659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Revised: 04/22/2022] [Accepted: 04/27/2022] [Indexed: 11/24/2022] Open
Abstract
Several therapeutic monoclonal antibodies approved by the FDA are available against the PD-1/PD-L1 (programmed death 1/programmed death ligand 1) immune checkpoint axis, which has been an unprecedented success in cancer treatment. However, existing therapeutics against PD-L1, including small molecule inhibitors, have certain drawbacks such as high cost and drug resistance that challenge the currently available anti-PD-L1 therapy. Therefore, this study presents the screening of 32,552 compounds from the Natural Product Atlas database against PD-L1, including three steps of structure-based virtual screening followed by binding free energy to refine the ideal conformation of potent PD-L1 inhibitors. Subsequently, five natural compounds, i.e., Neoenactin B1, Actinofuranone I, Cosmosporin, Ganocapenoid A, and 3-[3-hydroxy-4-(3-methylbut-2-enyl)phenyl]-5-(4-hydroxybenzyl)-4-methyldihydrofuran-2(3H)-one, were collected based on the ADMET (absorption, distribution, metabolism, excretion, and toxicity) profiling and binding free energy (>−60 kcal/mol) for further computational investigation in comparison to co-crystallized ligand, i.e., JQT inhibitor. Based on interaction mapping, explicit 100 ns molecular dynamics simulation, and end-point binding free energy calculations, the selected natural compounds were marked for substantial stability with PD-L1 via intermolecular interactions (hydrogen and hydrophobic) with essential residues in comparison to the JQT inhibitor. Collectively, the calculated results advocate the selected natural compounds as the putative potent inhibitors of PD-L1 and, therefore, can be considered for further development of PD-L1 immune checkpoint inhibitors in cancer immunotherapy.
Affiliation(s)
- Geethu S Kumar
- Department of Life Science, School of Basic Science and Research, Sharda University, Greater Noida 201310, Uttar Pradesh, India
- Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida 201308, Uttar Pradesh, India
- Mahmoud Moustafa
- Department of Biology, Faculty of Science, King Khalid University, Abha 62529, Saudi Arabia
- Department of Botany and Microbiology, Faculty of Science, South Valley University, Qena 83523, Egypt
- Amaresh Kumar Sahoo
- Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Allahabad 211015, Uttar Pradesh, India
- Petr Malý
- Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences, v.v.i., BIOCEV Research Center, 25250 Vestec, Czech Republic
- Shiv Bharadwaj
- Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences, v.v.i., BIOCEV Research Center, 25250 Vestec, Czech Republic
41
Shulga DA, Ivanov NN, Palyulin VA. In Silico Structure-Based Approach for Group Efficiency Estimation in Fragment-Based Drug Design Using Evaluation of Fragment Contributions. Molecules 2022; 27:1985. [PMID: 35335347 PMCID: PMC8951103 DOI: 10.3390/molecules27061985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 03/10/2022] [Accepted: 03/15/2022] [Indexed: 12/10/2022] Open
Abstract
The notion of a contribution of a specific group in an organic molecule's property and/or activity is both common in our thinking and is still not strictly correct due to the inherent non-additivity of free energy with respect to molecular fragments composing a molecule. The fragment-based drug discovery (FBDD) approach has proven to be fruitful in addressing the above notions. The main difficulty of the FBDD, however, is in its reliance on the low throughput and expensive experimental means of determining the fragment-sized molecules binding. In this article we propose a way to enhance the throughput and availability of the FBDD methods by judiciously using an in silico means of assessing the contribution to ligand-receptor binding energy of fragments of a molecule under question using a previously developed in silico Reverse Fragment Based Drug Discovery (R-FBDD) approach. It has been shown that the proposed structure-based drug discovery (SBDD) type of approach fills in the vacant niche among the existing in silico approaches, which mainly stem from the ligand-based drug discovery (LBDD) counterparts. In order to illustrate the applicability of the approach, our work retrospectively repeats the findings of the use case of an FBDD hit-to-lead project devoted to the experimentally based determination of additive group efficiency (GE), an analog of ligand efficiency (LE) for a group in the molecule, using the Free-Wilson (FW) decomposition. It is shown that in using our in silico approach to evaluate fragment contributions of a ligand and to estimate GE one can arrive at similar decisions as those made using the experimentally determined activity-based FW decomposition. It is also shown that the approach is rather robust to the choice of the scoring function, provided the latter demonstrates a decent scoring power. We argue that the proposed approach of in silico assessment of GE has a wider applicability domain and expect that it will be widely applicable to enhance the net throughput of drug discovery based on the FBDD paradigm.
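To make the efficiency metrics named in this abstract concrete, here is a minimal Python sketch of ligand efficiency (LE) and group efficiency (GE). It assumes the commonly used definitions LE = -ΔG / N_heavy and GE = -ΔΔG / ΔN_heavy. The function names and example numbers are hypothetical and are not taken from the cited paper, whose fragment-contribution scheme may differ in detail.

```python
def ligand_efficiency(delta_g: float, heavy_atoms: int) -> float:
    """LE in kcal/mol per heavy atom (assumed definition: -dG / N_heavy)."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return -delta_g / heavy_atoms


def group_efficiency(dg_parent: float, dg_analog: float,
                     added_heavy_atoms: int) -> float:
    """GE in kcal/mol per added heavy atom (assumed: -ddG / dN_heavy).

    dg_parent: binding free energy of the molecule without the group.
    dg_analog: binding free energy of the molecule with the group added.
    """
    if added_heavy_atoms <= 0:
        raise ValueError("added_heavy_atoms must be positive")
    return -(dg_analog - dg_parent) / added_heavy_atoms


if __name__ == "__main__":
    # Hypothetical values: a 30-heavy-atom ligand binding at -9.0 kcal/mol,
    # and a 5-heavy-atom group that improves binding from -6.0 to -7.5.
    print("LE:", ligand_efficiency(-9.0, 30))
    print("GE:", group_efficiency(-6.0, -7.5, 5))
```

A positive GE means the added group more than pays for its size in binding energy. A negative GE means the analog binds more weakly than the parent.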
Affiliation(s)
- Dmitry A. Shulga
- Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
- Vladimir A. Palyulin
- Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
42
Choudhury C, Arul Murugan N, Deva Priyakumar U. Structure-based drug repurposing: traditional and advanced AI/ML-aided methods. Drug Discov Today 2022; 27:1847-1861. [PMID: 35301148 PMCID: PMC8920090 DOI: 10.1016/j.drudis.2022.03.006] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 10/04/2021] [Revised: 02/16/2022] [Accepted: 03/10/2022] [Indexed: 02/08/2023]
Abstract
The current global health emergency in the form of the Coronavirus 2019 (COVID-19) pandemic has highlighted the need for fast, accurate, and efficient drug discovery pipelines. Traditional drug discovery projects relying on in vitro high-throughput screening (HTS) involve large investments and sophisticated experimental set-ups, affordable only to big biopharmaceutical companies. In this scenario, application of efficient state-of-the-art computational methods and modern artificial intelligence (AI)-based algorithms for rapid screening of repurposable chemical space [approved drugs and natural products (NPs) with proven pharmacokinetic profiles] to identify the initial leads is a powerful option to save resources and time. Structure-based drug repurposing is a popular in silico repurposing approach. In this review, we discuss traditional and modern AI-based computational methods and tools applied at various stages for structure-based drug discovery (SBDD) pipelines. Additionally, we highlight the role of generative models in generating molecules with scaffolds from repurposable chemical space. Teaser: This review highlights the importance of repurposable chemical space, and the contributions of conventional in silico approaches and modern machine-learning algorithms for rapid structure-based drug repurposing.
Affiliation(s)
- Chinmayee Choudhury
- Department of Experimental Medicine and Biotechnology, Postgraduate Institute of Medical Education and Research, Sector-12, Chandigarh 160012, India
- N Arul Murugan
- Department of Computer Science, School of Electrical Engineering and Computer Sciences, KTH Royal Institute of Technology, S-100 44, Stockholm, Sweden; Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi 110020, India.
- U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
43
Gorostiola González M, Janssen APA, IJzerman AP, Heitman LH, van Westen GJP. Oncological drug discovery: AI meets structure-based computational research. Drug Discov Today 2022; 27:1661-1670. [PMID: 35301149 DOI: 10.1016/j.drudis.2022.03.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/25/2021] [Revised: 01/22/2022] [Accepted: 03/09/2022] [Indexed: 02/08/2023]
Abstract
The integration of machine learning and structure-based methods has proven valuable in the past as a way to prioritize targets and compounds in early drug discovery. In oncological research, these methods can be highly beneficial in addressing the diversity of neoplastic diseases portrayed by the different hallmarks of cancer. Here, we review six use case scenarios for integrated computational methods, namely driver prediction, computational mutagenesis, (off)-target prediction, binding site prediction, virtual screening, and allosteric modulation analysis. We address the heterogeneity of integration approaches and individual methods, while acknowledging their current limitations and highlighting their potential to bring drugs for personalized oncological therapies to the market faster.
Affiliation(s)
- Marina Gorostiola González
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Antonius P A Janssen
- Oncode Institute, Utrecht, The Netherlands; Molecular Physiology, Leiden Institute of Chemistry, Leiden University, The Netherlands
- Adriaan P IJzerman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, The Netherlands
- Laura H Heitman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Gerard J P van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, The Netherlands.
44
Staszak M, Staszak K, Wieszczycka K, Bajek A, Roszkowski K, Tylkowski B. Machine learning in drug design: Use of artificial intelligence to explore the chemical structure–biological activity relationship. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1568] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Affiliation(s)
- Maciej Staszak
- Institute of Technology and Chemical Engineering, Poznan University of Technology, Poznan, Poland
- Katarzyna Staszak
- Institute of Technology and Chemical Engineering, Poznan University of Technology, Poznan, Poland
- Karolina Wieszczycka
- Institute of Technology and Chemical Engineering, Poznan University of Technology, Poznan, Poland
- Anna Bajek
- Department of Tissue Engineering, Collegium Medicum, Nicolaus Copernicus University, Bydgoszcz, Poland
- Krzysztof Roszkowski
- Department of Oncology, Collegium Medicum, Nicolaus Copernicus University, Bydgoszcz, Poland
- Bartosz Tylkowski
- Department of Chemical Engineering, University Rovira i Virgili, Tarragona, Spain
- Eurecat, Centre Tecnològic de Catalunya, Chemical Technologies Unit, Tarragona, Spain
45
Murugan NA, Podobas A, Gadioli D, Vitali E, Palermo G, Markidis S. A Review on Parallel Virtual Screening Softwares for High-Performance Computers. Pharmaceuticals (Basel) 2022; 15:63. [PMID: 35056120 PMCID: PMC8780228 DOI: 10.3390/ph15010063] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/21/2021] [Revised: 12/19/2021] [Accepted: 12/28/2021] [Indexed: 02/01/2023] Open
Abstract
Drug discovery is the most expensive, time-demanding, and challenging project in biopharmaceutical companies which aims at the identification and optimization of lead compounds from large-sized chemical libraries. The lead compounds should have high-affinity binding and specificity for a target associated with a disease, and, in addition, they should have favorable pharmacodynamic and pharmacokinetic properties (grouped as ADMET properties). Overall, drug discovery is a multivariable optimization and can be carried out in supercomputers using a reliable scoring function which is a measure of binding affinity or inhibition potential of the drug-like compound. The major problem is that the number of compounds in the chemical spaces is huge, making the computational drug discovery very demanding. However, it is cheaper and less time-consuming when compared to experimental high-throughput screening. As the problem is to find the most stable (global) minima for numerous protein-ligand complexes (on the order of 10^6 to 10^12), the parallel implementation of in silico virtual screening can be exploited to ensure drug discovery in affordable time. In this review, we discuss such implementations of parallelization algorithms in virtual screening programs. The nature of different scoring functions and search algorithms are discussed, together with a performance analysis of several docking softwares ported on high-performance computing architectures.
Affiliation(s)
- Natarajan Arul Murugan
- Department of Computer Science, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, SE-10044 Stockholm, Sweden
- Artur Podobas
- Department of Computer Science, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, SE-10044 Stockholm, Sweden
- Davide Gadioli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milano, Italy
- Emanuele Vitali
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milano, Italy
- Gianluca Palermo
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milano, Italy
- Stefano Markidis
- Department of Computer Science, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, SE-10044 Stockholm, Sweden
46
Wang DD, Chan MT, Yan H. Structure-based protein-ligand interaction fingerprints for binding affinity prediction. Comput Struct Biotechnol J 2021; 19:6291-6300. [PMID: 34900139 PMCID: PMC8637032 DOI: 10.1016/j.csbj.2021.11.018] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/30/2021] [Revised: 11/09/2021] [Accepted: 11/13/2021] [Indexed: 11/17/2022] Open
Abstract
Binding affinity prediction (BAP) using protein–ligand complex structures is crucial to computer-aided drug design, but remains a challenging problem. To achieve efficient and accurate BAP, machine-learning scoring functions (SFs) based on a wide range of descriptors have been developed. Among those descriptors, protein–ligand interaction fingerprints (IFPs) are competitive due to their simple representations, elaborate profiles of key interactions and easy collaborations with machine-learning algorithms. In this paper, we have adopted a building-block-based taxonomy to review a broad range of IFP models, and compared representative IFP-based SFs in target-specific and generic scoring tasks. Atom-pair-counts-based and substructure-based IFPs show great potential in these tasks.
Affiliation(s)
- Debby D Wang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, 516 Jungong Rd, Shanghai 200093, China
- Moon-Tong Chan
- School of Science and Technology, Hong Kong Metropolitan University, 30 Good Shepherd St, Ho Man Tin, Hong Kong
- Hong Yan
- Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
47
Abstract
Virtual screening, predicting which compounds within a specified compound library bind to a target molecule (typically a protein), is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find, on average, that Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when not used directly to train models, and this bias obfuscates to what extent machine learning models are achieving their performance through a sophisticated interpretation of molecular interactions versus fitting to non-informative simplistic property distributions.
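The "1% early enrichment factor" quoted in this abstract can be illustrated with a short Python sketch. It assumes the standard definition, EF = (actives in the top x% / compounds in the top x%) / (actives overall / compounds overall). The function name, the `higher_is_better` flag, and the toy data are hypothetical and are not part of Gnina, AutoDock Vina, or the benchmark sets.

```python
def enrichment_factor(scores, labels, fraction=0.01, higher_is_better=True):
    """Early enrichment factor for a ranked screening list.

    scores: one docking or model score per compound.
    labels: 1 for a known active, 0 for an inactive or decoy.
    fraction: top share of the ranked list to inspect (0.01 means 1%).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Inspect at least one compound even for very small libraries.
    n_top = max(1, round(n_total * fraction))

    # Rank compounds best-first according to the score convention.
    order = sorted(range(n_total), key=lambda i: scores[i],
                   reverse=higher_is_better)
    hits = sum(labels[i] for i in order[:n_top])

    # (hits / n_top) / (n_actives / n_total), rearranged to one division.
    return (hits * n_total) / (n_top * n_actives)


if __name__ == "__main__":
    # Toy library: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
    toy_scores = [float(i) for i in range(1000)]
    toy_labels = [0] * 1000
    for i in (995, 996, 997, 998, 999, 100, 200, 300, 400, 500):
        toy_labels[i] = 1
    print(enrichment_factor(toy_scores, toy_labels, fraction=0.01))
```

An EF of 1 corresponds to random ranking. Energy-like scores where lower is better (as in Vina-style output) would need `higher_is_better=False`.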
Affiliation(s)
- David Ryan Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA
48
Li H, Lu G, Sze KH, Su X, Chan WY, Leung KS. Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark. Brief Bioinform 2021; 22:bbab225. [PMID: 34169324 PMCID: PMC8575004 DOI: 10.1093/bib/bbab225] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/03/2021] [Revised: 04/27/2021] [Accepted: 05/23/2021] [Indexed: 11/12/2022] Open
Abstract
The superior performance of machine-learning scoring functions for docking has caused a series of debates on whether it is due to learning knowledge from training data that are similar in some sense to the test data. With a systematically revised methodology and a blind benchmark realistically mimicking the process of prospective prediction of binding affinity, we have evaluated three broadly used classical scoring functions and five machine-learning counterparts calibrated with both random forest and extreme gradient boosting using both solo and hybrid features, showing for the first time that machine-learning scoring functions trained exclusively on a proportion of as low as 8% complexes dissimilar to the test set already outperform classical scoring functions, a percentage that is far lower than what has been recently reported on all the three CASF benchmarks. The performance of machine-learning scoring functions is underestimated due to the absence of similar samples in some artificially created training sets that discard the full spectrum of complexes to be found in a prospective environment. Given the inevitability of any degree of similarity contained in a large dataset, the criteria for scoring function selection depend on which one can make the best use of all available materials. Software code and data are provided at https://github.com/cusdulab/MLSF for interested readers to rapidly rebuild the scoring functions and reproduce our results, even to make extended analyses on their own benchmarks.
Affiliation(s)
- Gang Lu
- School of Biomedical Sciences, Chinese University of Hong Kong, Hong Kong
- Kam-Heung Sze
- Bioinformatics Unit, Hong Kong Medical Technology Institute, Hong Kong
- Xianwei Su
- Chinese University of Hong Kong, Hong Kong
- Wai-Yee Chan
- CUHK-SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences, Chinese University of Hong Kong, Hong Kong
- Kwong-Sak Leung
- Computer Science and Engineering in the Chinese University of Hong Kong, Hong Kong
49
Shen C, Hu X, Gao J, Zhang X, Zhong H, Wang Z, Xu L, Kang Y, Cao D, Hou T. The impact of cross-docked poses on performance of machine learning classifier for protein-ligand binding pose prediction. J Cheminform 2021; 13:81. [PMID: 34656169 PMCID: PMC8520186 DOI: 10.1186/s13321-021-00560-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/29/2021] [Accepted: 10/05/2021] [Indexed: 02/06/2023] Open
Abstract
Structure-based drug design depends on the detailed knowledge of the three-dimensional (3D) structures of protein-ligand binding complexes, but accurate prediction of ligand-binding poses is still a major challenge for molecular docking due to deficiency of scoring functions (SFs) and ignorance of protein flexibility upon ligand binding. In this study, based on a cross-docking dataset dedicatedly constructed from the PDBbind database, we developed several XGBoost-trained classifiers to discriminate the near-native binding poses from decoys, and systematically assessed their performance with/without the involvement of the cross-docked poses in the training/test sets. The calculation results illustrate that using Extended Connectivity Interaction Features (ECIF), Vina energy terms and docking pose ranks as the features can achieve the best performance, according to the validation through the random splitting or refined-core splitting and the testing on the re-docked or cross-docked poses. Besides, it is found that, despite the significant decrease of the performance for the threefold clustered cross-validation, the inclusion of the Vina energy terms can effectively ensure the lower limit of the performance of the models and thus improve their generalization capability. Furthermore, our calculation results also highlight the importance of the incorporation of the cross-docked poses into the training of the SFs with wide application domain and high robustness for binding pose prediction. The source code and the newly-developed cross-docking datasets can be freely available at https://github.com/sc8668/ml_pose_prediction and https://zenodo.org/record/5525936 , respectively, under an open-source license. We believe that our study may provide valuable guidance for the development and assessment of new machine learning-based SFs (MLSFs) for the predictions of protein-ligand binding poses.
Affiliation(s)
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Xueping Hu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Junbo Gao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Haiyang Zhong
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China.
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan, 410013, People's Republic of China.
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China.
50
Ataeinia B, Heidari P. Artificial Intelligence and the Future of Diagnostic and Therapeutic Radiopharmaceutical Development: In Silico Smart Molecular Design. PET Clin 2021; 16:513-523. [PMID: 34364818 PMCID: PMC8453048 DOI: 10.1016/j.cpet.2021.06.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 02/08/2023]
Abstract
Novel diagnostic and therapeutic radiopharmaceuticals are increasingly becoming a central part of personalized medicine. Continued innovation in the development of new radiopharmaceuticals is key to sustained growth and advancement of precision medicine. Artificial intelligence has been used in multiple fields of medicine to develop and validate better tools for patient diagnosis and therapy, including in radiopharmaceutical design. In this review, we first discuss common in silico approaches and focus on their usefulness and challenges in radiopharmaceutical development. Next, we discuss the practical applications of in silico modeling in design of radiopharmaceuticals in various diseases.
Affiliation(s)
- Bahar Ataeinia
- Department of Radiology, Massachusetts General Hospital, 55 Fruit St, Wht 427, Boston, MA 02114, USA
- Pedram Heidari
- Department of Radiology, Massachusetts General Hospital, 55 Fruit St, Wht 427, Boston, MA 02114, USA.