1
|
de Azevedo WF, Quiroga R, Villarreal MA, da Silveira NJF, Bitencourt-Ferreira G, da Silva AD, Veit-Acosta M, Oliveira PR, Tutone M, Biziukova N, Poroikov V, Tarasova O, Baud S. SAnDReS 2.0: Development of machine-learning models to explore the scoring function space. J Comput Chem 2024; 45:2333-2346. [PMID: 38900052 DOI: 10.1002/jcc.27449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2024] [Revised: 05/04/2024] [Accepted: 06/02/2024] [Indexed: 06/21/2024]
Abstract
Classical scoring functions may exhibit low accuracy in determining ligand binding affinity for proteins. The availability of both protein-ligand structures and affinity data make it possible to develop machine-learning models focused on specific protein systems with superior predictive performance. Here, we report a new methodology named SAnDReS that combines AutoDock Vina 1.2 with 54 regression methods available in Scikit-Learn to calculate binding affinity based on protein-ligand structures. This approach allows exploration of the scoring function space. SAnDReS generates machine-learning models based on crystal, docked, and AlphaFold-generated structures. As a proof of concept, we examine the performance of SAnDReS-generated models in three case studies. For all three cases, our models outperformed classical scoring functions. Also, SAnDReS-generated models showed predictive performance close to or better than other machine-learning models such as KDEEP, CSM-lig, and ΔVinaRF20. SAnDReS 2.0 is available to download at https://github.com/azevedolab/sandres.
Collapse
Affiliation(s)
| | - Rodrigo Quiroga
- Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), CONICET-Departamento de Química Teórica y Computacional, Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
| | - Marcos Ariel Villarreal
- Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), CONICET-Departamento de Química Teórica y Computacional, Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
| | | | | | - Amauri Duarte da Silva
- Programa de Pós-Graduação em Tecnologias da Informação e Gestão em Saúde, Universidade Federal de Ciências da Saúde de Porto Alegre, Porto Alegre, Brazil
| | | | | | - Marco Tutone
- Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università di Palermo, Palermo, Italy
| | | | | | | | - Stéphaine Baud
- Laboratoire SiRMa, UMR CNRS/URCA 7369, UFR Sciences Exactes et Naturelles, Université de Reims Champagne-Ardenne, CNRS, MEDYC, Reims, France
| |
Collapse
|
2
|
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of drug discovery framework. The VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence or artificial intelligence has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged into revolutionarily progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data has evolved as "big-data" in the public domain demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from an individual approach (LB and SB) to integrated LB and SB techniques to explore various ligand and target protein aspects for the enhanced rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in drug discovery domain for screening and optimizing hits or lead with fewer or no false positive hits. Following the big-data drift and tremendous growth in computational architecture, we presented this review. Here, the article categorized and emphasized individual VS techniques, detailed literature presented for machine learning implementation, modern machine intelligence approaches, and limitations and deliberated the future prospects.
Collapse
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
| | - Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
| |
Collapse
|
3
|
Smith MD, Darryl Quarles L, Demerdash O, Smith JC. Drugging the entire human proteome: Are we there yet? Drug Discov Today 2024; 29:103891. [PMID: 38246414 DOI: 10.1016/j.drudis.2024.103891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/23/2024]
Abstract
Each of the ∼20,000 proteins in the human proteome is a potential target for compounds that bind to it and modify its function. The 3D structures of most of these proteins are now available. Here, we discuss the prospects for using these structures to perform proteome-wide virtual HTS (VHTS). We compare physics-based (docking) and AI VHTS approaches, some of which are now being applied with large databases of compounds to thousands of targets. Although preliminary proteome-wide screens are now within our grasp, further methodological developments are expected to improve the accuracy of the results.
Collapse
Affiliation(s)
- Micholas Dean Smith
- University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
| | - L Darryl Quarles
- Departments of Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA; ORRxD LLC, 3404 Olney Drive, Durham, NC 27705, USA
| | - Omar Demerdash
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
| | - Jeremy C Smith
- University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA.
| |
Collapse
|
4
|
Bitencourt-Ferreira G, Villarreal MA, Quiroga R, Biziukova N, Poroikov V, Tarasova O, de Azevedo Junior WF. Exploring Scoring Function Space: Developing Computational Models for Drug Discovery. Curr Med Chem 2024; 31:2361-2377. [PMID: 36944627 DOI: 10.2174/0929867330666230321103731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Revised: 12/15/2022] [Accepted: 12/29/2022] [Indexed: 03/23/2023]
Abstract
BACKGROUND The idea of scoring function space established a systems-level approach to address the development of models to predict the affinity of drug molecules by those interested in drug discovery. OBJECTIVE Our goal here is to review the concept of scoring function space and how to explore it to develop machine learning models to address protein-ligand binding affinity. METHODS We searched the articles available in PubMed related to the scoring function space. We also utilized crystallographic structures found in the protein data bank (PDB) to represent the protein space. RESULTS The application of systems-level approaches to address receptor-drug interactions allows us to have a holistic view of the process of drug discovery. The scoring function space adds flexibility to the process since it makes it possible to see drug discovery as a relationship involving mathematical spaces. CONCLUSION The application of the concept of scoring function space has provided us with an integrated view of drug discovery methods. This concept is useful during drug discovery, where we see the process as a computational search of the scoring function space to find an adequate model to predict receptor-drug binding affinity.
Collapse
Affiliation(s)
| | - Marcos A Villarreal
- CONICET-Departamento de Matemática y Física, Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
| | - Rodrigo Quiroga
- CONICET-Departamento de Matemática y Física, Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
| | - Nadezhda Biziukova
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
| | - Vladimir Poroikov
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
| | - Olga Tarasova
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
| | - Walter F de Azevedo Junior
- Pontifical Catholic University of Rio Grande do Sul - PUCRS, Porto Alegre-RS, Brazil
- Specialization Program in Bioinformatics, The Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681 Porto Alegre / RS 90619-900, Brazil
| |
Collapse
|
5
|
Jin Z, Wei Z. Molecular simulation for food protein-ligand interactions: A comprehensive review on principles, current applications, and emerging trends. Compr Rev Food Sci Food Saf 2024; 23:e13280. [PMID: 38284571 DOI: 10.1111/1541-4337.13280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2023] [Revised: 11/19/2023] [Accepted: 11/22/2023] [Indexed: 01/30/2024]
Abstract
In recent years, investigations on molecular interaction mechanisms between food proteins and ligands have attracted much interest. The interaction mechanisms can supply much useful information for many fields in the food industry, including nutrient delivery, food processing, auxiliary detection, and others. Molecular simulation has offered extraordinary insights into the interaction mechanisms. It can reflect binding conformation, interaction forces, binding affinity, key residues, and other information that physicochemical experiments cannot reveal in a fast and detailed manner. The simulation results have proven to be consistent with the results of physicochemical experiments. Molecular simulation holds great potential for future applications in the field of food protein-ligand interactions. This review elaborates on the principles of molecular docking and molecular dynamics simulation. Besides, their applications in food protein-ligand interactions are summarized. Furthermore, challenges, perspectives, and trends in molecular simulation of food protein-ligand interactions are proposed. Based on the results of molecular simulation, the mechanisms of interfacial behavior, enzyme-substrate binding, and structural changes during food processing can be reflected, and strategies for hazardous substance detection and food flavor adjustment can be generated. Moreover, molecular simulation can accelerate food development and reduce animal experiments. However, there are still several challenges to applying molecular simulation to food protein-ligand interaction research. The future trends will be a combination of international cooperation and data sharing, quantum mechanics/molecular mechanics, advanced computational techniques, and machine learning, which contribute to promoting food protein-ligand interaction simulation. Overall, the use of molecular simulation to study food protein-ligand interactions has a promising prospect.
Collapse
Affiliation(s)
- Zihan Jin
- State Key Laboratory of Marine Food Processing & Safety Control, College of Food Science and Engineering, Ocean University of China, Qingdao, China
| | - Zihao Wei
- State Key Laboratory of Marine Food Processing & Safety Control, College of Food Science and Engineering, Ocean University of China, Qingdao, China
| |
Collapse
|
6
|
Liao J, Wang Q, Wu F, Huang Z. In Silico Methods for Identification of Potential Active Sites of Therapeutic Targets. Molecules 2022; 27:7103. [PMID: 36296697 PMCID: PMC9609013 DOI: 10.3390/molecules27207103] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2022] [Revised: 08/12/2022] [Accepted: 08/25/2022] [Indexed: 07/30/2023] Open
Abstract
Target identification is an important step in drug discovery, and computer-aided drug target identification methods are attracting more attention compared with traditional drug target identification methods, which are time-consuming and costly. Computer-aided drug target identification methods can greatly reduce the searching scope of experimental targets and associated costs by identifying the diseases-related targets and their binding sites and evaluating the druggability of the predicted active sites for clinical trials. In this review, we introduce the principles of computer-based active site identification methods, including the identification of binding sites and assessment of druggability. We provide some guidelines for selecting methods for the identification of binding sites and assessment of druggability. In addition, we list the databases and tools commonly used with these methods, present examples of individual and combined applications, and compare the methods and tools. Finally, we discuss the challenges and limitations of binding site identification and druggability assessment at the current stage and provide some recommendations and future perspectives.
Collapse
Affiliation(s)
- Jianbo Liao
- Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Key Laboratory of Computer-Aided Drug Design of Dongguan City, Key Laboratory for Research and Development of Natural Drugs of Guangdong Province, School of Pharmacy, Guangdong Medical University, Dongguan 523808, China
- The Second School of Clinical Medicine, Guangdong Medical University, Dongguan 523808, China
| | - Qinyu Wang
- Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Key Laboratory of Computer-Aided Drug Design of Dongguan City, Key Laboratory for Research and Development of Natural Drugs of Guangdong Province, School of Pharmacy, Guangdong Medical University, Dongguan 523808, China
| | - Fengxu Wu
- Hubei Key Laboratory of Wudang Local Chinese Medicine Research, School of Pharmaceutical Sciences, Hubei University of Medicine, Shiyan 442000, China
| | - Zunnan Huang
- Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Key Laboratory of Computer-Aided Drug Design of Dongguan City, Key Laboratory for Research and Development of Natural Drugs of Guangdong Province, School of Pharmacy, Guangdong Medical University, Dongguan 523808, China
- Marine Biomedical Research Institute of Guangdong Zhanjiang, Zhanjiang 524023, China
| |
Collapse
|
7
|
Veríssimo GC, Serafim MSM, Kronenberger T, Ferreira RS, Honorio KM, Maltarollo VG. Designing drugs when there is low data availability: one-shot learning and other approaches to face the issues of a long-term concern. Expert Opin Drug Discov 2022; 17:929-947. [PMID: 35983695 DOI: 10.1080/17460441.2022.2114451] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
INTRODUCTION Modern drug discovery generally is accessed by useful information from previous large databases or uncovering novel data. The lack of biological and/or chemical data tends to slow the development of scientific research and innovation. Here, approaches that may help provide solutions to generate or obtain enough relevant data or improve/accelerate existing methods within the last five years were reviewed. AREAS COVERED One-shot learning (OSL) approaches, structural modeling, molecular docking, scoring function space (SFS), molecular dynamics (MD), and quantum mechanics (QM) may be used to amplify the amount of available data to drug design and discovery campaigns, presenting methods, their perspectives, and discussions to be employed in the near future. EXPERT OPINION Recent works have successfully used these techniques to solve a range of issues in the face of data scarcity, including complex problems such as the challenging scenario of drug design aimed at intrinsically disordered proteins and the evaluation of potential adverse effects in a clinical scenario. These examples show that it is possible to improve and kickstart research from scarce available data to design and discover new potential drugs.
Collapse
Affiliation(s)
- Gabriel C Veríssimo
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Mateus Sá M Serafim
- Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Thales Kronenberger
- Department of Medical Oncology and Pneumology, Internal Medicine VIII, University Hospital of Tübingen, Tübingen, Germany.,School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio, Finland
| | - Rafaela S Ferreira
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Kathia M Honorio
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil.,Centro de Ciências Naturais e Humanas, Universidade Federal do ABC (UFABC), Santo André, Brazil
| | - Vinícius G Maltarollo
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
| |
Collapse
|
8
|
Zhao L, Zhu Y, Wang J, Wen N, Wang C, Cheng L. A brief review of protein-ligand interaction prediction. Comput Struct Biotechnol J 2022; 20:2831-2838. [PMID: 35765652 PMCID: PMC9189993 DOI: 10.1016/j.csbj.2022.06.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Revised: 05/30/2022] [Accepted: 06/01/2022] [Indexed: 01/21/2023] Open
Abstract
The task of identifying protein–ligand interactions (PLIs) plays a prominent role in the field of drug discovery. However, it is infeasible to identify potential PLIs via costly and laborious in vitro experiments. There is a need to develop PLI computational prediction approaches to speed up the drug discovery process. In this review, we summarize a brief introduction to various computation-based PLIs. We discuss these approaches, in particular, machine learning-based methods, with illustrations of different emphases based on mainstream trends. Moreover, we analyzed three research dynamics that can be further explored in future studies.
Collapse
Affiliation(s)
- Lingling Zhao
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
| | - Yan Zhu
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
| | - Junjie Wang
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
| | - Naifeng Wen
- School of Mechanical and Electrical Engineering, Dalian Minzu University, Dalian, China
| | - Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
- Corresponding authors.
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- NHC and CAMS Key Laboratory of Molecular Probe and Targeted Theranostics, Harbin Medical University, Harbin, China
- Corresponding authors.
| |
Collapse
|
9
|
Chelur VR, Priyakumar UD. BiRDS - Binding Residue Detection from Protein Sequences Using Deep ResNets. J Chem Inf Model 2022; 62:1809-1818. [PMID: 35414182 DOI: 10.1021/acs.jcim.1c00972] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Protein-drug interactions play important roles in many biological processes and therapeutics. Predicting the binding sites of a protein helps to discover such interactions. New drugs can be designed to optimize these interactions, improving protein function. The tertiary structure of a protein decides the binding sites available to the drug molecule, but the determination of the 3D structure is slow and expensive. Conversely, the determination of the amino acid sequence is swift and economical. Although quick and accurate prediction of the binding site using just the sequence is challenging, the application of Deep Learning, which has been hugely successful in several biochemical tasks, makes it feasible. BiRDS is a Residual Neural Network that predicts the protein's most active binding site using sequence information. SC-PDB, an annotated database of druggable binding sites, is used for training the network. Multiple Sequence Alignments of the proteins in the database are generated using DeepMSA, and features such as Position-Specific Scoring Matrix, Secondary Structure, and Relative Solvent Accessibility are extracted. During training, a weighted binary cross-entropy loss function is used to counter the substantial imbalance in the two classes of binding and nonbinding residues. A novel test set SC6K is introduced to compare binding-site prediction methods. BiRDS achieves an AUROC score of 0.87, and the center of 25% of its predicted binding sites lie within 4 Å of the center of the actual binding site.
Collapse
Affiliation(s)
- Vineeth R Chelur
- Center for Computational Natural Sciences & Bioinformatics International Institute of Information Technology Hyderabad 500032, India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences & Bioinformatics International Institute of Information Technology Hyderabad 500032, India
| |
Collapse
|
10
|
Rezaei MA, Li Y, Wu D, Li X, Li C. Deep Learning in Drug Design: Protein-Ligand Binding Affinity Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:407-417. [PMID: 33360998 PMCID: PMC8942327 DOI: 10.1109/tcbb.2020.3046945] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Computational drug design relies on the calculation of binding strength between two biological counterparts especially a chemical compound, i.e., a ligand, and a protein. Predicting the affinity of protein-ligand binding with reasonable accuracy is crucial for drug discovery, and enables the optimization of compounds to achieve better interaction with their target protein. In this paper, we propose a data-driven framework named DeepAtom to accurately predict the protein-ligand binding affinity. With 3D Convolutional Neural Network (3D-CNN) architecture, DeepAtom could automatically extract binding related atomic interaction patterns from the voxelized complex structure. Compared with the other CNN based approaches, our light-weight model design effectively improves the model representational capacity, even with the limited available training data. We carried out validation experiments on the PDBbind v.2016 benchmark and the independent Astex Diverse Set. We demonstrate that the less feature engineering dependent DeepAtom approach consistently outperforms the other baseline scoring methods. We also compile and propose a new benchmark dataset to further improve the model performances. With the new dataset as training input, DeepAtom achieves Pearson's R=0.83 and RMSE=1.23 pK units on the PDBbind v.2016 core set. The promising results demonstrate that DeepAtom models can be potentially adopted in computational drug development protocols such as molecular docking and virtual screening.
Collapse
Affiliation(s)
- Mohammad A. Rezaei
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development (CNPD3), University of Florida
| | - Yanjun Li
- Large-scale Intelligent Systems Laboratory, NSF Center for Big Learning, University of Florida Gainesville, FL, USA
| | - Dapeng Wu
- Large-scale Intelligent Systems Laboratory, NSF Center for Big Learning, University of Florida Gainesville, FL, USA
| | - Xiaolin Li
- Cognization Lab, Palo Alto, California, USA
| | - Chenglong Li
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development (CNPD3), University of Florida
- Large-scale Intelligent Systems Laboratory, NSF Center for Big Learning, University of Florida Gainesville, FL, USA
| |
Collapse
|
11
|
Hermansyah O, Bustamam A, Yanuar A. Virtual screening of dipeptidyl peptidase-4 inhibitors using quantitative structure-activity relationship-based artificial intelligence and molecular docking of hit compounds. Comput Biol Chem 2021; 95:107597. [PMID: 34800858 DOI: 10.1016/j.compbiolchem.2021.107597] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2021] [Revised: 10/25/2021] [Accepted: 10/26/2021] [Indexed: 12/31/2022]
Abstract
Dipeptidyl peptidase-4 (DPP-4) inhibitors are becoming an essential drug in the treatment of type 2 diabetes mellitus; however, some classes of these drugs exert side effects, including joint pain and pancreatitis. Studies suggest that these side effects might be related to secondary inhibition of DPP-8 and DPP-9. In this study, we identified DPP-4-inhibitor hit compounds selective against DPP-8 and DPP-9. We built a virtual screening workflow using a quantitative structure-activity relationship (QSAR) strategy based on artificial intelligence to allow faster screening of millions of molecules for the DPP-4 target relative to other screening methods. Five regression machine learning algorithms and four classification machine learning algorithms were applied to build virtual screening workflows, with the QSAR model applied using support vector regression (R2pred 0.78) and the classification QSAR model using the random forest algorithm with 92.2% accuracy. Virtual screening results of > 10 million molecules obtained 2 716 hits compounds with a pIC50 value of > 7.5. Additionally, molecular docking results of several potential hit compounds for DPP-4, DPP-8, and DPP-9 identified CH0002 as showing high inhibitory potential against DPP-4 and low inhibitory potential for DPP-8 and DPP-9 enzymes. These results demonstrated the effectiveness of this technique for identifying DPP-4-inhibitor hit compounds selective for DPP-4 and against DPP-8 and DPP-9 and suggest its potential efficacy for applications to discover hit compounds of other targets.
Collapse
Affiliation(s)
- Oky Hermansyah
- Laboratory of Biomedical Computation and Drug Design, Faculty of Pharmacy, Universitas Indonesia, Depok 16424, Indonesia
| | - Alhadi Bustamam
- Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Indonesia, Depok 16424, Indonesia
| | - Arry Yanuar
- Laboratory of Biomedical Computation and Drug Design, Faculty of Pharmacy, Universitas Indonesia, Depok 16424, Indonesia.
| |
Collapse
|
12
|
Crampon K, Giorkallos A, Deldossi M, Baud S, Steffenel LA. Machine-learning methods for ligand-protein molecular docking. Drug Discov Today 2021; 27:151-164. [PMID: 34560276 DOI: 10.1016/j.drudis.2021.09.007] [Citation(s) in RCA: 96] [Impact Index Per Article: 32.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2021] [Revised: 07/14/2021] [Accepted: 09/15/2021] [Indexed: 12/22/2022]
Abstract
Artificial intelligence (AI) is often presented as a new Industrial Revolution. Many domains use AI, including molecular simulation for drug discovery. In this review, we provide an overview of ligand-protein molecular docking and how machine learning (ML), especially deep learning (DL), a subset of ML, is transforming the field by tackling the associated challenges.
Collapse
Affiliation(s)
- Kevin Crampon
- Université de Reims Champagne Ardenne, CNRS, MEDyC UMR 7369, 51097 Reims, France; Université de Reims Champagne Ardenne, LICIIS - LRC CEA DIGIT, 51100 Reims, France; Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
| | - Alexis Giorkallos
- Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
| | - Myrtille Deldossi
- Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
| | - Stéphanie Baud
- Université de Reims Champagne Ardenne, CNRS, MEDyC UMR 7369, 51097 Reims, France
| | | |
Collapse
|
13
|
Veit-Acosta M, de Azevedo Junior WF. Computational Prediction of Binding Affinity for CDK2-ligand Complexes. A Protein Target for Cancer Drug Discovery. Curr Med Chem 2021; 29:2438-2455. [PMID: 34365938 DOI: 10.2174/0929867328666210806105810] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2021] [Revised: 06/15/2021] [Accepted: 06/22/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND CDK2 participates in the control of eukaryotic cell-cycle progression. Due to the great interest in CDK2 for drug development and the relative easiness in crystallizing this enzyme, we have over 400 structural studies focused on this protein target. This structural data is the basis for the development of computational models to estimate CDK2-ligand binding affinity. OBJECTIVE This work focuses on the recent developments in the application of supervised machine learning modeling to develop scoring functions to predict the binding affinity of CDK2. METHOD We employed the structures available at the protein data bank and the ligand information accessed from the BindingDB, Binding MOAD, and PDBbind to evaluate the predictive performance of machine learning techniques combined with physical modeling used to calculate binding affinity. We compared this hybrid methodology with classical scoring functions available in docking programs. RESULTS Our comparative analysis of previously published models indicated that a model created using a combination of a mass-spring system and cross-validated Elastic Net to predict the binding affinity of CDK2-inhibitor complexes outperformed classical scoring functions available in AutoDock4 and AutoDock Vina. CONCLUSION All studies reviewed here suggest that targeted machine learning models are superior to classical scoring functions to calculate binding affinities. Specifically for CDK2, we see that the combination of physical modeling with supervised machine learning techniques exhibits improved predictive performance to calculate the protein-ligand binding affinity. These results find theoretical support in the application of the concept of scoring function space.
Collapse
Affiliation(s)
- Martina Veit-Acosta
- Western Michigan University, 1903 Western, Michigan Ave, Kalamazoo, MI 49008. United States
| | | |
Collapse
|
14
|
Abbasi WA, Abbas SA, Andleeb S. PANDA: Predicting the change in proteins binding affinity upon mutations by finding a signal in primary structures. J Bioinform Comput Biol 2021; 19:2150015. [PMID: 34126874 DOI: 10.1142/s0219720021500153] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Accurately determining a change in protein binding affinity upon mutations is important to find novel therapeutics and to assist mutagenesis studies. Determination of change in binding affinity upon mutations requires sophisticated, expensive, and time-consuming wet-lab experiments that can be supported with computational methods. Most of the available computational prediction techniques depend upon protein structures that bound their applicability to only protein complexes with recognized 3D structures. In this work, we explore the sequence-based prediction of change in protein binding affinity upon mutation and question the effectiveness of [Formula: see text]-fold cross-validation (CV) across mutations adopted in previous studies to assess the generalization ability of such predictors with no known mutation during training. We have used protein sequence information instead of protein structures along with machine learning techniques to accurately predict the change in protein binding affinity upon mutation. Our proposed sequence-based novel change in protein binding affinity predictor called PANDA performs comparably to the existing methods gauged through an appropriate CV scheme and an external independent test dataset. On an external test dataset, our proposed method gives a maximum Pearson correlation coefficient of 0.52 in comparison to the state-of-the-art existing protein structure-based method called MutaBind which gives a maximum Pearson correlation coefficient of 0.59. Our proposed protein sequence-based method, to predict a change in binding affinity upon mutations, has wide applicability and comparable performance in comparison to existing protein structure-based methods. We made PANDA easily accessible through a cloud-based webserver and python code available at https://sites.google.com/view/wajidarshad/software and https://github.com/wajidarshad/panda, respectively.
Collapse
Affiliation(s)
- Wajid Arshad Abbasi
- Computational Biology and Data Analysis Lab., Department of Computer Sciences & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
| | - Syed Ali Abbas
- Computational Biology and Data Analysis Lab., Department of Computer Sciences & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
| | - Saiqa Andleeb
- Biotechnology Lab., Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
| |
Collapse
|
15
|
Bitencourt-Ferreira G, Rizzotto C, de Azevedo Junior WF. Machine Learning-Based Scoring Functions, Development and Applications with SAnDReS. Curr Med Chem 2021; 28:1746-1756. [PMID: 32410551 DOI: 10.2174/0929867327666200515101820] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2019] [Revised: 04/06/2020] [Accepted: 04/07/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. OBJECTIVE Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. METHODS SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding and thermodynamic data to create targeted scoring functions. RESULTS Recent applications of the program SAnDReS to drug targets such as Coagulation factor Xa, cyclin-dependent kinases and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. CONCLUSION Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in the programs such as AutoDock4, Molegro Virtual Docker and AutoDock Vina.
Collapse
Affiliation(s)
| | - Camila Rizzotto
- Pontifical Catholic University of Rio Grande do Sul - PUCRS, Porto Alegre-RS, Brazil
| | | |
Collapse
|
16
|
Bitencourt-Ferreira G, Duarte da Silva A, Filgueira de Azevedo W. Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2. Curr Med Chem 2021; 28:253-265. [PMID: 31729287 DOI: 10.2174/2213275912666191102162959] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2019] [Revised: 08/22/2019] [Accepted: 09/24/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed to identify new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cellcycle progression and control. Such drugs have potential anticancer activities. OBJECTIVE Our goal here is to review recent applications of machine learning methods to predict ligand- binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. METHODS We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine- learning models. RESULTS Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and Autodock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. CONCLUSION Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.
Collapse
Affiliation(s)
- Gabriela Bitencourt-Ferreira
- Laboratory of Computational Systems Biology. Pontifical Catholic University of Rio Grande do Sul (PUCRS). Av. Ipiranga, 6681 Porto Alegre/RS 90619-900 , Brazil
| | - Amauri Duarte da Silva
- Specialization Program in Bioinformatics. Pontifical Catholic University of Rio Grande do Sul (PUCRS). Av. Ipiranga, 6681 Porto Alegre/RS 90619-900, Brazil
| | - Walter Filgueira de Azevedo
- Laboratory of Computational Systems Biology. Pontifical Catholic University of Rio Grande do Sul (PUCRS). Av. Ipiranga, 6681 Porto Alegre/RS 90619-900 , Brazil
| |
Collapse
|
17
|
Taguchi AT, Boyd J, Diehnelt CW, Legutki JB, Zhao ZG, Woodbury NW. Comprehensive Prediction of Molecular Recognition in a Combinatorial Chemical Space Using Machine Learning. ACS COMBINATORIAL SCIENCE 2020; 22:500-508. [PMID: 32786325 DOI: 10.1021/acscombsci.0c00003] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Abstract
In combinatorial chemical approaches, optimizing the composition and arrangement of building blocks toward a particular function has been done using a number of methods, including high throughput molecular screening, molecular evolution, and computational prescreening. Here, a different approach is considered that uses sparse measurements of library molecules as the input to a machine learning algorithm which generates a comprehensive, quantitative relationship between covalent molecular structure and function that can then be used to predict the function of any molecule in the possible combinatorial space. To test the feasibility of the approach, a defined combinatorial chemical space consisting of ∼1012 possible linear combinations of 16 different amino acids was used. The binding of a very sparse, but nearly random, sampling of this amino acid sequence space to 9 different protein targets is measured and used to generate a general relationship between peptide sequence and binding for each target. Surprisingly, measuring as little as a few hundred to a few thousand of the ∼1012 possible molecules provides sufficient training to be highly predictive of the binding of the remaining molecules in the combinatorial space. Furthermore, measuring only amino acid sequences that bind weakly to a target allows the accurate prediction of which sequences will bind 10-100 times more strongly. Thus, the molecular recognition information contained in a tiny fraction of molecules in this combinatorial space is sufficient to characterize any set of molecules randomly selected from the entire space, a fact that potentially has significant implications for the design of new chemical function using combinatorial chemical libraries.
Collapse
Affiliation(s)
| | - James Boyd
- HealthTell, Inc., 145 S 79th Street, Chandler, Arizona 85226, United States
| | - Chris W. Diehnelt
- Center for Innovations in Medicine at the Biodesign Institute, Arizona State University, Tempe, Arizona 85287, United States
| | - Joseph B. Legutki
- HealthTell, Inc., 145 S 79th Street, Chandler, Arizona 85226, United States
| | - Zhan-Gong Zhao
- Center for Innovations in Medicine at the Biodesign Institute, Arizona State University, Tempe, Arizona 85287, United States
| | - Neal W. Woodbury
- Center for Innovations in Medicine at the Biodesign Institute, Arizona State University, Tempe, Arizona 85287, United States
- School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
| |
Collapse
|
18
|
Asati V, Agarwal S, Mishra M, Das R, Kashaw SK. Structural prediction of novel pyrazolo-pyrimidine derivatives against PIM-1 kinase: In-silico drug design studies. J Mol Struct 2020. [DOI: 10.1016/j.molstruc.2020.128375] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
|
19
|
Fresnais L, Ballester PJ. The impact of compound library size on the performance of scoring functions for structure-based virtual screening. Brief Bioinform 2020; 22:5855396. [PMID: 32568385 DOI: 10.1093/bib/bbaa095] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2020] [Revised: 04/20/2020] [Accepted: 04/28/2020] [Indexed: 12/20/2022] Open
Abstract
Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.
Collapse
|
20
|
de Ávila MB, Bitencourt-Ferreira G, de Azevedo WF. Structural Basis for Inhibition of Enoyl-[Acyl Carrier Protein] Reductase (InhA) from Mycobacterium tuberculosis. Curr Med Chem 2020; 27:745-759. [DOI: 10.2174/0929867326666181203125229] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2018] [Revised: 07/26/2018] [Accepted: 11/14/2018] [Indexed: 12/18/2022]
Abstract
Background::
The enzyme trans-enoyl-[acyl carrier protein] reductase (InhA) is a central
protein for the development of antitubercular drugs. This enzyme is the target for the pro-drug
isoniazid, which is catalyzed by the enzyme catalase-peroxidase (KatG) to become active.
Objective::
Our goal here is to review the studies on InhA, starting with general aspects and focusing on
the recent structural studies, with emphasis on the crystallographic structures of complexes involving
InhA and inhibitors.
Method::
We start with a literature review, and then we describe recent studies on InhA crystallographic
structures. We use this structural information to depict protein-ligand interactions. We also analyze the
structural basis for inhibition of InhA. Furthermore, we describe the application of computational
methods to predict binding affinity based on the crystallographic position of the ligands.
Results::
Analysis of the structures in complex with inhibitors revealed the critical residues responsible
for the specificity against InhA. Most of the intermolecular interactions involve the hydrophobic residues
with two exceptions, the residues Ser 94 and Tyr 158. Examination of the interactions has shown
that many of the key residues for inhibitor binding were found in mutations of the InhA gene in the
isoniazid-resistant Mycobacterium tuberculosis. Computational prediction of the binding affinity for
InhA has indicated a moderate uphill relationship with experimental values.
Conclusion::
Analysis of the structures involving InhA inhibitors shows that small modifications on
these molecules could modulate their inhibition, which may be used to design novel antitubercular
drugs specific for multidrug-resistant strains.
Collapse
Affiliation(s)
- Maurício Boff de Ávila
- Laboratory of Computational Systems Biology, School of Sciences - Pontifical Catholic University of Rio, Grande do Sul (PUCRS), Av. Ipiranga, 6681, Porto Alegre-RS 90619-900, Brazil
| | - Gabriela Bitencourt-Ferreira
- Laboratory of Computational Systems Biology, School of Sciences - Pontifical Catholic University of Rio, Grande do Sul (PUCRS), Av. Ipiranga, 6681, Porto Alegre-RS 90619-900, Brazil
| | - Walter Filgueira de Azevedo
- Laboratory of Computational Systems Biology, School of Sciences - Pontifical Catholic University of Rio, Grande do Sul (PUCRS), Av. Ipiranga, 6681, Porto Alegre-RS 90619-900, Brazil
| |
Collapse
|
21
|
Zhao J, Cao Y, Zhang L. Exploring the computational methods for protein-ligand binding site prediction. Comput Struct Biotechnol J 2020; 18:417-426. [PMID: 32140203 PMCID: PMC7049599 DOI: 10.1016/j.csbj.2020.02.008] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2019] [Revised: 01/23/2020] [Accepted: 02/11/2020] [Indexed: 12/21/2022] Open
Abstract
Proteins participate in various essential processes in vivo via interactions with other molecules. Identifying the residues participating in these interactions not only provides biological insights for protein function studies but also has great significance for drug discoveries. Therefore, predicting protein-ligand binding sites has long been under intense research in the fields of bioinformatics and computer aided drug discovery. In this review, we first introduce the research background of predicting protein-ligand binding sites and then classify the methods into four categories, namely, 3D structure-based, template similarity-based, traditional machine learning-based and deep learning-based methods. We describe representative algorithms in each category and elaborate on machine learning and deep learning-based prediction methods in more detail. Finally, we discuss the trends and challenges of the current research such as molecular dynamics simulation based cryptic binding sites prediction, and highlight prospective directions for the near future.
Collapse
Affiliation(s)
- Jingtian Zhao
- College of Computer Science, Sichuan University, Chengdu 610065, China
| | - Yang Cao
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
| | - Le Zhang
- College of Computer Science, Sichuan University, Chengdu 610065, China
| |
Collapse
|