1
Guichaoua G, Pinel P, Hoffmann B, Azencott CA, Stoven V. Drug-Target Interactions Prediction at Scale: The Komet Algorithm with the LCIdb Dataset. J Chem Inf Model 2024. [PMID: 39237105; DOI: 10.1021/acs.jcim.4c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024]
Abstract
Drug-target interactions (DTIs) prediction algorithms are used at various stages of the drug discovery process. In this context, specific problems such as deorphanization of a new therapeutic target or target identification of a drug candidate arising from phenotypic screens require large-scale predictions across the protein and molecule spaces. DTI prediction heavily relies on supervised learning algorithms that use known DTIs to learn associations between molecule and protein features, allowing for the prediction of new interactions based on learned patterns. The algorithms must be broadly applicable to enable reliable predictions, even in regions of the protein or molecule spaces where data may be scarce. In this paper, we address two key challenges to fulfill these goals: building large, high-quality training datasets and designing prediction methods that can scale, in order to be trained on such large datasets. First, we introduce LCIdb, a curated, large-sized dataset of DTIs, offering extensive coverage of both the molecule and druggable protein spaces. Notably, LCIdb contains a much higher number of molecules than publicly available benchmarks, expanding coverage of the molecule space. Second, we propose Komet (Kronecker Optimized METhod), a DTI prediction pipeline designed for scalability without compromising performance. Komet leverages a three-step framework, incorporating efficient computation choices tailored for large datasets and involving the Nyström approximation. Specifically, Komet employs a Kronecker interaction module for (molecule, protein) pairs, which efficiently captures determinants in DTIs, and whose structure allows for reduced computational complexity and quasi-Newton optimization, ensuring that the model can handle large training sets, without compromising on performance. Our method is implemented in open-source software, leveraging GPU parallel computation for efficiency. 
We demonstrate the interest of our pipeline on various datasets, showing that Komet displays superior scalability and prediction performance compared to state-of-the-art deep learning approaches. Additionally, we illustrate the generalization properties of Komet by showing its performance on an external dataset, and on the publicly available LH benchmark designed for scaffold hopping problems. Komet is available open source at https://komet.readthedocs.io and all datasets, including LCIdb, can be found at https://zenodo.org/records/10731712.
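The Kronecker interaction module named in the Komet abstract can be illustrated with a minimal sketch, assuming that a (molecule, protein) pair is represented by the Kronecker (outer) product of a molecule feature vector and a protein feature vector. The sketch checks two standard identities that make such a representation cheap: a linear score on the pair vector equals the bilinear form mol^T W prot, so the d_m × d_p pair vector never has to be built, and the inner product of two pair vectors factorizes into the product of the molecule and protein inner products. All function names are hypothetical and are not Komet's API; the Nyström approximation and the quasi-Newton training step are not shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Standard inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def kron(a: Vector, b: Vector) -> Vector:
    """Explicit Kronecker product: entry i * len(b) + j equals a[i] * b[j]."""
    return [x * y for x in a for y in b]


def reshape(w: Vector, rows: int, cols: int) -> Matrix:
    """Reshape a flat weight vector of length rows * cols into a rows x cols matrix."""
    return [w[i * cols:(i + 1) * cols] for i in range(rows)]


def score_explicit(w: Vector, mol: Vector, prot: Vector) -> float:
    """Linear score on the materialized pair vector (d_m * d_p entries)."""
    return dot(w, kron(mol, prot))


def score_bilinear(W: Matrix, mol: Vector, prot: Vector) -> float:
    """Same score computed as mol^T W prot, without building the pair vector."""
    return sum(m_i * dot(row, prot) for m_i, row in zip(mol, W))


def pair_kernel(mol_a: Vector, prot_a: Vector, mol_b: Vector, prot_b: Vector) -> float:
    """Kronecker pair kernel: product of a molecule kernel and a protein kernel."""
    return dot(mol_a, mol_b) * dot(prot_a, prot_b)


if __name__ == "__main__":
    mol = [1.0, 0.0, 2.0]   # toy molecule features (d_m = 3), hypothetical
    prot = [0.5, -1.0]      # toy protein features (d_p = 2), hypothetical
    w = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    W = reshape(w, len(mol), len(prot))
    # The two scores agree up to floating-point rounding.
    print(score_explicit(w, mol, prot), score_bilinear(W, mol, prot))
```

With real molecular fingerprints and protein descriptors, the explicit pair vector grows as d_m × d_p, which is the cost the bilinear and factorized forms avoid.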
Affiliation(s)
- Gwenn Guichaoua
- Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
- Institut Curie, Université PSL, 75005 Paris, France
- INSERM U900, 75005 Paris, France
- Philippe Pinel
- Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
- Institut Curie, Université PSL, 75005 Paris, France
- INSERM U900, 75005 Paris, France
- Iktos SAS, 75017 Paris, France
- Chloé-Agathe Azencott
- Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
- Institut Curie, Université PSL, 75005 Paris, France
- INSERM U900, 75005 Paris, France
- Véronique Stoven
- Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
- Institut Curie, Université PSL, 75005 Paris, France
- INSERM U900, 75005 Paris, France
2
Birgül Iyison N, Abboud C, Abboud D, Abdulrahman AO, Bondar AN, Dam J, Georgoussi Z, Giraldo J, Horvat A, Karoussiotis C, Paz-Castro A, Scarpa M, Schihada H, Scholz N, Güvenc Tuna B, Vardjan N. ERNEST COST action overview on the (patho)physiology of GPCRs and orphan GPCRs in the nervous system. Br J Pharmacol 2024. [PMID: 38825750; DOI: 10.1111/bph.16389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 02/09/2024] [Accepted: 02/24/2024] [Indexed: 06/04/2024]
Abstract
G protein-coupled receptors (GPCRs) are a large family of cell surface receptors that play a critical role in nervous system function by transmitting signals between cells and their environment. They are involved in many, if not all, nervous system processes, and their dysfunction has been linked to various neurological disorders representing important drug targets. This overview emphasises the GPCRs of the nervous system, which are the research focus of the members of ERNEST COST action (CA18133) working group 'Biological roles of signal transduction'. First, the (patho)physiological role of the nervous system GPCRs in the modulation of synapse function is discussed. We then debate the (patho)physiology and pharmacology of opioid, acetylcholine, chemokine, melatonin and adhesion GPCRs in the nervous system. Finally, we address the orphan GPCRs, their implication in the nervous system function and disease, and the challenges that need to be addressed to deorphanize them.
Affiliation(s)
- Necla Birgül Iyison
- Department of Molecular Biology and Genetics, University of Bogazici, Istanbul, Turkey
- Clauda Abboud
- Laboratory of Molecular Pharmacology, GIGA-Molecular Biology of Diseases, University of Liege, Liege, Belgium
- Dayana Abboud
- Laboratory of Molecular Pharmacology, GIGA-Molecular Biology of Diseases, University of Liege, Liege, Belgium
- Ana-Nicoleta Bondar
- Faculty of Physics, University of Bucharest, Magurele, Romania
- Forschungszentrum Jülich, Institute for Computational Biomedicine (IAS-5/INM-9), Jülich, Germany
- Julie Dam
- Institut Cochin, CNRS, INSERM, Université Paris Cité, Paris, France
- Zafiroula Georgoussi
- Laboratory of Cellular Signalling and Molecular Pharmacology, Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", Athens, Greece
- Jesús Giraldo
- Laboratory of Molecular Neuropharmacology and Bioinformatics, Unitat de Bioestadística and Institut de Neurociències, Universitat Autònoma de Barcelona, Bellaterra, Spain
- Instituto de Salud Carlos III, Centro de Investigación Biomédica en Red de Salud Mental, CIBERSAM, Madrid, Spain
- Unitat de Neurociència Traslacional, Parc Taulí Hospital Universitari, Institut d'Investigació i Innovació Parc Taulí (I3PT), Institut de Neurociències, Universitat Autònoma de Barcelona, Barcelona, Spain
- Anemari Horvat
- Laboratory of Neuroendocrinology - Molecular Cell Physiology, Institute of Pathophysiology, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Laboratory of Cell Engineering, Celica Biomedical, Ljubljana, Slovenia
- Christos Karoussiotis
- Laboratory of Cellular Signalling and Molecular Pharmacology, Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", Athens, Greece
- Alba Paz-Castro
- Molecular Pharmacology of GPCRs research group, Center for Research in Molecular Medicine and Chronic Diseases (CiMUS), Universidade de Santiago de Compostela, Santiago, Spain
- Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS), Santiago, Spain
- Miriam Scarpa
- Division of Clinical Geriatrics, Center for Alzheimer Research, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Hannes Schihada
- Department of Pharmaceutical Chemistry, Philipps-University Marburg, Marburg, Germany
- Nicole Scholz
- Rudolf Schönheimer Institute of Biochemistry, Division of General Biochemistry, Medical Faculty, Leipzig University, Leipzig, Germany
- Bilge Güvenc Tuna
- Department of Biophysics, School of Medicine, Yeditepe University, Istanbul, Turkey
- Nina Vardjan
- Laboratory of Neuroendocrinology - Molecular Cell Physiology, Institute of Pathophysiology, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Laboratory of Cell Engineering, Celica Biomedical, Ljubljana, Slovenia
3
Curcio A, Rocca R, Alcaro S, Artese A. The Histone Deacetylase Family: Structural Features and Application of Combined Computational Methods. Pharmaceuticals (Basel) 2024; 17:620. [PMID: 38794190; PMCID: PMC11124352; DOI: 10.3390/ph17050620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 05/03/2024] [Accepted: 05/08/2024] [Indexed: 05/26/2024]
Abstract
Histone deacetylases (HDACs) are crucial in gene transcription, removing acetyl groups from histones. They also influence the deacetylation of non-histone proteins, contributing to the regulation of various biological processes. Thus, HDACs play pivotal roles in various diseases, including cancer, neurodegenerative disorders, and inflammatory conditions, highlighting their potential as therapeutic targets. This paper reviews the structure and function of the four classes of human HDACs. While four HDAC inhibitors are currently available for treating hematological malignancies, numerous others are undergoing clinical trials. However, their non-selective toxicity necessitates ongoing research into safer and more efficient class-selective or isoform-selective inhibitors. Computational methods have aided the discovery of HDAC inhibitors with the desired potency and/or selectivity. These methods include ligand-based approaches, such as scaffold hopping, pharmacophore modeling, three-dimensional quantitative structure-activity relationships, and structure-based virtual screening (molecular docking). Moreover, recent developments in the field of molecular dynamics simulations, combined with Poisson-Boltzmann/molecular mechanics generalized Born surface area techniques, have improved the prediction of ligand binding affinity. In this review, we delve into the ways in which these methods have contributed to designing and identifying HDAC inhibitors.
Affiliation(s)
- Antonio Curcio
- Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
- Roberta Rocca
- Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
- Net4Science S.r.l., Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
- Stefano Alcaro
- Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
- Net4Science S.r.l., Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
- Anna Artese
- Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
- Net4Science S.r.l., Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
4
Jobe A, Vijayan R. Orphan G protein-coupled receptors: the ongoing search for a home. Front Pharmacol 2024; 15:1349097. [PMID: 38495099; PMCID: PMC10941346; DOI: 10.3389/fphar.2024.1349097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 02/15/2024] [Indexed: 03/19/2024]
Abstract
G protein-coupled receptors (GPCRs) make up the largest receptor superfamily, accounting for 4% of protein-coding genes. Despite the prevalence of such transmembrane receptors, a significant number remain orphans, lacking identified endogenous ligands. Since their conception, the reverse pharmacology approach has been used to characterize such receptors. However, the multifaceted and nuanced nature of GPCR signaling poses a great challenge to their pharmacological elucidation. Considering their therapeutic relevance, the search for native orphan GPCR ligands continues. Despite limited structural input in terms of 3D crystallized structures, with advances in machine-learning approaches, there has been great progress with respect to accurate ligand prediction. Though such an approach proves valuable given that ligand scarcity is the greatest hurdle to orphan GPCR deorphanization, the future pairings of the remaining orphan GPCRs may not necessarily take a one-size-fits-all approach but should be more comprehensive in accounting for numerous nuanced possibilities to cover the full spectrum of GPCR signaling.
Affiliation(s)
- Amie Jobe
- Department of Biology, College of Science, United Arab Emirates University, Al Ain, United Arab Emirates
- Ranjit Vijayan
- Department of Biology, College of Science, United Arab Emirates University, Al Ain, United Arab Emirates
- The Big Data Analytics Center, United Arab Emirates University, Al Ain, United Arab Emirates
- Zayed Bin Sultan Center for Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
5
Pinel P, Guichaoua G, Najm M, Labouille S, Drizard N, Gaston-Mathé Y, Hoffmann B, Stoven V. Exploring isofunctional molecules: Design of a benchmark and evaluation of prediction performance. Mol Inform 2023; 42:e2200216. [PMID: 36633361; DOI: 10.1002/minf.202200216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 12/19/2022] [Accepted: 01/11/2023] [Indexed: 01/13/2023]
Abstract
Identification of novel chemotypes with biological activity similar to a known active molecule is an important challenge in drug discovery called 'scaffold hopping'. Small-, medium-, and large-step scaffold hopping efforts may lead to increasing degrees of chemical structure novelty with respect to the parent compound. In the present paper, we focus on the problem of large-step scaffold hopping. We assembled a high-quality, well-characterized dataset of scaffold hopping examples comprising pairs of active molecules and including a variety of protein targets. This dataset was used to build a benchmark corresponding to the setting of real-life applications: one active molecule is known, and the second active is searched among a set of decoys chosen in a way to avoid statistical bias. This allowed us to evaluate the performance of computational methods for solving large-step scaffold hopping problems. We assessed how difficult these problems are, particularly for classical 2D and 3D ligand-based methods. We also showed that a machine-learning chemogenomic algorithm outperforms classical methods and we provided some useful hints for future improvements.
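The benchmark setting described above (one known active used as the query, a second active hidden among decoys) can be sketched for a classical 2D ligand-based method. This minimal sketch assumes each molecule is encoded as the set of "on" bits of a binary fingerprint and scores candidates by Tanimoto similarity to the query; the rank of the hidden active serves as an illustrative metric. The toy fingerprints, the helper names and the tie-handling rule are assumptions, not the benchmark's actual protocol.

```python
from typing import Dict, FrozenSet

Fingerprint = FrozenSet[int]  # indices of the "on" bits of a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_of_active(query: Fingerprint, active: Fingerprint,
                   decoys: Dict[str, Fingerprint]) -> int:
    """1-based rank of the hidden active among the decoys, sorted by similarity
    to the query. Decoys tied with the active are ranked ahead of it (pessimistic)."""
    s_active = tanimoto(query, active)
    return 1 + sum(1 for fp in decoys.values() if tanimoto(query, fp) >= s_active)


if __name__ == "__main__":
    query = frozenset({1, 2, 3, 4})
    active = frozenset({1, 2, 9})  # shares few bits with the query, as in a large-step hop
    decoys = {
        "decoy_a": frozenset({1, 2, 3}),
        "decoy_b": frozenset({7, 8}),
        "decoy_c": frozenset({1, 9, 10}),
    }
    print(rank_of_active(query, active, decoys))  # 2: decoy_a looks more similar than the active
```

When the second active shares little substructure with the query, any decoy that happens to overlap more with the query is ranked ahead of it, which is one way to picture why large-step hops are hard for 2D similarity methods.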
Affiliation(s)
- Philippe Pinel
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75428, Paris, France; Iktos SAS, 75017, Paris, France
- Gwenn Guichaoua
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75428, Paris, France
- Matthieu Najm
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75428, Paris, France
- Véronique Stoven
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75428, Paris, France
6
Daoud S, Taha M. Ligand-Based Modeling of CXC Chemokine Receptor 4 and Identification of Inhibitors of Novel Chemotypes as Potential Leads towards New Anti-COVID-19 Treatments. Med Chem 2022; 18:871-883. [PMID: 35040417; DOI: 10.2174/1573406418666220118153541] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/21/2021] [Revised: 12/05/2021] [Accepted: 12/08/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Chemokines are involved in several human diseases and in different stages of COVID-19 infection, and play a critical role in the pathophysiology of the associated acute respiratory distress syndrome, a major complication leading to death among COVID-19 patients. In particular, CXC chemokine receptor 4 (CXCR4) was found to be highly expressed in COVID-19 patients. METHODS We herein describe a computational workflow based on combining pharmacophore modeling and QSAR analysis towards the discovery of novel CXCR4 inhibitors. Subsequent virtual screening identified two promising CXCR4 inhibitors from the National Cancer Institute (NCI) list of compounds. The most active hit showed an in vitro IC50 value of 24.4 µM. RESULTS AND CONCLUSION These results prove the validity of the QSAR model and associated pharmacophore models as means to screen virtual databases towards new CXCR4 inhibitors as leads for the development of new COVID-19 therapies.
Affiliation(s)
- Safa Daoud
- Department of Pharmaceutical Chemistry and Pharmacognosy, Faculty of Pharmacy, Applied Sciences Private University, Amman, Jordan
- Mutasem Taha
- Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, Jordan
7
Predicting Drug-Target Interactions Based on the Ensemble Models of Multiple Feature Pairs. Int J Mol Sci 2021; 22:ijms22126598. [PMID: 34202954; PMCID: PMC8234024; DOI: 10.3390/ijms22126598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2021] [Revised: 06/09/2021] [Accepted: 06/16/2021] [Indexed: 11/30/2022]
Abstract
Background: The prediction of drug–target interactions (DTIs) is of great significance in drug development. Traditional experimental methods are time-consuming and expensive. Machine learning can reduce the cost of prediction, but it is limited by imbalanced datasets and by the difficulty of selecting essential features. Methods: A prediction method based on the Ensemble model of Multiple Feature Pairs (Ensemble-MFP) is introduced. First, three negative sets are generated according to the Euclidean distance of three feature pairs. Then, the negative samples of the validation set/test set are randomly selected from the union of the three negative sets in the validation set/test set. The weighted ensemble model is then optimized and applied to the test set. Results: The area under the receiver operating characteristic curve (area under ROC, AUC) exceeded 94.0% in three out of four sub-datasets of the gold standard datasets for the prediction of new drugs. The effectiveness of the proposed method is also shown by comparison with state-of-the-art methods and by a demonstration of predicted drug–target pairs. Conclusion: Ensemble-MFP can weigh the existing feature pairs and performs well for predicting interactions of new drugs.
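The negative-sampling step summarized in this abstract can be sketched as follows. The abstract only states that three negative sets are built from Euclidean distances in three feature-pair spaces and that validation/test negatives are drawn at random from their union; the selection rule used here (keep the k unlabeled pairs farthest from their nearest known positive) is an assumption for illustration, as are all function names and the toy data.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]                  # (drug_id, target_id)
FeatureSpace = Dict[Pair, List[float]]  # one vector per pair, for one feature-pair type


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def far_negatives(space: FeatureSpace, positives: Set[Pair], k: int) -> Set[Pair]:
    """Assumed rule: the k unlabeled pairs farthest from their nearest known positive.
    `positives` must be non-empty and present in `space`."""
    pos_vectors = [space[p] for p in positives]
    candidates = [p for p in space if p not in positives]
    nearest = {p: min(euclidean(space[p], v) for v in pos_vectors) for p in candidates}
    ranked = sorted(candidates, key=lambda p: nearest[p], reverse=True)
    return set(ranked[:k])


def sample_negatives(spaces: Sequence[FeatureSpace], positives: Set[Pair],
                     k: int, n: int, seed: int = 0) -> List[Pair]:
    """Union the per-space negative sets, then draw up to n pairs at random."""
    union: Set[Pair] = set()
    for space in spaces:
        union |= far_negatives(space, positives, k)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(sorted(union), min(n, len(union)))
```

Drawing from the union, as the abstract describes, lets each feature space contribute its own candidate negatives.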
8
Yang S, Ye Q, Ding J, Yin, Lu A, Chen X, Hou T, Cao D. Current advances in ligand‐based target prediction. WIREs Computational Molecular Science 2020. [DOI: 10.1002/wcms.1504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Affiliation(s)
- Su‐Qing Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan, China
- Qing Ye
- College of Pharmaceutical Sciences, Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Jun‐Jie Ding
- Beijing Institute of Pharmaceutical Chemistry, Beijing, China
- Yin
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Ai‐Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
- Xiang Chen
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Ting‐Jun Hou
- College of Pharmaceutical Sciences, Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Dong‐Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan, China
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
9
Mervin LH, Afzal AM, Engkvist O, Bender A. Comparison of Scaling Methods to Obtain Calibrated Probabilities of Activity for Protein–Ligand Predictions. J Chem Inf Model 2020; 60:4546-4559. [DOI: 10.1021/acs.jcim.0c00476] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/16/2022]
Affiliation(s)
- Lewis H. Mervin
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
- Avid M. Afzal
- Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
- Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Mölndal SE-431 83, Sweden
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1TN, U.K.
10
Gong J, Chen Y, Pu F, Sun P, He F, Zhang L, Li Y, Ma Z, Wang H. Understanding Membrane Protein Drug Targets in Computational Perspective. Curr Drug Targets 2020; 20:551-564. [PMID: 30516106; DOI: 10.2174/1389450120666181204164721] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 07/31/2018] [Revised: 09/03/2018] [Accepted: 09/04/2018] [Indexed: 01/16/2023]
Abstract
Membrane proteins play crucial physiological roles in vivo and are the major category of drug targets for pharmaceuticals. Research on membrane proteins is a significant part of drug discovery. Biological processes form cyclic networks in which membrane proteins are vital hubs, since most drugs achieve their therapeutic effect by interacting with membrane proteins. In this review, typical membrane protein targets are described, including GPCRs, transporters and ion channels. We also summarize network servers and databases that provide information on drugs, drug-target interactions and related data. Furthermore, we introduce the development and practice of modern medicines, in particular presenting a series of state-of-the-art computational models for drug-target interaction prediction, including network-based and machine-learning-based approaches, as well as current achievements. Finally, we discuss prospective directions for drug repurposing and drug discovery, and propose improved frameworks for bioactivity data, new or improved prediction approaches, and alternative ways of understanding drug bioactivity and the related biological processes.
Affiliation(s)
- Jianting Gong
- School of Information Science and Technology, Northeast Normal University, Changchun, China; Institution of Computational Biology, Northeast Normal University, Changchun, China
- Yongbing Chen
- School of Information Science and Technology, Northeast Normal University, Changchun, China; Institution of Computational Biology, Northeast Normal University, Changchun, China
- Feng Pu
- School of Information Science and Technology, Northeast Normal University, Changchun, China; Institution of Computational Biology, Northeast Normal University, Changchun, China
- Pingping Sun
- School of Information Science and Technology, Northeast Normal University, Changchun, China; Institution of Computational Biology, Northeast Normal University, Changchun, China
- Fei He
- School of Information Science and Technology, Northeast Normal University, Changchun, China; Institution of Computational Biology, Northeast Normal University, Changchun, China
- Li Zhang
- School of Computer Science and Engineering, Changchun University of Technology, Changchun, China
- Yanwen Li
- School of Information Science and Technology, Northeast Normal University, Changchun, China; Institution of Computational Biology, Northeast Normal University, Changchun, China
- Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, Changchun, China; Institution of Computational Biology, Northeast Normal University, Changchun, China
- Han Wang
- School of Information Science and Technology, Northeast Normal University, Changchun, China; Institution of Computational Biology, Northeast Normal University, Changchun, China
11
Playe B, Stoven V. Evaluation of deep and shallow learning methods in chemogenomics for the prediction of drugs specificity. J Cheminform 2020; 12:11. [PMID: 33431042; PMCID: PMC7011501; DOI: 10.1186/s13321-020-0413-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 06/07/2019] [Accepted: 01/27/2020] [Indexed: 01/09/2023]
Abstract
Chemogenomics, also called proteochemometrics, covers a range of computational methods that can be used to predict protein–ligand interactions at large scales in the protein and chemical spaces. They differ from more classical ligand-based methods (also called QSAR) that predict ligands for a given protein receptor. In the context of the drug discovery process, chemogenomics makes it possible to tackle the question of predicting off-target proteins for drug candidates, one of the main causes of undesirable side effects and failure in drug development. The present study compares shallow and deep machine-learning approaches for chemogenomics, and explores data augmentation techniques for deep learning algorithms in chemogenomics. Shallow machine-learning algorithms rely on expert-based chemical and protein descriptors, while recent developments in deep learning algorithms make it possible to learn abstract numerical representations of molecular graphs and protein sequences, in order to optimise the performance of the prediction task. We first propose a formulation of chemogenomics with deep learning, called the chemogenomic neural network (CN), as a feed-forward neural network taking as input the combination of molecule and protein representations learnt by molecular graph and protein sequence encoders. We show that, on large datasets, the deep learning CN model outperforms state-of-the-art shallow methods, and competes with deep methods with expert-based descriptors. However, on small datasets, shallow methods present better prediction performance than deep learning methods. Then, we evaluate data augmentation techniques, namely multi-view and transfer learning, to improve the prediction performance of the chemogenomic neural network. We conclude that a promising research direction is to integrate heterogeneous sources of data such as auxiliary tasks for which large datasets are available, or independently, multiple molecule and protein attribute views.
Affiliation(s)
- Benoit Playe
- Center for Computational Biology, Mines ParisTech, PSL Research University, 60 Bd Saint-Michel, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75248, Paris, France
- Veronique Stoven
- Center for Computational Biology, Mines ParisTech, PSL Research University, 60 Bd Saint-Michel, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75248, Paris, France
12
Russo S, De Azevedo WF. Advances in the Understanding of the Cannabinoid Receptor 1 – Focusing on the Inverse Agonists Interactions. Curr Med Chem 2019; 26:1908-1919. [DOI: 10.2174/0929867325666180417165247] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/18/2017] [Revised: 02/21/2018] [Accepted: 04/03/2018] [Indexed: 12/31/2022]
Abstract
Background: Cannabinoid Receptor 1 (CB1) is a membrane protein prevalent in the central nervous system, whose crystallographic structure has recently been solved. Studies will be needed to investigate CB1 complexes with its ligands and its role in the development of new drugs.
Objective: Our goal here is to review the studies on CB1, starting with general aspects and focusing on the recent structural studies, with emphasis on inverse agonist-bound structures.
Methods: We start with a literature review, and then we describe recent studies on the CB1 crystallographic structure and docking simulations. We use this structural information to depict protein-ligand interactions. We also describe the molecular docking method used to obtain complex structures of CB1 with inverse agonists.
Results: Analysis of the crystallographic structure and docking results revealed the residues responsible for the specificity of the inverse agonists for CB1. Most of the intermolecular interactions involve hydrophobic residues, with the participation of residues Phe 170 and Leu 359 in all complex structures investigated in the present study. For the complexes with otenabant and taranabant, we observed intermolecular hydrogen bonds involving residues His 178 (otenabant) and Thr 197 and Ser 383 (taranabant).
Conclusion: Analysis of the structures involving inverse agonists and CB1 revealed the pivotal role played by residues Phe 170 and Leu 359 in their interactions, and the strong intermolecular hydrogen bonds highlight the importance of exploring intermolecular interactions in the development of novel inverse agonists.
Affiliation(s)
- Silvana Russo
- Laboratory of Computational Systems Biology, School of Sciences, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681, Porto Alegre-RS 90619-900, Brazil
- Walter Filgueira De Azevedo
- Laboratory of Computational Systems Biology, School of Sciences, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681, Porto Alegre-RS 90619-900, Brazil
13
Lin A, Horvath D, Marcou G, Beck B, Varnek A. Multi-task generative topographic mapping in virtual screening. J Comput Aided Mol Des 2019; 33:331-343. [PMID: 30739238; DOI: 10.1007/s10822-019-00188-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/15/2018] [Accepted: 02/02/2019] [Indexed: 12/16/2022]
Abstract
The previously reported procedure to generate "universal" Generative Topographic Maps (GTMs) of the drug-like chemical space is in practice a multi-task learning process, in which both operational GTM parameters (example: map grid size) and hyperparameters (key example: the molecular descriptor space to be used) are chosen by an evolutionary process in order to fit/select "universal" GTM manifolds. After selection (a one-time task aimed at optimizing the compromise in terms of neighborhood behavior compliance, over a large pool of various biological targets), the manifolds are ready, for any further use, to provide "fit-free" predictive models. Using any structure-activity set (irrespective of whether the associated target served at the map-fitting stage or not), the generation or "coloring" of a property landscape enables predicting the property for any external molecule, with zero additional fittable parameters involved. While previous works have signaled the excellent behavior of such models in aggressive three-fold cross-validation assessments of their predictive power, the present work explores their behavior in Virtual Screening (VS), here simulated by means of external DUD ligand and decoy series that are fully disjoint from the ChEMBL-extracted landscape coloring sets. Beyond the rather robust results of the universal GTM manifolds in this challenge, it could be shown that the descriptor spaces selected by the evolutionary multi-task learner were intrinsically able to serve as an excellent support for many other VS procedures, from parameter-free similarity searching, to local (target-specific) GTM models, to parameter-rich, nonlinear Random Forest and Neural Network approaches.
Affiliation(s)
- Arkadii Lin
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorferstrasse 65, 88397, Biberach an der Riss, Germany
- Dragos Horvath
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France
- Bernd Beck
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorferstrasse 65, 88397, Biberach an der Riss, Germany
- Alexandre Varnek
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France.

14
Playe B, Azencott CA, Stoven V. Efficient multi-task chemogenomics for drug specificity prediction. PLoS One 2018; 13:e0204999. [PMID: 30286165 PMCID: PMC6171913 DOI: 10.1371/journal.pone.0204999] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/23/2018] [Accepted: 09/18/2018] [Indexed: 01/10/2023]
Abstract
Adverse drug reactions, also called side effects, range from mild to fatal clinical events and significantly affect the quality of care. Among other causes, side effects occur when drugs bind to proteins other than their intended target. As experimentally testing drug specificity against the entire proteome is out of reach, we investigate the application of chemogenomics approaches. We formulate the study of drug specificity as a problem of predicting interactions between drugs and proteins at the proteome scale. We build several benchmark datasets and propose NN-MT, a multi-task Support Vector Machine (SVM) algorithm that is trained on a limited number of data points in order to solve the computational issues of proteome-wide SVM for chemogenomics. We compare NN-MT to different state-of-the-art methods and show that its prediction performance is similar or better, at an efficient computational cost. Compared to its competitors, the proposed method is particularly efficient at predicting (protein, ligand) interactions in the difficult double-orphan case, i.e. when no interactions are previously known for either the protein or the ligand. The NN-MT algorithm appears to be a good default method, providing state-of-the-art or better performance in the wide range of prediction scenarios considered in the present study: proteome-wide prediction, protein family prediction, test (protein, ligand) pairs dissimilar to pairs in the training set, and orphan cases.
Affiliation(s)
- Benoit Playe
- Center for Computational Biology, Mines ParisTech, PSL Research University, Paris, France
- Institut Curie F-75248, Paris, France
- INSERM U900, F-75248, Paris, France
- Chloé-Agathe Azencott
- Center for Computational Biology, Mines ParisTech, PSL Research University, Paris, France
- Institut Curie F-75248, Paris, France
- INSERM U900, F-75248, Paris, France
- Véronique Stoven
- Center for Computational Biology, Mines ParisTech, PSL Research University, Paris, France
- Institut Curie F-75248, Paris, France
- INSERM U900, F-75248, Paris, France

15
Abstract
Most drugs produce their phenotypic effects by interacting with target proteins, and understanding the molecular features that underpin drug-target interactions is crucial when designing a novel drug. In this chapter, we introduce the protocols that have driven recent advances in sparse modeling methods for analyzing drug-target interaction networks within a chemogenomic framework. In this approach, the chemical structures of candidate drug compounds are correlated with the genomic sequences of the candidate target proteins. We demonstrate the use of sparse canonical correspondence analysis and sparsity-induced binary classifiers to extract the underlying molecular features that are most strongly involved in drug-target interactions. We focus on drug chemical substructures and protein domains. Workflows for applying these methods are presented, and an application is described in detail. We consider the characteristics of each method and suggest possible directions for future research.

16
Cross JB. Methods for Virtual Screening of GPCR Targets: Approaches and Challenges. Methods Mol Biol 2017; 1705:233-264. [PMID: 29188566 DOI: 10.1007/978-1-4939-7465-8_11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Abstract
Virtual screening (VS) has become an integral part of the drug discovery process and is a valuable tool for finding novel chemical starting points for GPCR targets. Ligand-based VS makes use of biochemical data for known, active compounds and has been applied successfully to many diverse GPCRs. Recent progress in GPCR X-ray crystallography has made it possible to incorporate detailed structural information into the VS process. This chapter outlines the latest VS techniques along with examples that highlight successful applications of these methods. Best practices for increasing the likelihood of VS success, as well as ongoing challenges, are also discussed.
Affiliation(s)
- Jason B Cross
- University of Texas MD Anderson Cancer Center, Houston, TX, 77054, USA.

17
Lenselink EB, Ten Dijke N, Bongers B, Papadatos G, van Vlijmen HWT, Kowalczyk W, IJzerman AP, van Westen GJP. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J Cheminform 2017; 9:45. [PMID: 29086168 PMCID: PMC5555960 DOI: 10.1186/s13321-017-0232-0] [Citation(s) in RCA: 173] [Impact Index Per Article: 24.7] [Received: 04/18/2017] [Accepted: 07/31/2017] [Indexed: 11/10/2022]
Abstract
The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies and different metrics. In this study, different methods were compared using a single standardized dataset obtained from ChEMBL, which is made available to the public, and standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, the latter being a more realistic benchmark of expected prospective performance. Deep Neural Networks are the top-performing classifiers, highlighting their added value over other, more conventional methods. Moreover, the best method ('DNN_PCM') performed significantly better, at almost one standard deviation above the mean performance. Furthermore, multi-task and PCM implementations were shown to improve performance over single-task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations below the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around the mean. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with unoptimized 'DNN_PCM').
Here, a standardized set for testing and evaluating different machine learning algorithms in the context of multi-task learning is offered by providing the data and the protocols.
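The two metrics named in this abstract can be computed from a ranked list (BEDROC) and a confusion matrix (MCC). The sketch below assumes the standard BEDROC definition of Truchon and Bayly, with 1-based ranks and early-recognition parameter alpha = 20 (a common default, assumed here), and returns an MCC of 0.0 when any marginal count is zero (a convention, not part of the definition). It is an illustration, not the benchmark's own evaluation code.

```python
import math
from typing import Sequence


def bedroc(is_active_ranked: Sequence[bool], alpha: float = 20.0) -> float:
    """BEDROC for labels sorted from best to worst predicted score.

    is_active_ranked[i] is True if the compound at rank i + 1 is active.
    """
    n_total = len(is_active_ranked)
    ranks = [i + 1 for i, active in enumerate(is_active_ranked) if active]
    n_act = len(ranks)
    if n_act == 0 or n_act == n_total:
        raise ValueError("need both actives and inactives")
    ra = n_act / n_total
    # Robust initial enhancement (RIE): exponentially weighted sum over active
    # ranks, divided by its expected value under a uniformly random ranking.
    observed = sum(math.exp(-alpha * r / n_total) for r in ranks)
    expected = ra * (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1)
    rie = observed / expected
    # Linear rescaling of RIE so the worst ranking maps to 0 and the best to 1.
    factor = ra * math.sinh(alpha / 2) / (math.cosh(alpha / 2) - math.cosh(alpha / 2 - alpha * ra))
    return rie * factor + 1 / (1 - math.exp(alpha * (1 - ra)))


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient (0.0 if any marginal count is zero)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


labels = [True] * 10 + [False] * 90   # all 10 actives ranked first
print(bedroc(labels))
print(mcc(tp=40, tn=45, fp=5, fn=10))
```

BEDROC rewards actives ranked near the top of the list far more than actives found late, which is why it suits virtual screening, while MCC summarizes a thresholded classification in one balanced number.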
Affiliation(s)
- Eelke B Lenselink
- Division of Medicinal Chemistry, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Niels Ten Dijke
- Leiden Institute of Advanced Computer Science, Leiden University, P.O. Box 9512, 2300 RA, Leiden, The Netherlands
- Brandon Bongers
- Division of Medicinal Chemistry, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- George Papadatos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- GlaxoSmithKline, Medicines Research Centre, Gunnels Wood Road, Stevenage, Herts, SG1 2NY, UK
- Herman W T van Vlijmen
- Division of Medicinal Chemistry, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Wojtek Kowalczyk
- Leiden Institute of Advanced Computer Science, Leiden University, P.O. Box 9512, 2300 RA, Leiden, The Netherlands
- Adriaan P IJzerman
- Division of Medicinal Chemistry, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Gerard J P van Westen
- Division of Medicinal Chemistry, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands.

18
Rasti B, Namazi M, Karimi-Jafari MH, Ghasemi JB. Proteochemometric Modeling of the Interaction Space of Carbonic Anhydrase and its Inhibitors: An Assessment of Structure-based and Sequence-based Descriptors. Mol Inform 2016; 36. [PMID: 27860295 DOI: 10.1002/minf.201600102] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/19/2015] [Accepted: 10/26/2016] [Indexed: 11/08/2022]
Abstract
Due to its physiological and clinical roles, carbonic anhydrase (CA) is one of the most interesting case studies. There are different classes of CA inhibitors, including sulfonamides, polyamines, coumarins and dithiocarbamates (DTCs). However, many of them hardly act as selective inhibitors of a specific isoform. Therefore, finding highly selective inhibitors for different isoforms of CA is still an ongoing challenge. Proteochemometric modeling (PCM) is able to model the bioactivity of multiple compounds against different isoforms of a protein, which makes it particularly applicable to investigating the selectivity of different ligands towards different receptors. We therefore applied PCM to investigate the interaction space and the structural properties that lead to the selective inhibition of CA isoforms by some dithiocarbamates. Our models provide interesting structural information that can be used to design compounds capable of inhibiting different isoforms of CA in a more selective manner. The validity and predictivity of the models were confirmed by both internal and external validation methods, while a Y-scrambling approach was applied to assess the robustness of the models. To prove the reliability and applicability of our findings, we showed how ligand-receptor selectivity is affected when any of these critical findings is removed from the modeling process.
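Y-scrambling (also called y-randomization) checks that a model's fit is not due to chance: the response values are randomly permuted, the model is refit, and the fit quality on scrambled data should collapse compared with the real data. The sketch below illustrates the procedure with a one-descriptor ordinary least-squares model; the model, the toy data, the 100 permutations and the seed are illustrative assumptions, not the authors' PCM setup.

```python
import random
from typing import List, Sequence


def r_squared_ols(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of the least-squares fit y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def y_scrambling(x: Sequence[float], y: Sequence[float], n_perm: int = 100, seed: int = 0) -> List[float]:
    """R^2 values obtained after refitting on randomly permuted responses."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)          # break the descriptor-response link
        scores.append(r_squared_ols(x, y_perm))
    return scores


# Toy data: a linear trend with small alternating noise.
x = [float(i) for i in range(20)]
y = [0.5 * xi + 1.0 + (0.3 if i % 2 else -0.3) for i, xi in enumerate(x)]
real = r_squared_ols(x, y)
scrambled = y_scrambling(x, y)
print(f"real R^2 = {real:.3f}, mean scrambled R^2 = {sum(scrambled) / len(scrambled):.3f}")
```

A model whose real R² is not clearly above the scrambled distribution is likely fitting noise or chance correlations rather than a genuine structure-activity relationship.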
Affiliation(s)
- Behnam Rasti
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Mohsen Namazi
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- M H Karimi-Jafari
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Jahan B Ghasemi
- Department of Analytical Chemistry, School of Chemistry, College of Science, University of Tehran, Tehran, Iran

19
Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models. Data Min Knowl Discov 2016. [DOI: 10.1007/s10618-016-0456-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]

20
Mervin LH, Afzal AM, Drakakis G, Lewis R, Engkvist O, Bender A. Target prediction utilising negative bioactivity data covering large chemical space. J Cheminform 2015; 7:51. [PMID: 26500705 PMCID: PMC4619454 DOI: 10.1186/s13321-015-0098-y] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 04/30/2015] [Accepted: 09/29/2015] [Indexed: 02/25/2023]
Abstract
BACKGROUND In silico analyses are increasingly being used to support mode-of-action investigations; however, many such approaches do not utilise the large amounts of inactive data held in chemogenomic repositories. This work is concerned with integrating such bioactivity data into the target prediction of orphan compounds, to produce the probability of activity and inactivity for a range of targets. To this end, a novel human bioactivity data set was constructed by assimilating over 195 million bioactivity data points deposited in the ChEMBL and PubChem repositories, followed by the application of a sphere-exclusion selection algorithm to oversample presumed inactive compounds. RESULTS A Bernoulli Naïve Bayes algorithm was trained using the data and evaluated using fivefold cross-validation, achieving a mean recall and precision of 67.7 and 63.8 % for active compounds and 99.6 and 99.7 % for inactive compounds, respectively. We show that the performance of the models is considerably influenced by the underlying intraclass training similarity, the size of a given class of compounds, and the degree of additional oversampling. The method was also validated using compounds extracted from WOMBAT, producing average precision-recall AUC and BEDROC scores of 0.56 and 0.85, respectively. Inactive data points used for this test are based on presumed inactivity, giving an approximate indication of the true extrapolative ability of the models. A distance-based applicability domain analysis was also conducted, indicating that an average Tanimoto coefficient distance of 0.3 or greater between a test and training set can be used to give a global measure of confidence in model predictions. A final comparison to a method trained solely on active data from ChEMBL gave precision-recall AUC and BEDROC scores of 0.45 and 0.76.
CONCLUSIONS The inclusion of inactive data for model training produces models with superior AUC and improved early recognition capabilities, although the results from internal and external validation show that performance differs across the breadth of models. The realised target prediction protocol is available at https://github.com/lhm30/PIDGIN.
Graphical abstract: The inclusion of large-scale negative training data for in silico target prediction improves the precision and recall AUC and BEDROC scores for target models.
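A Bernoulli Naïve Bayes model treats each fingerprint bit as an independent binary feature whose probability of being set is estimated per class, and, unlike the multinomial variant, it also scores the bits that are off. The sketch below is a from-scratch illustration for a single target (active vs. inactive). It assumes fixed-length binary fingerprints given as sets of on-bit indices and Laplace smoothing. The class name, the toy fingerprints and the smoothing choice are illustrative assumptions, not the PIDGIN implementation.

```python
import math
from typing import Dict, List, Sequence, Set


class BernoulliNB:
    """Bernoulli Naive Bayes over fixed-length binary fingerprints."""

    def __init__(self, n_bits: int, smoothing: float = 1.0):
        self.n_bits = n_bits
        self.smoothing = smoothing

    def fit(self, fps: Sequence[Set[int]], labels: Sequence[int]) -> "BernoulliNB":
        self.classes = sorted(set(labels))
        self.log_prior: Dict[int, float] = {}
        self.p_on: Dict[int, List[float]] = {}
        for c in self.classes:
            members = [fp for fp, y in zip(fps, labels) if y == c]
            self.log_prior[c] = math.log(len(members) / len(fps))
            counts = [0] * self.n_bits
            for fp in members:
                for b in fp:
                    counts[b] += 1
            # Laplace-smoothed estimate of P(bit is on | class).
            denom = len(members) + 2 * self.smoothing
            self.p_on[c] = [(k + self.smoothing) / denom for k in counts]
        return self

    def predict_proba(self, fp: Set[int]) -> Dict[int, float]:
        """Posterior class probabilities; every bit contributes, on or off."""
        log_post = {}
        for c in self.classes:
            s = self.log_prior[c]
            for b, p in enumerate(self.p_on[c]):
                s += math.log(p) if b in fp else math.log(1.0 - p)
            log_post[c] = s
        m = max(log_post.values())          # subtract max for numerical stability
        z = sum(math.exp(v - m) for v in log_post.values())
        return {c: math.exp(v - m) / z for c, v in log_post.items()}


# Toy 6-bit fingerprints: label 1 = active, 0 = inactive.
fps = [{0, 1, 2}, {0, 1}, {0, 2}, {3, 4}, {4, 5}, {3, 5}]
labels = [1, 1, 1, 0, 0, 0]
model = BernoulliNB(n_bits=6).fit(fps, labels)
print(model.predict_proba({0, 1}))
```

Per-class probabilities of this kind are what allow a model to report both a probability of activity and one of inactivity for each target.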
Affiliation(s)
- Lewis H. Mervin
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Avid M. Afzal
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Georgios Drakakis
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Richard Lewis
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Ola Engkvist
- Discovery Sciences, Chemistry Innovation Centre, AstraZeneca R&D, 43183 Mölndal, Sweden
- Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK

21
Cortés-Ciriano I, van Westen GJP, Bouvier G, Nilges M, Overington JP, Bender A, Malliavin TE. Improved large-scale prediction of growth inhibition patterns using the NCI60 cancer cell line panel. Bioinformatics 2015; 32:85-95. [PMID: 26351271 PMCID: PMC4681992 DOI: 10.1093/bioinformatics/btv529] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 02/27/2015] [Accepted: 08/26/2015] [Indexed: 01/28/2023]
Abstract
MOTIVATION Recent large-scale omics initiatives have catalogued the somatic alterations of cancer cell line panels along with their pharmacological response to hundreds of compounds. In this study, we have explored these data to advance computational approaches that enable more effective and targeted use of current and future anticancer therapeutics. RESULTS We modelled the 50% growth inhibition bioassay end-point (GI50) of 17,142 compounds screened against 59 cancer cell lines from the NCI60 panel (941,831 data points, matrix 93.08% complete) by integrating the chemical and biological (cell line) information. We determine that protein, gene transcript and miRNA abundance provide the highest predictive signal when modelling the GI50 endpoint, significantly outperforming DNA copy-number variation or exome sequencing data (Tukey's Honestly Significant Difference, P < 0.05). We demonstrate that, within the limits of the data, our approach is able to both interpolate and extrapolate compound bioactivities to new cell lines and tissues and, although to a lesser extent, to dissimilar compounds. Moreover, our approach outperforms previous models generated on the GDSC dataset. Finally, we determine that in the cases investigated in more detail, the predicted drug-pathway associations and growth inhibition patterns are mostly consistent with the experimental data, which also suggests the possibility of identifying genomic markers of drug sensitivity for novel compounds on novel cell lines. CONTACT terez@pasteur.fr; ab454@ac.cam.uk SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Isidro Cortés-Ciriano
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Gerard J P van Westen
- Medicinal Chemistry, Leiden Academic Centre for Drug Research, Einsteinweg 55, 2333CC, Leiden
- Guillaume Bouvier
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Michael Nilges
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- John P Overington
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Hinxton, Cambridge, UK
- Andreas Bender
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, CB2 1EW Cambridge, UK
- Thérèse E Malliavin
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France

22
Ain QU, Méndez-Lucio O, Ciriano IC, Malliavin T, van Westen GJP, Bender A. Modelling ligand selectivity of serine proteases using integrative proteochemometric approaches improves model performance and allows the multi-target dependent interpretation of features. Integr Biol (Camb) 2015; 6:1023-33. [PMID: 25255469 DOI: 10.1039/c4ib00175c] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Serine proteases, implicated in important physiological functions, have a high intra-family similarity, which leads to unwanted off-target effects of inhibitors with insufficient selectivity. However, the availability of sequence and structure data has now made it possible to develop approaches to design pharmacological agents that can discriminate successfully between their related binding sites. In this study, we have quantified the relationship between 12,625 distinct protease inhibitors and their bioactivity against 67 targets of the serine protease family (20,213 data points) in an integrative manner, using proteochemometric modelling (PCM). The benchmarking of 21 different target descriptors motivated the use of specific binding pocket amino acid descriptors, which helped in the identification of active site residues and selective compound chemotypes affecting compound affinity and selectivity. PCM models performed better than the alternative approach employed for comparison (QSAR models trained exclusively on compound descriptors using all available data), with R²/RMSE values of 0.64 ± 0.23/0.66 ± 0.20 vs. 0.35 ± 0.27/1.05 ± 0.27 log units, respectively. Moreover, the interpretation of the PCM model singled out various chemical substructures responsible for bioactivity and selectivity towards particular proteases (thrombin, trypsin and coagulation factor 10), in agreement with the literature. For instance, the absence of a tertiary sulphonamide was identified as responsible for decreased selective activity (by on average 0.27 ± 0.65 pChEMBL units) on FA10. Among the binding pocket residues, the amino acids (arginine, leucine and tyrosine) at positions 35, 39, 60, 93, 140 and 207 were observed to be key contributing residues for selective affinity on these three targets.
Affiliation(s)
- Qurrat U Ain
- Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW, University of Cambridge, UK.

23
Giguère S, Laviolette F, Marchand M, Tremblay D, Moineau S, Liang X, Biron É, Corbeil J. Machine learning assisted design of highly active peptides for drug discovery. PLoS Comput Biol 2015; 11:e1004074. [PMID: 25849257 PMCID: PMC4388847 DOI: 10.1371/journal.pcbi.1004074] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 03/31/2014] [Accepted: 12/05/2014] [Indexed: 01/15/2023]
Abstract
The discovery of peptides possessing high biological activity is very challenging due to the enormous diversity of possible peptides, of which only a minority have the desired properties. To lower cost and reduce the time needed to obtain promising peptides, machine learning approaches can greatly assist in the process, and even partly replace expensive laboratory experiments, by learning a predictor from existing data or from a smaller amount of newly generated data. Unfortunately, once the model is learned, selecting the peptides with the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm based on graph theory that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and validation of peptide leads. Moreover, the proposed approach does not require known ligands for the target protein, since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code is freely available at http://graal.ift.ulaval.ca/peptide-design/. Part of the complexity of drug discovery is the sheer chemical diversity to explore, combined with all the requirements a compound must meet to become a commercial drug. Hence, it makes sense to automate this chemical exploration in a wise, informed, and efficient fashion.
Here, we focused on peptides, as they have properties that make them excellent drug starting points. Machine learning techniques may replace expensive in vitro laboratory experiments by learning an accurate model of them. However, computational models also suffer from the combinatorial explosion due to the enormous chemical diversity. Indeed, applying the model to every peptide would take an astronomical amount of computer time. Therefore, given a model, is it possible to determine, in reasonable computational time, the peptide that has the best properties and chance of success? This exact question motivated our work. We focused on recent advances in kernel methods and machine learning to learn a model that already had excellent results. We demonstrate that this class of model has mathematical properties that make it possible to rapidly identify and sort the best peptides. Finally, in vitro and in silico results are provided to support and validate this theoretical discovery.
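The underlying principle can be illustrated on a simpler model: when a score decomposes over overlapping local pieces of a sequence, the highest-scoring sequence can be found as a longest path through a layered graph (dynamic programming) instead of by enumerating all |alphabet|^L candidates. The sketch below assumes a hypothetical score made of position-specific residue weights plus weights for adjacent residue pairs; the alphabet, weights and function names are invented for illustration, and the authors' kernel-based graph construction is more general than this.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Unary = List[Dict[str, float]]        # unary[i][a]: weight of residue a at position i
Pair = Dict[Tuple[str, str], float]   # pair[(a, b)]: weight of residue a followed by b


def score(seq: Sequence[str], unary: Unary, pair: Pair) -> float:
    """Hypothetical decomposable score of a full sequence."""
    s = sum(unary[i][a] for i, a in enumerate(seq))
    s += sum(pair.get((a, b), 0.0) for a, b in zip(seq, seq[1:]))
    return s


def best_sequence(alphabet: Sequence[str], unary: Unary, pair: Pair) -> Tuple[str, float]:
    """Longest path through a layered DAG with one node per (position, residue)."""
    best = {a: unary[0][a] for a in alphabet}   # best score of a prefix ending in a
    back: List[Dict[str, str]] = []             # back[i - 1][b]: best residue before b at position i
    for i in range(1, len(unary)):
        new_best, choice = {}, {}
        for b in alphabet:
            prev = max(alphabet, key=lambda a: best[a] + pair.get((a, b), 0.0))
            new_best[b] = best[prev] + pair.get((prev, b), 0.0) + unary[i][b]
            choice[b] = prev
        best = new_best
        back.append(choice)
    last = max(alphabet, key=lambda a: best[a])
    seq = [last]
    for choice in reversed(back):               # walk predecessors back to position 0
        seq.append(choice[seq[-1]])
    return "".join(reversed(seq)), best[last]


alphabet = "KRW"
unary = [{"K": 1.0, "R": 0.5, "W": 0.0},
         {"K": 0.2, "R": 0.9, "W": 0.4},
         {"K": 0.0, "R": 0.3, "W": 1.1},
         {"K": 0.6, "R": 0.6, "W": 0.1}]
pair = {("R", "W"): 0.8, ("W", "K"): -0.5, ("K", "K"): -0.3}
seq, s = best_sequence(alphabet, unary, pair)
# Brute force over all 3**4 sequences, feasible only for tiny toy problems.
brute = max(("".join(p) for p in itertools.product(alphabet, repeat=len(unary))),
            key=lambda p: score(p, unary, pair))
print(seq, s, brute)
```

The dynamic program costs O(L · |alphabet|²) operations, whereas enumeration grows as |alphabet|^L, which is what makes exact maximization feasible for realistic peptide lengths.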
Affiliation(s)
- Sébastien Giguère
- Department of Computer Science and Software Engineering, Université Laval, Québec, Canada
- François Laviolette
- Department of Computer Science and Software Engineering, Université Laval, Québec, Canada
- Mario Marchand
- Department of Computer Science and Software Engineering, Université Laval, Québec, Canada
- Denise Tremblay
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, Canada
- Sylvain Moineau
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, Canada
- Xinxia Liang
- Faculty of Pharmacy, Université Laval, Québec, Canada
- Éric Biron
- Faculty of Pharmacy, Université Laval, Québec, Canada
- Jacques Corbeil
- Department of Molecular Medicine, Université Laval, Québec, Canada

24
Marine natural products as breast cancer resistance protein inhibitors. Mar Drugs 2015; 13:2010-29. [PMID: 25854646 PMCID: PMC4413197 DOI: 10.3390/md13042010] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 02/20/2015] [Revised: 03/19/2015] [Accepted: 03/24/2015] [Indexed: 02/08/2023]
Abstract
Breast cancer resistance protein (BCRP) is a member of the ATP-binding cassette (ABC) transporter superfamily that has clinical relevance due to its multi-drug resistance properties in cancer. BCRP can be associated with clinical cancer drug resistance, in particular in acute myelogenous or acute lymphocytic leukemias. The overexpression of BCRP contributes to resistance to several chemotherapeutic drugs, such as topotecan, methotrexate, mitoxantrone, doxorubicin and daunorubicin. The Food and Drug Administration has already recognized BCRP as one of the most clinically important drug transporters, mainly because it reduces the clinical efficacy of various anticancer drugs through its ATP-dependent drug efflux pump function, as well as its apparent participation in drug resistance. This review article aims to summarize the different research findings on marine natural products with BCRP-inhibiting activity. In this sense, the potential modulation of physiological targets of BCRP by natural or synthetic compounds offers a great opportunity for the discovery of new drugs and of valuable research tools for understanding the function of the complex ABC transporters.

25
Cortés-Ciriano I, Ain QU, Subramanian V, Lenselink EB, Méndez-Lucio O, IJzerman AP, Wohlfahrt G, Prusis P, Malliavin TE, van Westen GJP, Bender A. Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects. MedChemComm 2015. [DOI: 10.1039/c4md00216d] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Indexed: 11/21/2022]
Abstract
Proteochemometric (PCM) modelling is a computational method to model the bioactivity of multiple ligands against multiple related protein targets simultaneously.
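Because PCM models are trained on (ligand, target) pairs, each training example needs a joint descriptor. The sketch below shows two common choices, assuming plain numeric descriptor vectors: simple concatenation, and a cross-term descriptor formed by the Kronecker (flattened outer) product. The linear kernel on cross-term descriptors factorizes into the product of the ligand and protein kernels, which is what lets pairwise kernel methods avoid building these large vectors explicitly. The function names and descriptor values are toy assumptions.

```python
from typing import List, Sequence


def concat_features(lig: Sequence[float], prot: Sequence[float]) -> List[float]:
    """Joint descriptor obtained by stacking ligand and protein descriptors."""
    return list(lig) + list(prot)


def kron_features(lig: Sequence[float], prot: Sequence[float]) -> List[float]:
    """Cross-term descriptor: all products lig[i] * prot[j] (flattened outer product)."""
    return [a * b for a in lig for b in prot]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def kron_kernel(lig1, prot1, lig2, prot2) -> float:
    """Pairwise kernel computed without materializing the cross-term vectors."""
    return dot(lig1, lig2) * dot(prot1, prot2)


lig_a, prot_a = [1.0, 0.0, 2.0], [0.5, 1.5]
lig_b, prot_b = [0.0, 3.0, 1.0], [2.0, -1.0]
explicit = dot(kron_features(lig_a, prot_a), kron_features(lig_b, prot_b))
print(explicit, kron_kernel(lig_a, prot_a, lig_b, prot_b))  # prints -1.0 -1.0
```

Concatenation lets a linear model learn only separate ligand and protein effects, while the cross terms can capture ligand-target interaction effects, at the price of a descriptor whose length is the product of the two input lengths.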
Affiliation(s)
- Isidro Cortés-Ciriano
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Qurrat Ul Ain
- Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
- Eelke B. Lenselink
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Oscar Méndez-Lucio
- Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
- Adriaan P. IJzerman
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Gerd Wohlfahrt
- Computer-Aided Drug Design, Orion Pharma, FIN-02101 Espoo, Finland
- Peteris Prusis
- Computer-Aided Drug Design, Orion Pharma, FIN-02101 Espoo, Finland
- Thérèse E. Malliavin
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Gerard J. P. van Westen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Andreas Bender
- Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK

26
Lavecchia A. Machine-learning approaches in drug discovery: methods and applications. Drug Discov Today 2014; 20:318-31. [PMID: 25448759 DOI: 10.1016/j.drudis.2014.10.012] [Citation(s) in RCA: 353] [Impact Index Per Article: 35.3] [Received: 07/18/2014] [Revised: 09/27/2014] [Accepted: 10/24/2014] [Indexed: 12/19/2022]
Abstract
During the past decade, virtual screening (VS) has evolved from traditional similarity searching, which utilizes single reference compounds, into an advanced application domain for data mining and machine-learning approaches, which require large and representative training-set compounds to learn robust decision rules. The explosive growth in the amount of public domain-available chemical and biological data has generated huge effort to design, analyze, and apply novel learning methodologies. Here, I focus on machine-learning techniques within the context of ligand-based VS (LBVS). In addition, I analyze several relevant VS studies from recent publications, providing a detailed view of the current state-of-the-art in this field and highlighting not only the problematic issues, but also the successes and opportunities for further advances.
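Traditional similarity searching ranks a compound library by its similarity to a single reference compound. The sketch below uses the Tanimoto coefficient on binary fingerprints represented as sets of on-bit indices. The fingerprints and compound names are toy placeholders; in practice the fingerprints would come from a cheminformatics toolkit.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B| for two sets of on-bits (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(query: Set[int], library: Dict[str, Set[int]],
                      top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank library compounds by decreasing Tanimoto similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


reference = {1, 4, 7, 9, 12}
library = {
    "cpd_A": {1, 4, 7, 9, 13},
    "cpd_B": {2, 3, 5},
    "cpd_C": {1, 4, 7, 9, 12, 15},
    "cpd_D": {4, 9},
}
for name, sim in similarity_search(reference, library):
    print(name, round(sim, 3))
```

This single-reference ranking has no trainable parameters, in contrast with the machine-learning approaches discussed in the review, which learn decision rules from many labeled training compounds.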
Affiliation(s)
- Antonio Lavecchia
- Department of Pharmacy, Drug Discovery Laboratory, University of Napoli 'Federico II', via D. Montesano 49, I-80131 Napoli, Italy.
27
Korkmaz S, Zararsiz G, Goksuluk D. Drug/nondrug classification using Support Vector Machines with various feature selection strategies. Computer Methods and Programs in Biomedicine 2014; 117:51-60. [PMID: 25224081 DOI: 10.1016/j.cmpb.2014.08.009] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 05/02/2014] [Revised: 08/15/2014] [Accepted: 08/27/2014] [Indexed: 06/03/2023]
Abstract
In conjunction with advances in computer technology, virtual screening of small molecules has come into use in drug discovery. Since there are thousands of compounds in the early phase of drug discovery, a fast classification method that can distinguish between active and inactive molecules can be used for screening large compound collections. In this study, we used Support Vector Machines (SVM) for this type of classification task. SVM is a powerful classification tool that is becoming increasingly popular in various machine-learning applications. The data sets consist of 631 compounds for the training set and 216 compounds for a separate test set. In the data pre-processing step, Pearson's correlation coefficient was used as a filter to eliminate redundant features. After application of the correlation filter, a single SVM was applied to this reduced data set. Moreover, we investigated the performance of SVM with different feature selection strategies, including SVM-Recursive Feature Elimination, Wrapper Method and Subset Selection. All feature selection methods generally performed better than a single SVM, and Subset Selection outperformed the other feature selection methods. We have tested SVM as a classification tool in a real-life drug discovery problem, and our results revealed that it could be a useful method for classification tasks in the early phase of drug discovery.
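The correlation-filter pre-processing step described in this abstract can be illustrated with a short sketch. The sketch makes two assumptions. First, it uses a greedy rule: a descriptor is kept unless its absolute Pearson correlation with an already-kept descriptor exceeds a cutoff. Second, it uses an arbitrary cutoff of 0.9. The abstract gives neither the study's actual cutoff nor its ordering. The data are synthetic.

```python
import numpy as np

def correlation_filter(X: np.ndarray, threshold: float = 0.9) -> list[int]:
    """Return indices of columns to keep, dropping any column whose absolute
    Pearson correlation with an already-kept column exceeds `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature |r|
    kept: list[int] = []
    for j in range(X.shape[1]):
        # Greedy pass in column order: keep j only if it is not redundant
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

# Synthetic descriptors: column 1 is almost a rescaled copy of column 0
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=100), b])
print(correlation_filter(X))  # [0, 2]
```

Because the pass is greedy, which member of a correlated pair survives depends on column order. A different ordering (for example by variance or relevance to the label) is an equally plausible design choice.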
Affiliation(s)
- Selcuk Korkmaz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey.
- Gokmen Zararsiz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
- Dincer Goksuluk
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
28
Cao DS, Zhang LX, Tan GS, Xiang Z, Zeng WB, Xu QS, Chen AF. Computational Prediction of Drug-Target Interactions Using Chemical, Biological, and Network Features. Mol Inform 2014; 33:669-81. [PMID: 27485302 DOI: 10.1002/minf.201400009] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 01/23/2014] [Accepted: 04/22/2014] [Indexed: 02/02/2023]
Abstract
Drug-target interactions (DTIs) are central to current drug discovery processes. Efforts have been devoted to the development of methodology for predicting DTIs and drug-target interaction networks. Most existing methods mainly focus on the application of information about drug or protein structure features. In the present work, we proposed a computational method for DTI prediction by combining the information from chemical, biological and network properties. The method was developed based on a learning algorithm, random forest (RF), combined with integrated features for predicting DTIs. Four classes of drug-target interaction networks in humans involving enzymes, ion channels, G-protein-coupled receptors (GPCRs) and nuclear receptors are independently used for establishing predictive models. The RF models gave prediction accuracy of 93.52 %, 94.84 %, 89.68 % and 84.72 % for four pharmaceutically useful datasets, respectively. The prediction ability of our approach is comparable to or even better than that of other DTI prediction methods. These comparative results demonstrated the relevance of the network topology as source of information for predicting DTIs. Further analysis confirmed that among our top ranked predictions of DTIs, several DTIs are supported by databases, while the others represent novel potential DTIs. We believe that our proposed approach can help to limit the search space of DTIs and provide a new way towards repositioning old drugs and identifying targets.
Affiliation(s)
- Dong-Sheng Cao
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013, P.R. China.
- Liu-Xia Zhang
- The 163rd Hospital of The Chinese People's Liberation Army, Changsha 410003, P.R. China
- Gui-Shan Tan
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013, P.R. China
- Zheng Xiang
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou 325035, P.R. China
- Wen-Bin Zeng
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013, P.R. China
- Qing-Song Xu
- School of Mathematics and Statistics, Central South University, Changsha 410083, P.R. China
- Alex F Chen
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013, P.R. China.
29
Sugaya N. Ligand efficiency-based support vector regression models for predicting bioactivities of ligands to drug target proteins. J Chem Inf Model 2014; 54:2751-63. [PMID: 25220713 DOI: 10.1021/ci5003262] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 02/03/2023]
Abstract
The concept of ligand efficiency (LE) indices is widely accepted throughout the drug design community and is frequently used in a retrospective manner in the process of drug development. For example, LE indices are used to investigate LE optimization processes of already-approved drugs and to re-evaluate hit compounds obtained from structure-based virtual screening methods and/or high-throughput experimental assays. However, LE indices could also be applied in a prospective manner to explore drug candidates. Here, we describe the construction of machine learning-based regression models in which LE indices are adopted as an end point and show that LE-based regression models can outperform regression models based on pIC50 values. In addition to pIC50 values traditionally used in machine learning studies based on chemogenomics data, three representative LE indices (ligand lipophilicity efficiency (LLE), binding efficiency index (BEI), and surface efficiency index (SEI)) were adopted, then used to create four types of training data. We constructed regression models by applying a support vector regression (SVR) method to the training data. In cross-validation tests of the SVR models, the LE-based SVR models showed higher correlations between the observed and predicted values than the pIC50-based models. Application tests to new data displayed that, generally, the predictive performance of SVR models follows the order SEI > BEI > LLE > pIC50. Close examination of the distributions of the activity values (pIC50, LLE, BEI, and SEI) in the training and validation data implied that the performance order of the SVR models may be ascribed to the much higher diversity of the LE-based training and validation data. In the application tests, the LE-based SVR models can offer better predictive performance of compound-protein pairs with a wider range of ligand potencies than the pIC50-based models. 
This finding strongly suggests that LE-based SVR models are better than pIC50-based models at predicting bioactivities of compounds that could exhibit a much higher (or lower) potency.
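The three LE indices named in this abstract are usually defined as follows:

- LLE = pIC50 - cLogP
- BEI = pIC50 / MW (in kDa)
- SEI = pIC50 / (PSA / 100 Å²)

These are the standard ligand-efficiency definitions, not details taken from the abstract. The compound below is hypothetical. The sketch only shows the arithmetic of converting one potency value into the alternative end points.

```python
from dataclasses import dataclass

@dataclass
class Ligand:
    pic50: float   # -log10(IC50 in mol/L)
    mw_da: float   # molecular weight in daltons
    psa_a2: float  # polar surface area in square angstroms
    clogp: float   # calculated logP

def lle(lig: Ligand) -> float:
    """Ligand lipophilicity efficiency: potency minus lipophilicity."""
    return lig.pic50 - lig.clogp

def bei(lig: Ligand) -> float:
    """Binding efficiency index: potency per kDa of molecular weight."""
    return lig.pic50 / (lig.mw_da / 1000.0)

def sei(lig: Ligand) -> float:
    """Surface efficiency index: potency per 100 square angstroms of PSA."""
    return lig.pic50 / (lig.psa_a2 / 100.0)

# Hypothetical compound, not taken from the paper's data
lig = Ligand(pic50=8.0, mw_da=500.0, psa_a2=200.0, clogp=3.0)
print(lle(lig), bei(lig), sei(lig))  # 5.0 16.0 4.0
```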
Affiliation(s)
- Nobuyoshi Sugaya
- Drug Discovery Department, Research & Development Division, PharmaDesign, Inc., Hatchobori 2-19-8, Chuo-ku, Tokyo 104-0032, Japan
30
Computational chemogenomics: is it more than inductive transfer? J Comput Aided Mol Des 2014; 28:597-618. [PMID: 24771144 DOI: 10.1007/s10822-014-9743-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 01/20/2014] [Accepted: 04/11/2014] [Indexed: 10/25/2022]
Abstract
High-throughput assays challenge us to extract knowledge from multi-ligand, multi-target activity data. In QSAR, weights are statically fitted to each ligand descriptor with respect to a single endpoint or target. However, computational chemogenomics (CG) has demonstrated benefits of learning from entire grids of data at once, rather than building target-specific QSARs. A possible reason for this is the emergence of inductive knowledge transfer (IT) between targets, providing statistical robustness to the model, with no assumption about the structure of the targets. Relevant protein descriptors in CG should allow one to learn how to dynamically adjust ligand attribute weights with respect to protein structure. Hence, models built through explicit learning (EL) by including protein information, while benefitting from IT enhancement, should provide additional predictive capability, notably for protein deorphanization. This interplay between IT and EL in CG modeling is not sufficiently studied. While IT is likely to occur irrespective of the injected target information, it is not clear whether and when boosting due to EL may occur. EL is only possible if protein description is appropriate to the target set under investigation. The key issue here is the search for evidence of genuine EL exceeding expectations based on pure IT. We explore the problem in the context of Support Vector Regression, using more than 9,400 pKi values of 31 GPCRs, where compound-protein interactions are represented by the concatenation of vectorial descriptions of compounds and proteins. This provides a unified framework to generate both IT-enhanced and potentially EL-enabled models, where the difference is toggled by supplied protein information. For EL-enabled models, protein information includes genuine protein descriptors such as typical sequence-based terms, but also the experimentally determined affinity cross-correlation fingerprints. 
These latter benchmark the expected behavior of a quasi-ideal descriptor capturing the actual functional protein-protein relatedness, and are therefore thought to be the most likely to enable EL. EL- and IT-based methods were benchmarked alongside classical QSAR, with respect to cross-validation and deorphanization challenges. A rational method for projecting benchmarked methodologies into a strategy space is given, with the aim that the projection will provide directions for the types of molecule designs possible using a given methodology. While EL-enabled strategies outperform classical QSARs and compare favorably to similar published results, they are, in all respects evaluated herein, not strongly distinguished from IT-enhanced models. Moreover, EL-enabled strategies failed to prove superior in deorphanization challenges. Therefore, this paper raises caution that, contrary to common belief and intuitive expectation, the benefits of chemogenomics models over classical QSAR are quite possibly due less to the injection of protein-related information and more to the effect of inductive transfer, resulting from simultaneous learning from all of the modeled endpoints. These results show that the field of protein descriptor research needs further improvements to truly realize the expected benefit of EL.
31
van Laarhoven T, Marchiori E. Biases of Drug-Target Interaction Network Data. Pattern Recognition in Bioinformatics 2014. [DOI: 10.1007/978-3-319-09192-1_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
32
33
Brown JB, Niijima S, Okuno Y. Compound-Protein Interaction Prediction Within Chemogenomics: Theoretical Concepts, Practical Usage, and Future Directions. Mol Inform 2013; 32:906-21. [DOI: 10.1002/minf.201300101] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 05/31/2013] [Accepted: 08/06/2013] [Indexed: 11/08/2022]
34
35
The complexity of G-protein coupled receptor-ligand interactions. Sci China Chem 2013. [DOI: 10.1007/s11426-013-4911-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
36
Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets. J Cheminform 2013; 5:42. [PMID: 24059743 PMCID: PMC4015169 DOI: 10.1186/1758-2946-5-42] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 06/20/2013] [Accepted: 09/18/2013] [Indexed: 11/10/2022]
Abstract
Background While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, a novel protein descriptor set (termed ProtFP (4 variants)), and in addition we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks which comprise Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data, and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants. Results The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases found to be larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than utilizing individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-Scales (3) combined with an average Z-Scale value for each target, while ProtFP (PCA8), ST-Scales, and ProtFP (Feature) rank last. Conclusions While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still, on average, surprisingly similar.
Still, combining sets describing complementary information leads to a small but consistent improvement in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, thereby underlining that choosing an appropriate descriptor set is of fundamental importance for bioactivity modeling, both from the ligand and the protein side.
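The abstract compares descriptor sets using two metrics: RMSE, for regression in log units, and the Matthews correlation coefficient (MCC), for classification. Below is a minimal sketch of both metrics using their standard formulas. Returning 0 when the MCC denominator vanishes is a common convention and is assumed here. The inputs are made-up numbers.

```python
import math

def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error, e.g. in log units of activity."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is undefined when any margin is zero; report 0.0
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

print(rmse([5.0, 6.0, 7.0, 8.0], [6.0, 5.0, 8.0, 7.0]), mcc(8, 8, 2, 2))  # 1.0 0.6
```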
37
Sugaya N. Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach. J Chem Inf Model 2013; 53:2525-37. [PMID: 24020509 DOI: 10.1021/ci400240u] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]
Abstract
Machine learning methods based on ligand-protein interaction data in bioactivity databases are one of the current strategies for efficiently finding novel lead compounds as the first step in the drug discovery process. Although previous machine learning studies have succeeded in predicting novel ligand-protein interactions with high performance, all of the previous studies to date have been heavily dependent on the simple use of raw bioactivity data of ligand potencies measured by IC50, EC50, K(i), and K(d) deposited in databases. ChEMBL provides us with a unique opportunity to investigate whether a machine-learning-based classifier created by reflecting ligand efficiency other than the IC50, EC50, K(i), and Kd values can also offer high predictive performance. Here we report that classifiers created from training data based on ligand efficiency show higher performance than those from data based on IC50 or K(i) values. Utilizing GPCRSARfari and KinaseSARfari databases in ChEMBL, we created IC50- or K(i)-based training data and binding efficiency index (BEI) based training data then constructed classifiers using support vector machines (SVMs). The SVM classifiers from the BEI-based training data showed slightly higher area under curve (AUC), accuracy, sensitivity, and specificity in the cross-validation tests. Application of the classifiers to the validation data demonstrated that the AUCs and specificities of the BEI-based classifiers dramatically increased in comparison with the IC50- or K(i)-based classifiers. The improvement of the predictive power by the BEI-based classifiers can be attributed to (i) the more separated distributions of positives and negatives, (ii) the higher diversity of negatives in the BEI-based training data in a feature space of SVMs, and (iii) a more balanced number of positives and negatives in the BEI-based training data. 
These results strongly suggest that training data based on ligand efficiency as well as data based on classical IC50, EC50, K(d), and K(i) values are important when creating a classifier using a machine learning approach based on bioactivity data.
Affiliation(s)
- Nobuyoshi Sugaya
- Drug Discovery Department, Research & Development Division, PharmaDesign, Inc., Hatchobori 2-19-8, Chuo-ku, Tokyo, 104-0032, Japan
38
van Westen GJ, Swier RF, Wegner JK, Ijzerman AP, van Vlijmen HW, Bender A. Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets. J Cheminform 2013; 5:41. [PMID: 24059694 PMCID: PMC3848949 DOI: 10.1186/1758-2946-5-41] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 06/20/2013] [Accepted: 09/18/2013] [Indexed: 11/10/2022]
Abstract
Background While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 different protein descriptor sets have been compared with respect to their behavior in perceiving similarities between amino acids. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI and BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants). We investigate to which extent descriptor sets show collinear as well as orthogonal behavior via principal component analysis (PCA). Results In describing amino acid similarities, MS-WHIM, T-scales and ST-scales show related behavior, as do the VHSE, FASGAI, and ProtFP (PCA3) descriptor sets. Conversely, the ProtFP (PCA5), ProtFP (PCA8), Z-Scales (Binned), and BLOSUM descriptor sets show behavior that is distinct from one another as well as from both of the clusters above. Generally, the use of more principal components (>3 per amino acid, per descriptor) leads to significant differences in the way amino acids are described, even though the latter principal components capture less variation per component of the original input data. Conclusion In this work a comparison is provided of how similarly (and differently) currently available amino acid descriptor sets behave when converting structure to property space. The results obtained enable molecular modelers to select suitable amino acid descriptor sets for structure-activity analyses, e.g. those showing complementary behavior.
Affiliation(s)
- Gerard Jp van Westen
- Division of Medicinal Chemistry, Leiden / Amsterdam Center for Drug Research, Einsteinweg 55, 2333 CC Leiden, The Netherlands.
39
Cao DS, Liang YZ, Deng Z, Hu QN, He M, Xu QS, Zhou GH, Zhang LX, Deng ZX, Liu S. Genome-scale screening of drug-target associations relevant to Ki using a chemogenomics approach. PLoS One 2013; 8:e57680. [PMID: 23577055 PMCID: PMC3618265 DOI: 10.1371/journal.pone.0057680] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 05/04/2012] [Accepted: 01/27/2013] [Indexed: 11/18/2022]
Abstract
The identification of interactions between drugs and target proteins plays a key role in genomic drug discovery. In the present study, the quantitative binding affinities of drug-target pairs are differentiated as a measurement to define whether a drug interacts with a protein or not, and then a chemogenomics framework using an unbiased set of general integrated features and random forest (RF) is employed to construct a predictive model which can accurately classify drug-target pairs. The predictability of the model is further investigated and validated by several independent validation sets. The built model is used to predict drug-target associations, some of which were confirmed by comparing experimental data from public biological resources. A drug-target interaction network with high confidence drug-target pairs was also reconstructed. This network provides further insight for the action of drugs and targets. Finally, a web-based server called PreDPI-Ki was developed to predict drug-target interactions for drug discovery. In addition to providing a high-confidence list of drug-target associations for subsequent experimental investigation guidance, these results also contribute to the understanding of drug-target interactions. We can also see that quantitative information of drug-target associations could greatly promote the development of more accurate models. The PreDPI-Ki server is freely available via: http://sdd.whu.edu.cn/dpiki.
Affiliation(s)
- Dong-Sheng Cao
- Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha, P. R. China
- Yi-Zeng Liang
- Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha, P. R. China
- Zhe Deng
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery (Wuhan University), Ministry of Education, and Wuhan University School of Pharmaceutical Sciences, Wuhan, P. R. China
- Qian-Nan Hu
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery (Wuhan University), Ministry of Education, and Wuhan University School of Pharmaceutical Sciences, Wuhan, P. R. China
- Min He
- Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha, P. R. China
- Qing-Song Xu
- School of Mathematics and Statistics, Central South University, Changsha, P. R. China
- Guang-Hua Zhou
- The 163rd Hospital of The Chinese People's Liberation Army, Changsha, P. R. China
- Liu-Xia Zhang
- The 163rd Hospital of The Chinese People's Liberation Army, Changsha, P. R. China
- Zi-xin Deng
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery (Wuhan University), Ministry of Education, and Wuhan University School of Pharmaceutical Sciences, Wuhan, P. R. China
- Shao Liu
- Xiangya Hospital, Central South University, Changsha, P. R. China
40
Learning a peptide-protein binding affinity predictor with kernel ridge regression. BMC Bioinformatics 2013; 14:82. [PMID: 23497081 PMCID: PMC3651388 DOI: 10.1186/1471-2105-14-82] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 07/26/2012] [Accepted: 02/21/2013] [Indexed: 02/01/2023]
Abstract
BACKGROUND The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interaction. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to specific MHC alleles would be of tremendous benefit to improve vaccine based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. RESULTS We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, comprised of the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets.
CONCLUSION On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve systems biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/.
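The string kernel proposed in this abstract is not reproduced here. The sketch below shows only the generic kernel ridge regression step that the kernel is combined with. As an assumption, a Gaussian (RBF) kernel on numeric vectors stands in for the paper's kernel. Training solves (K + lam * I) alpha = y, and a new input x is scored as sum_i alpha_i k(x, x_i). The regularization lam, the kernel width gamma and the toy data are arbitrary choices.

```python
import numpy as np

def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian kernel exp(-gamma * ||a - b||^2) between rows of A and rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def krr_fit(X: np.ndarray, y: np.ndarray, lam: float, gamma: float) -> np.ndarray:
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(X.shape[0]), y)

def krr_predict(X_train: np.ndarray, alpha: np.ndarray,
                X_new: np.ndarray, gamma: float) -> np.ndarray:
    """Predict f(x) = sum_i alpha_i * k(x, x_i) for each row x of X_new."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy 1-D regression problem (made-up data, not peptide affinities)
X = np.linspace(0.0, 1.0, 20)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
lam, gamma = 1e-2, 10.0
alpha = krr_fit(X, y, lam, gamma)
pred = krr_predict(X, alpha, X, gamma)
print("max training error:", float(np.max(np.abs(pred - y))))
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse. For the large training sets discussed elsewhere in this listing, the cubic cost of this exact solve is what motivates low-rank approximations.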
41
Gloriam DE. Chemogenomics of allosteric binding sites in GPCRs. Drug Discovery Today: Technologies 2013; 10:e307-e313. [PMID: 24050282 DOI: 10.1016/j.ddtec.2012.07.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 06/02/2023]
Abstract
Chemogenomic techniques connect the chemical and biological domains to establish ligand and target relationships not evident from the individual disciplines. Chemogenomics has been applied in lead generation, target classification, focused library design as well as selectivity and polypharmacology profiling. This review describes recent developments structured into ligand-, target- and combined chemogenomic techniques and applications to allosteric GPCR ligands. It also outlines relative strengths and limitations of these techniques and the impact of the increasing crystallographic data.
42
Abstract
The identification of drug-target interactions from heterogeneous biological data is critical in drug development. In this chapter, we review recently developed in silico chemogenomic approaches to infer unknown drug-target interactions from chemical information of drugs and genomic information of target proteins. We review several kernel-based statistical methods from two different viewpoints: binary classification and dimension reduction. In the results, we demonstrate the usefulness of the methods on the prediction of drug-target interactions from chemical structure data and genomic sequence data. We also discuss the characteristics of each method, and show some perspectives toward future research directions.
43
Affiliation(s)
- Michael Bieler
- Boehringer Ingelheim Pharma GmbH & Co. KG; Lead Discovery and Optimization Support; 88397; Biberach/Riss; Germany
- Herbert Koeppen
- Boehringer Ingelheim Pharma GmbH & Co. KG; Lead Discovery and Optimization Support; 88397; Biberach/Riss; Germany
44
Wu D, Huang Q, Zhang Y, Zhang Q, Liu Q, Gao J, Cao Z, Zhu R. Screening of selective histone deacetylase inhibitors by proteochemometric modeling. BMC Bioinformatics 2012; 13:212. [PMID: 22913517 PMCID: PMC3542186 DOI: 10.1186/1471-2105-13-212] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 03/22/2012] [Accepted: 08/16/2012] [Indexed: 12/11/2022]
Abstract
BACKGROUND Histone deacetylase (HDAC) is a novel target for the treatment of cancer and it can be classified into three classes, i.e., classes I, II, and IV. Inhibitors that selectively target individual HDACs have proved to be better candidate antitumor drugs. To screen selective HDAC inhibitors, several proteochemometric (PCM) models based on different combinations of three kinds of protein descriptors, two kinds of ligand descriptors and multiplication cross-terms were constructed in our study. RESULTS The results show that structure similarity descriptors are better than sequence similarity descriptors and geometry descriptors in the characterization of HDACs. Furthermore, the predictive ability was not improved by introducing the cross-terms in our models. Finally, the best PCM model, based on protein structure similarity descriptors and 32-dimensional general descriptors, was derived (R² = 0.9897, Q²test = 0.7542), which shows a powerful ability to screen selective HDAC inhibitors. CONCLUSIONS Our best model not only predicts the activities of inhibitors for each HDAC isoform, but also screens and distinguishes class-selective and even isoform-selective inhibitors; thus it provides a potential way to discover or design novel candidate antitumor drugs with reduced side effects.
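The PCM models in this abstract combine protein and ligand descriptors, optionally with multiplication cross-terms. Below is a minimal sketch of the two encodings, assuming plain numeric descriptor vectors; the values are hypothetical. The base representation concatenates the ligand and protein descriptors. The cross-terms are all pairwise products ligand_i * protein_j, which is the flattened Kronecker product that `np.kron` computes.

```python
import numpy as np

def pair_features(ligand, protein, cross_terms: bool = False) -> np.ndarray:
    """Build a PCM feature vector for one (ligand, protein) pair."""
    ligand = np.asarray(ligand, dtype=float)
    protein = np.asarray(protein, dtype=float)
    parts = [ligand, protein]  # plain concatenation of the two descriptor sets
    if cross_terms:
        # All ligand_i * protein_j products, i.e. the flattened Kronecker product
        parts.append(np.kron(ligand, protein))
    return np.concatenate(parts)

lig = [1.0, 2.0]         # hypothetical ligand descriptors
prot = [0.5, -1.0, 3.0]  # hypothetical protein descriptors
print(pair_features(lig, prot).shape)                    # (5,)
print(pair_features(lig, prot, cross_terms=True).shape)  # (11,)
```

With d_L ligand descriptors and d_P protein descriptors, the cross-terms add d_L * d_P features, so they grow quickly when long descriptor vectors are used.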
Affiliation(s)
- Dingfeng Wu
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
- Qi Huang
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
- Yida Zhang
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
- Qingchen Zhang
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
- Qi Liu
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
- Jun Gao
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
- School of Information Engineering, Shanghai Maritime University, Shanghai, 201306, P.R. China
- Zhiwei Cao
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
| | - Ruixin Zhu
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
- Institute for Advanced Study of Translational Medicine, Tongji University, Shanghai, 200092, P.R. China
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, Liaoning, 116600, P.R. China
| |
45
García-Sosa AT, Oja M, Hetényi C, Maran U. DrugLogit: logistic discrimination between drugs and nondrugs including disease-specificity by assigning probabilities based on molecular properties. J Chem Inf Model 2012; 52:2165-80. [PMID: 22830445 DOI: 10.1021/ci200587h] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 11/28/2022]
Abstract
The increasing knowledge of both the structure and the activity of compounds provides a good basis for enhancing the pharmacological characterization of chemical libraries. In addition, pharmacology can be seen as incorporating advances from both molecular biology and the chemical sciences, with innovative insight provided by studying target-ligand data from the ligand's point of view. Predictions and profiling of libraries of drug candidates have previously focused mainly on certain cases of oral bioavailability. Including other administration routes and disease-specificity would improve the precision of drug profiling. In this work, recent data are extended, and a probability-based approach is introduced for the quantitative and gradual classification of compounds into drug/nondrug categories, as well as by disease- or organ-specificity. Using experimental data for more than 1067 compounds and multivariate logistic regressions, the classification shows good performance in training and independent test cases. The regressions have high statistical significance in terms of the robustness of the coefficients and the 95% confidence intervals obtained from 1000-fold bootstrap resampling. Besides their good predictive power, the classification functions remain chemically interpretable, containing only one to five variables in total, and the physicochemical terms involved can be easily calculated. The present approach is useful for improved description and filtering of compound libraries. It can also be applied sequentially or as a combination of filters, and adapted to particular use cases. The scores and equations may suggest possible routes for modifying compounds or libraries. The data are made available for reuse by others, and the equations are freely accessible at http://hermes.chem.ut.ee/~alfx/druglogit.html.
46
Cheng F, Zhou Y, Li J, Li W, Liu G, Tang Y. Prediction of chemical-protein interactions: multitarget-QSAR versus computational chemogenomic methods. Mol Biosyst 2012; 8:2373-84. [PMID: 22751809 DOI: 10.1039/c2mb25110h] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.3] [Indexed: 01/18/2023]
Abstract
Elucidation of chemical-protein interactions (CPI) is the basis of target identification and drug discovery. Determining CPI experimentally is time-consuming and costly, and computational methods can facilitate this task. In this study, two methods, multitarget quantitative structure-activity relationship (mt-QSAR) and computational chemogenomics, were developed for CPI prediction. Two comprehensive data sets were collected from the ChEMBL database for method assessment. One data set consisted of 81 689 CPI pairs among 50 924 compounds and 136 G-protein coupled receptors (GPCRs), while the other contained 43 965 CPI pairs among 23 376 compounds and 176 kinases. The area under the receiver operating characteristic curve (AUC) on the test sets ranged from 0.95 to 1.0 for the 100 GPCR mt-QSAR models and from 0.82 to 1.0 for the 100 kinase mt-QSAR models. Using the chemogenomic method, the 5-fold cross-validation AUC was about 0.92 for both the 176 kinases and the 136 GPCRs. However, the chemogenomic method performed worse than mt-QSAR on the external validation set. Further analysis revealed a high false positive rate for the chemogenomic method on the external validation set. In addition, we developed a freely available web server named CPI-Predictor. The methods and tool have potential applications in network pharmacology and drug repositioning.
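The AUC values quoted above can be read as the probability that a randomly chosen interacting pair is scored above a randomly chosen non-interacting pair. A minimal pure-Python sketch of that rank-based (Mann-Whitney) formulation follows; roc_auc is a hypothetical name, and real evaluations would normally rely on a tested library routine instead.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based ROC AUC: fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as one half.

    labels use 1 for an interacting pair and 0 for a non-interacting pair.
    Runs in O(P * N) time, which is acceptable for illustration only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # One positive (0.2) is outranked by a negative (0.3): 3 of 4 pairs ordered correctly.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))  # 0.75
```

An AUC of 1.0 means every positive outranks every negative, while 0.5 corresponds to random ordering; in 5-fold cross-validation this quantity would be computed on each held-out fold.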
Affiliation(s)
- Feixiong Cheng
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
47
Madala PK, Fairlie DP, Bodén M. Matching Cavities in G Protein-Coupled Receptors to Infer Ligand-Binding Sites. J Chem Inf Model 2012; 52:1401-10. [DOI: 10.1021/ci2005498] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/10/2023]
Affiliation(s)
- Praveen K. Madala
- Institute for Molecular Bioscience, School of Chemistry and Molecular Biosciences, and School of Information Technology and Electrical Engineering, The University of Queensland, St. Lucia, QLD 4072, Australia
- David P. Fairlie
- Institute for Molecular Bioscience, School of Chemistry and Molecular Biosciences, and School of Information Technology and Electrical Engineering, The University of Queensland, St. Lucia, QLD 4072, Australia
- Mikael Bodén
- Institute for Molecular Bioscience, School of Chemistry and Molecular Biosciences, and School of Information Technology and Electrical Engineering, The University of Queensland, St. Lucia, QLD 4072, Australia
48
Heifetz A, Morris GB, Biggin PC, Barker O, Fryatt T, Bentley J, Hallett D, Manikowski D, Pal S, Reifegerste R, Slack M, Law R. Study of Human Orexin-1 and -2 G-Protein-Coupled Receptors with Novel and Published Antagonists by Modeling, Molecular Dynamics Simulations, and Site-Directed Mutagenesis. Biochemistry 2012; 51:3178-97. [DOI: 10.1021/bi300136h] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 01/12/2023]
Affiliation(s)
- Alexander Heifetz
- Evotec (U.K.) Ltd., 114 Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
- G. Benjamin Morris
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, United Kingdom
- Philip C. Biggin
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, United Kingdom
- Oliver Barker
- Evotec (U.K.) Ltd., 114 Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
- Tara Fryatt
- Evotec (U.K.) Ltd., 114 Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
- Jonathan Bentley
- Evotec (U.K.) Ltd., 114 Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
- David Hallett
- Evotec (U.K.) Ltd., 114 Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
- Sandeep Pal
- Evotec (U.K.) Ltd., 114 Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
- Rita Reifegerste
- Evotec AG, Manfred Eigen Campus, Essener Bogen 7, 22419 Hamburg, Germany
- Mark Slack
- Evotec AG, Manfred Eigen Campus, Essener Bogen 7, 22419 Hamburg, Germany
- Richard Law
- Evotec (U.K.) Ltd., 114 Milton Park, Abingdon, Oxfordshire OX14 4SA, United Kingdom
49
Periwal V, Kishtapuram S, Scaria V. Computational models for in-vitro anti-tubercular activity of molecules based on high-throughput chemical biology screening datasets. BMC Pharmacol 2012; 12:1. [PMID: 22463123 PMCID: PMC3342097 DOI: 10.1186/1471-2210-12-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 08/15/2011] [Accepted: 03/31/2012] [Indexed: 01/31/2023]
Abstract
Background The emergence of multidrug-resistant tuberculosis in pandemic proportions throughout the world and the paucity of novel therapeutics for tuberculosis have reiterated the need to accelerate the discovery of novel molecules with anti-tubercular activity. Although high-throughput screens for anti-tubercular activity are available, they are expensive, tedious, and time-consuming to perform on a large scale. Thus, there remains an unmet need to prioritize the molecules that are taken up for biological screens in order to save cost and time. Computational methods, including machine learning, have been widely employed to build classifiers for high-throughput virtual screens that prioritize molecules for further analysis. The public availability of datasets from high-throughput biological screens or assays makes computational methods a plausible route to building predictive models. In addition, this approach would significantly reduce the cost, effort, and time required to run high-throughput screens. Results We show that by using four state-of-the-art supervised classifiers (SMO, Random Forest, Naive Bayes, and J48) we are able to generate in-silico predictive models on a large, extremely imbalanced (minority class ratio: 0.6%) dataset of anti-tubercular molecules with reasonable AROC (0.6-0.75) and BCR (60-66%) values. Moreover, these models provide 3-4 fold enrichment over random selection. Conclusions In the present study, we used public-domain data from a high-throughput in-vitro screen for anti-tubercular activity to build highly accurate classifiers based on molecular descriptors. We show that machine learning tools can be used to build highly effective predictive models for virtual high-throughput screens to prioritize molecules from large molecular libraries.
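The Results paragraph reports BCR values and fold enrichment over random selection. As a hedged sketch, the code below assumes BCR is the mean of sensitivity and specificity, and that enrichment is the active rate among the top-ranked fraction of molecules divided by the overall active rate; these are common definitions chosen for illustration, the paper's exact formulas may differ, and both function names are hypothetical.

```python
from typing import Sequence


def balanced_classification_rate(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Assumed definition: mean of sensitivity (true positive rate) and specificity."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in labels")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


def enrichment_factor(labels: Sequence[int], scores: Sequence[float], fraction: float) -> float:
    """Assumed definition: active rate in the top-scored fraction / overall active rate.

    A value of 1.0 corresponds to random selection; equal scores keep input order
    because Python's sort is stable even with reverse=True.
    """
    n = len(labels)
    total_actives = sum(labels)
    if n == 0 or total_actives == 0:
        raise ValueError("need at least one active molecule")
    n_selected = max(1, int(fraction * n))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_selected])
    return (hits / n_selected) / (total_actives / n)


if __name__ == "__main__":
    print(balanced_classification_rate([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.75
    labels = [1, 1] + [0] * 8
    scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
    # Both actives land in the top 20%: (2/2) / (2/10) = 5-fold enrichment.
    print(enrichment_factor(labels, scores, 0.2))
```

With a minority class ratio as low as 0.6%, a classifier that labels every molecule inactive would reach 99.4% plain accuracy yet a BCR of only 0.5, which is why a balanced rate is the more informative figure here.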
Affiliation(s)
- Vinita Periwal
- GN Ramachandran Knowledge Center for Genome Informatics, Institute of Genomics and Integrative Biology (CSIR), New Delhi 110007, India
50
Ning X, Walters M, Karypis G. Improved Machine Learning Models for Predicting Selective Compounds. J Chem Inf Model 2011; 52:38-50. [DOI: 10.1021/ci200346b] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Affiliation(s)
- Xia Ning
- Department of Computer Science & Engineering and College of Pharmacy, University of Minnesota, Twin Cities, Minneapolis, Minnesota 55455, United States
- Michael Walters
- Department of Computer Science & Engineering and College of Pharmacy, University of Minnesota, Twin Cities, Minneapolis, Minnesota 55455, United States
- George Karypis
- Department of Computer Science & Engineering and College of Pharmacy, University of Minnesota, Twin Cities, Minneapolis, Minnesota 55455, United States