1
Najm M, Azencott CA, Playe B, Stoven V. Drug Target Identification with Machine Learning: How to Choose Negative Examples. Int J Mol Sci 2021; 22:ijms22105118. [PMID: 34066072 PMCID: PMC8151112 DOI: 10.3390/ijms22105118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Revised: 04/30/2021] [Accepted: 05/07/2021] [Indexed: 11/24/2022]
Abstract
Identification of the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine learning algorithms can help accelerate this search, limiting the number of required experiments. However, Drug-Target Interaction databases used for training present high statistical bias, leading to a high number of false positives, thus increasing the time and cost of experimental validation campaigns. To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples, so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs, and more globally for 200 approved drugs. For the three detailed drug examples, and for the larger set of 200 drugs, training with the proposed scheme for the choice of negative examples improved target prediction results: the average number of false positives among the top-ranked predicted targets decreased, and overall, the rank of the true targets was improved. Our method corrects databases' statistical bias and reduces the number of false positive predictions, and therefore the number of useless experiments potentially undertaken.
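The balancing idea in this abstract (each drug and each protein appearing as often in negative as in positive examples) can be illustrated with a small greedy sketch. This is not the authors' published procedure: the function name `balanced_negatives`, the greedy strategy, and the toy identifiers are assumptions made for illustration, and a greedy pass may fall short of a perfect balance on denser interaction sets.

```python
import random
from collections import Counter

def balanced_negatives(positives, seed=0):
    """Greedy sketch (hypothetical helper, not the paper's algorithm).

    Picks non-interacting (drug, protein) pairs so that no drug or protein
    appears more often among negatives than among positives.
    """
    rng = random.Random(seed)
    pos = sorted(set(positives))               # sorted for reproducibility
    drug_quota = Counter(d for d, _ in pos)    # negatives still allowed per drug
    prot_quota = Counter(p for _, p in pos)    # negatives still allowed per protein
    drugs, prots = sorted(drug_quota), sorted(prot_quota)
    known = set(pos)
    # Every pair not recorded as interacting is a candidate negative.
    candidates = [(d, p) for d in drugs for p in prots if (d, p) not in known]
    rng.shuffle(candidates)
    negatives = []
    for d, p in candidates:
        if drug_quota[d] > 0 and prot_quota[p] > 0:
            negatives.append((d, p))
            drug_quota[d] -= 1
            prot_quota[p] -= 1
    return negatives

# Toy data: two known interactions, so each drug and protein needs one negative.
positives = [("d1", "p1"), ("d2", "p2")]
negatives = balanced_negatives(positives)
```

In practice one would check the remaining quotas after the pass and, where they are not all zero, fall back to another strategy; that step is left out here.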
Affiliation(s)
- Matthieu Najm
  - Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
  - Institut Curie, 75248 Paris, France
  - INSERM U900, 75428 Paris, France
- Chloé-Agathe Azencott
  - Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
  - Institut Curie, 75248 Paris, France
  - INSERM U900, 75428 Paris, France
- Benoit Playe
  - Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
  - Institut Curie, 75248 Paris, France
  - INSERM U900, 75428 Paris, France
- Véronique Stoven
  - Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
  - Institut Curie, 75248 Paris, France
  - INSERM U900, 75428 Paris, France
2
Hao M, Bryant SH, Wang Y. Open-source chemogenomic data-driven algorithms for predicting drug-target interactions. Brief Bioinform 2020; 20:1465-1474. [PMID: 29420684 DOI: 10.1093/bib/bby010] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 09/26/2017] [Revised: 01/18/2018] [Indexed: 12/25/2022]
Abstract
While novel technologies such as high-throughput screening have advanced together with significant investment by pharmaceutical companies during the past decades, the success rate for drug development has not yet improved, prompting researchers to look for new strategies of drug discovery. Drug repositioning is a potential approach to solve this dilemma. However, experimental identification and validation of potential drug targets encoded by the human genome is both costly and time-consuming. Therefore, effective computational approaches have been proposed to facilitate drug repositioning, which have proved to be successful in drug discovery. Doubtlessly, the availability of openly accessible data from basic chemical biology research and the success of human genome sequencing are crucial to developing effective in silico drug repositioning methods that allow the identification of potential targets for existing drugs. In this work, we review several chemogenomic data-driven computational algorithms with publicly accessible source code for predicting drug-target interactions (DTIs). We organize these algorithms by model properties and model evolutionary relationships. We re-implemented five representative algorithms in the R programming language, and compared these algorithms by means of mean percentile ranking, a new recall-based evaluation metric in the DTI prediction research field. We anticipate that this review will be objective and helpful to researchers who would like to further improve existing algorithms or need to choose appropriate algorithms to infer potential DTIs in their projects. The source code for DTI predictions is available at: https://github.com/minghao2016/chemogenomicAlg4DTIpred.
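The abstract names mean percentile ranking without defining it. The sketch below shows one common way such a metric is computed, assuming that candidate targets are ranked per drug by predicted score and that a percentile rank of 0 is the top of the list and 1 the bottom (lower is better). The review's exact formula may differ, and the function name and data layout are hypothetical. Python is used here for illustration, although the review's own re-implementations are in R.

```python
def mean_percentile_rank(scores, true_targets):
    """Average percentile rank of the true targets (assumed definition).

    scores: {drug: {target: predicted_score}}
    true_targets: {drug: set of held-out true targets}
    """
    percentile_ranks = []
    for drug, truths in true_targets.items():
        # Rank candidate targets from highest to lowest predicted score.
        ranked = sorted(scores[drug], key=lambda t: -scores[drug][t])
        denom = max(len(ranked) - 1, 1)        # avoid division by zero
        for target in truths:
            percentile_ranks.append(ranked.index(target) / denom)
    return sum(percentile_ranks) / len(percentile_ranks)

# Toy predictions for two drugs over three candidate targets.
scores = {
    "drugA": {"t1": 0.9, "t2": 0.5, "t3": 0.1},
    "drugB": {"t1": 0.2, "t2": 0.8, "t3": 0.4},
}
true_targets = {"drugA": {"t1"}, "drugB": {"t3"}}
mpr = mean_percentile_rank(scores, true_targets)
```

Here the true target of drugA is ranked first (percentile 0.0) and that of drugB second of three (percentile 0.5), so the mean is 0.25.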
3
Sam E, Athri P. Web-based drug repurposing tools: a survey. Brief Bioinform 2017; 20:299-316. [DOI: 10.1093/bib/bbx125] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 07/03/2017] [Indexed: 12/15/2022]
Affiliation(s)
- Elizabeth Sam
  - Department of Computer Science & Engineering, Amrita University, Bengaluru, India
- Prashanth Athri
  - Department of Computer Science & Engineering, Amrita University, Bengaluru, India
4
Cheng T, Hao M, Takeda T, Bryant SH, Wang Y. Large-Scale Prediction of Drug-Target Interaction: a Data-Centric Review. AAPS J 2017; 19:1264-1275. [PMID: 28577120 PMCID: PMC11097213 DOI: 10.1208/s12248-017-0092-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 01/10/2017] [Accepted: 04/25/2017] [Indexed: 11/30/2022]
Abstract
The prediction of drug-target interactions (DTIs) is of extraordinary significance to modern drug discovery in terms of suggesting new drug candidates and repositioning old drugs. Despite technological advances, large-scale experimental determination of DTIs is still expensive and laborious. Effective and low-cost computational alternatives are still strongly needed. Meanwhile, open-access resources have been growing rapidly, with massive amounts of bioactivity data becoming available, creating unprecedented opportunities for the development of novel in silico models for large-scale DTI prediction. In this work, we review the state-of-the-art computational approaches for identifying DTIs from a data-centric perspective: what the underlying data are and how they are utilized in each study. We also summarize popular public data resources and online tools for DTI prediction. We found that various types of data were employed, including properties of chemical structures, drug therapeutic effects and side effects, drug-target binding, drug-drug interactions, bioactivity data of drug molecules across multiple biological targets, and drug-induced gene expression. More often, heterogeneous data were integrated to offer better performance. However, challenges remain, such as handling data imbalance, incorporating negative samples and quantitative bioactivity data, and maintaining cross-links among different data sources, all of which are essential for large-scale and automated information integration.
Affiliation(s)
- Tiejun Cheng
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Ming Hao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Takako Takeda
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Stephen H Bryant
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Yanli Wang
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
5
Cerisier N, Regad L, Triki D, Petitjean M, Flatters D, Camproux AC. Statistical Profiling of One Promiscuous Protein Binding Site: Illustrated by Urokinase Catalytic Domain. Mol Inform 2017; 36. [PMID: 28696518 DOI: 10.1002/minf.201700040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/27/2017] [Accepted: 06/26/2017] [Indexed: 12/21/2022]
Abstract
While recent literature focuses on drug promiscuity, the characterization of promiscuous binding sites (ability to bind several ligands) remains to be explored. Here, we present a proteochemometric modeling approach to analyze diverse ligands and corresponding multiple binding sub-pockets associated with one promiscuous binding site to characterize protein-ligand recognition. We analyze both geometrical and physicochemical profile correspondences. This approach was applied to examine the well-studied druggable urokinase catalytic domain inhibitor binding site, which results in a large number of complex structures bound to various ligands. This approach emphasizes the importance of jointly characterizing pocket and ligand spaces to explore the impact of ligand diversity on sub-pocket properties and to establish their main profile correspondences. This work supports an interest in mining available 3D holo structures associated with a promiscuous binding site to explore its main protein-ligand recognition tendency.
Affiliation(s)
- Natacha Cerisier
  - INSERM, UMRS-973, MTi, 35, rue Hélène Brion, 75205, Paris Cedex 13
  - University Paris Diderot, Sorbonne Paris Cité, UMRS-973, MTi
- Leslie Regad
  - INSERM, UMRS-973, MTi, 35, rue Hélène Brion, 75205, Paris Cedex 13
  - University Paris Diderot, Sorbonne Paris Cité, UMRS-973, MTi
- Dhoha Triki
  - INSERM, UMRS-973, MTi, 35, rue Hélène Brion, 75205, Paris Cedex 13
  - University Paris Diderot, Sorbonne Paris Cité, UMRS-973, MTi
- Michel Petitjean
  - INSERM, UMRS-973, MTi, 35, rue Hélène Brion, 75205, Paris Cedex 13
  - University Paris Diderot, Sorbonne Paris Cité, UMRS-973, MTi
- Delphine Flatters
  - INSERM, UMRS-973, MTi, 35, rue Hélène Brion, 75205, Paris Cedex 13
  - University Paris Diderot, Sorbonne Paris Cité, UMRS-973, MTi
- Anne-Claude Camproux
  - INSERM, UMRS-973, MTi, 35, rue Hélène Brion, 75205, Paris Cedex 13
  - University Paris Diderot, Sorbonne Paris Cité, UMRS-973, MTi
6
Shaikh N, Sharma M, Garg P. An improved approach for predicting drug-target interaction: proteochemometrics to molecular docking. Mol Biosyst 2016; 12:1006-14. [PMID: 26822863 DOI: 10.1039/c5mb00650c] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 01/29/2023]
Abstract
Proteochemometric (PCM) methods, which use descriptors of both interacting species, i.e. the drug and the target, are being successfully employed for the prediction of drug-target interactions (DTI). However, the unavailability of non-interacting datasets and the determination of the applicability domain (AD) of a model are main concerns in PCM modeling. In the present study, traditional PCM modeling was improved by devising novel methodologies for reliable negative dataset generation and fingerprint-based AD analysis. In addition, various types of descriptors and classifiers were evaluated for their performance. The Random Forest and Support Vector Machine models outperformed the other classifiers (accuracies >98% and >89% for 10-fold cross validation and external validation, respectively). The type of protein descriptors had a negligible effect on the developed models, encouraging the use of sequence-based descriptors over structure-based descriptors. To establish the practical utility of the built models, targets were predicted for approved anticancer drugs of natural origin. The molecular recognition interactions between the predicted drug-target pairs were quantified with the help of a reverse molecular docking approach. The majority of predicted targets are known for anticancer therapy. These results thus correlate well with the anticancer potential of the selected drugs. Interestingly, out of all predicted DTIs, thirty were found to be reported in the ChEMBL database, further validating the adopted methodology. The outcome of this study suggests that the proposed approach, involving use of the improved PCM methodology and molecular docking, can be successfully employed to elucidate the intricate mode of action of drug molecules as well as to reposition them for new therapeutic applications.
Affiliation(s)
- Naeem Shaikh
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), S. A. S. Nagar, Punjab 160062, India
- Mahesh Sharma
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), S. A. S. Nagar, Punjab 160062, India
- Prabha Garg
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), S. A. S. Nagar, Punjab 160062, India
7
Rasti B, Namazi M, Karimi-Jafari MH, Ghasemi JB. Proteochemometric Modeling of the Interaction Space of Carbonic Anhydrase and its Inhibitors: An Assessment of Structure-based and Sequence-based Descriptors. Mol Inform 2016; 36. [PMID: 27860295 DOI: 10.1002/minf.201600102] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/19/2015] [Accepted: 10/26/2016] [Indexed: 11/08/2022]
Abstract
Due to its physiological and clinical roles, carbonic anhydrase (CA) is one of the most interesting case studies. There are different classes of CA inhibitors, including sulfonamides, polyamines, coumarins and dithiocarbamates (DTCs). However, many of them hardly act as selective inhibitors against a specific isoform. Therefore, finding highly selective inhibitors for different isoforms of CA is still an ongoing project. Proteochemometric modeling (PCM) is able to model the bioactivity of multiple compounds against different isoforms of a protein. Therefore, it is highly applicable when investigating the selectivity of different ligands towards different receptors. Given these facts, we applied PCM to investigate the interaction space and structural properties that lead to the selective inhibition of CA isoforms by some dithiocarbamates. Our models have provided interesting structural information that can be considered when designing compounds capable of inhibiting different isoforms of CA in a more selective manner. The validity and predictivity of the models were confirmed by both internal and external validation methods, while a Y-scrambling approach was applied to assess the robustness of the models. To prove the reliability and applicability of our findings, we showed how ligand-receptor selectivity can be affected by removing any of these critical findings from the modeling process.
Affiliation(s)
- Behnam Rasti
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Mohsen Namazi
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- M H Karimi-Jafari
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Jahan B Ghasemi
  - Department of Analytical Chemistry, School of Chemistry, College of Science, University of Tehran, Tehran, Iran
8
Qiu T, Qiu J, Feng J, Wu D, Yang Y, Tang K, Cao Z, Zhu R. The recent progress in proteochemometric modelling: focusing on target descriptors, cross-term descriptors and application scope. Brief Bioinform 2016; 18:125-136. [PMID: 26873661 DOI: 10.1093/bib/bbw004] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 10/26/2015] [Revised: 12/09/2015] [Indexed: 12/17/2022]
Abstract
As an extension of the conventional quantitative structure activity relationship models, proteochemometric (PCM) modelling is a computational method that can predict the bioactivity relations between multiple ligands and multiple targets. Traditional PCM modelling includes three essential elements: descriptors (including target descriptors, ligand descriptors and cross-term descriptors), bioactivity data and appropriate learning functions that link the descriptors to the bioactivity data. Since its appearance, PCM modelling has developed rapidly over the past decade by taking advantage of the progress of different descriptors and machine learning techniques, along with the increasing amounts of available bioactivity data. Specifically, the new emerging target descriptors and cross-term descriptors not only significantly increased the performance of PCM modelling but also expanded its application scope from traditional protein-ligand interaction to more abundant interactions, including protein-peptide, protein-DNA and even protein-protein interactions. In this review, target descriptors and cross-term descriptors, as well as the corresponding application scope, are intensively summarized. Additionally, we look forward to seeing PCM modelling extend into new application scopes, such as Target-Catalyst-Ligand systems, with the further development of descriptors, machine learning techniques and increasing amounts of available bioactivity data.
9
Jasial S, Balfer J, Vogt M, Bajorath J. Determination of Meta-Parameters for Support Vector Machine Linear Combinations. Mol Inform 2015; 34:127-33. [DOI: 10.1002/minf.201400163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2014] [Accepted: 12/16/2014] [Indexed: 11/05/2022]
10
Cortés-Ciriano I, Ain QU, Subramanian V, Lenselink EB, Méndez-Lucio O, IJzerman AP, Wohlfahrt G, Prusis P, Malliavin TE, van Westen GJP, Bender A. Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects. MedChemComm 2015. [DOI: 10.1039/c4md00216d] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Indexed: 11/21/2022]
Abstract
Proteochemometric (PCM) modelling is a computational method to model the bioactivity of multiple ligands against multiple related protein targets simultaneously.
Affiliation(s)
- Isidro Cortés-Ciriano
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Qurrat Ul Ain
  - Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
- Eelke B. Lenselink
  - Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Oscar Méndez-Lucio
  - Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
- Adriaan P. IJzerman
  - Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Gerd Wohlfahrt
  - Computer-Aided Drug Design, Orion Pharma, FIN-02101 Espoo, Finland
- Peteris Prusis
  - Computer-Aided Drug Design, Orion Pharma, FIN-02101 Espoo, Finland
- Thérèse E. Malliavin
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Gerard J. P. van Westen
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Andreas Bender
  - Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
11
Lavecchia A. Machine-learning approaches in drug discovery: methods and applications. Drug Discov Today 2014; 20:318-31. [PMID: 25448759 DOI: 10.1016/j.drudis.2014.10.012] [Citation(s) in RCA: 353] [Impact Index Per Article: 35.3] [Received: 07/18/2014] [Revised: 09/27/2014] [Accepted: 10/24/2014] [Indexed: 12/19/2022]
Abstract
During the past decade, virtual screening (VS) has evolved from traditional similarity searching, which utilizes single reference compounds, into an advanced application domain for data mining and machine-learning approaches, which require large and representative training-set compounds to learn robust decision rules. The explosive growth in the amount of public domain-available chemical and biological data has generated huge effort to design, analyze, and apply novel learning methodologies. Here, I focus on machine-learning techniques within the context of ligand-based VS (LBVS). In addition, I analyze several relevant VS studies from recent publications, providing a detailed view of the current state-of-the-art in this field and highlighting not only the problematic issues, but also the successes and opportunities for further advances.
Affiliation(s)
- Antonio Lavecchia
  - Department of Pharmacy, Drug Discovery Laboratory, University of Napoli 'Federico II', via D. Montesano 49, I-80131 Napoli, Italy
12
Korkmaz S, Zararsiz G, Goksuluk D. Drug/nondrug classification using Support Vector Machines with various feature selection strategies. Comput Methods Programs Biomed 2014; 117:51-60. [PMID: 25224081 DOI: 10.1016/j.cmpb.2014.08.009] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 05/02/2014] [Revised: 08/15/2014] [Accepted: 08/27/2014] [Indexed: 06/03/2023]
Abstract
In conjunction with advances in computer technology, virtual screening of small molecules has come into use in drug discovery. Since there are thousands of compounds in the early phase of drug discovery, a fast classification method that can distinguish between active and inactive molecules can be used for screening large compound collections. In this study, we used Support Vector Machines (SVM) for this type of classification task. SVM is a powerful classification tool that is becoming increasingly popular in various machine-learning applications. The data sets consist of 631 compounds for the training set and 216 compounds for a separate test set. In the data pre-processing step, Pearson's correlation coefficient was used as a filter to eliminate redundant features. After application of the correlation filter, a single SVM was applied to this reduced data set. Moreover, we investigated the performance of SVM with different feature selection strategies, including SVM-Recursive Feature Elimination, Wrapper Method and Subset Selection. All feature selection methods generally show better performance than a single SVM, while Subset Selection outperforms the other feature selection methods. We tested SVM as a classification tool in a real-life drug discovery problem, and our results revealed that it could be a useful method for classification tasks in the early phase of drug discovery.
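The correlation filter mentioned in this abstract can be sketched as follows. The abstract gives neither the threshold nor the exact elimination rule, so the cutoff of 0.9, the "keep the first, drop later correlated features" rule, and the toy descriptor names are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_filter(columns, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below
    the threshold (0.9 is an assumed value, not taken from the paper)."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy descriptor table: "mw_copy" is nearly a duplicate of "mw".
columns = {
    "mw": [100.0, 200.0, 300.0, 400.0],
    "mw_copy": [101.0, 199.0, 302.0, 398.0],
    "logp": [1.0, -0.5, 2.0, 0.3],
}
selected = correlation_filter(columns)
```

With these toy values the near-duplicate column is dropped and the two weakly correlated descriptors are retained; the reduced table would then be passed to the classifier.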
Affiliation(s)
- Selcuk Korkmaz
  - Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
- Gokmen Zararsiz
  - Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
- Dincer Goksuluk
  - Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
13
Cortés-Cabrera A, Morris GM, Finn PW, Morreale A, Gago F. Comparison of ultra-fast 2D and 3D ligand and target descriptors for side effect prediction and network analysis in polypharmacology. Br J Pharmacol 2014; 170:557-67. [PMID: 23826885 DOI: 10.1111/bph.12294] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/10/2013] [Revised: 06/24/2013] [Accepted: 07/02/2013] [Indexed: 12/22/2022]
Abstract
BACKGROUND AND PURPOSE Some existing computational methods are used to infer protein targets of small molecules and can therefore be used to find new targets for existing drugs, with the goals of re-directing the molecule towards a different therapeutic purpose or explaining off-target effects due to multiple targeting. Inherent limitations, however, arise from the fact that chemical analogy is calculated on the basis of common frameworks or scaffolds and also because target information is neglected. The method we present addresses these issues by taking into account 3D information from both the ligand and the target. EXPERIMENTAL APPROACH ElectroShape is an established method for ultra-fast comparison of the shapes and charge distributions of ligands that is validated here for prediction of on-target activities, off-target profiles and adverse effects of drugs and drug-like molecules taken from the DrugBank database. KEY RESULTS The method is shown to predict polypharmacology profiles and relate targets from two complementary viewpoints (ligand- and target-based networks). CONCLUSIONS AND IMPLICATIONS The open-access web tool presented here (http://ub.cbm.uam.es/chemogenomics/) allows interactive navigation in a unified 'pharmacological space' from the viewpoints of both ligands and targets. It also enables prediction of pharmacological profiles, including likely side effects, for new compounds. We hope this web interface will help many pharmacologists to become aware of this new paradigm (up to now mostly used in the realm of the so-called 'chemical biology') and encourage its use with a view to revealing 'hidden' relationships between new and existing compounds and pharmacologically relevant targets.
Affiliation(s)
- Alvaro Cortés-Cabrera
  - Unidad de Bioinformática, Centro de Biología Molecular Severo Ochoa (CSIC/UAM), Madrid, Spain
  - Departamento de Ciencias Biomédicas, Universidad de Alcalá, Madrid, Spain
14
Mousavian Z, Masoudi-Nejad A. Drug-target interaction prediction via chemogenomic space: learning-based methods. Expert Opin Drug Metab Toxicol 2014; 10:1273-87. [PMID: 25112457 DOI: 10.1517/17425255.2014.950222] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Indexed: 01/09/2023]
Abstract
INTRODUCTION Identification of the interaction between drugs and target proteins is a crucial task in genomic drug discovery. In silico prediction is an appropriate alternative to the laborious and costly experimental process of identifying drug-target interactions. The development of a variety of computational methods opens a new direction in analyzing and detecting new drug-target pairs. AREAS COVERED In this review, we focus on chemogenomic methods that have established a learning framework for predicting drug-target interactions. Learning-based methods are classified into supervised and semi-supervised, and the supervised learning methods are studied in two separate parts: similarity-based methods and feature-based methods. EXPERT OPINION In spite of many improvements for pharmacology applications by learning-based methods, there are many oversimplified settings in the construction of predictive models that may lead to over-optimistic results on drug-target interaction prediction.
Affiliation(s)
- Zaynab Mousavian
  - University of Tehran, Institute of Biochemistry and Biophysics, Laboratory of Systems Biology and Bioinformatics (LBB), Tehran, Iran; +98 21 6695 9256; +98 21 6640 4680
15
Tang J, Aittokallio T. Network pharmacology strategies toward multi-target anticancer therapies: from computational models to experimental design principles. Curr Pharm Des 2014; 20:23-36. [PMID: 23530504 PMCID: PMC3894695 DOI: 10.2174/13816128113199990470] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Received: 03/06/2013] [Accepted: 03/18/2013] [Indexed: 12/12/2022]
Abstract
Polypharmacology has emerged as novel means in drug discovery for improving treatment response in clinical use. However, to really capitalize on the polypharmacological effects of drugs, there is a critical need to better model and understand how the complex interactions between drugs and their cellular targets contribute to drug efficacy and possible side effects. Network graphs provide a convenient modeling framework for dealing with the fact that most drugs act on cellular systems through targeting multiple proteins both through on-target and off-target binding. Network pharmacology models aim at addressing questions such as how and where in the disease network should one target to inhibit disease phenotypes, such as cancer growth, ideally leading to therapies that are less vulnerable to drug resistance and side effects by means of attacking the disease network at the systems level through synergistic and synthetic lethal interactions. Since the exponentially increasing number of potential drug target combinations makes pure experimental approach quickly unfeasible, this review depicts a number of computational models and algorithms that can effectively reduce the search space for determining the most promising combinations for experimental evaluation. Such computational-experimental strategies are geared toward realizing the full potential of multi-target treatments in different disease phenotypes. Our specific focus is on system-level network approaches to polypharmacology designs in anticancer drug discovery, where we give representative examples of how network-centric modeling may offer systematic strategies toward better understanding and even predicting the phenotypic responses to multi-target therapies.
16
17
18
Subramanian V, Prusis P, Pietilä LO, Xhaard H, Wohlfahrt G. Visually interpretable models of kinase selectivity related features derived from field-based proteochemometrics. J Chem Inf Model 2013; 53:3021-30. [PMID: 24116714 DOI: 10.1021/ci400369z] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 01/16/2023]
Abstract
Achieving selectivity for small organic molecules toward biological targets is a main focus of drug discovery but has been proven difficult, for example, for kinases because of the high similarity of their ATP binding pockets. To support the design of more selective inhibitors with fewer side effects or with altered target profiles for improved efficacy, we developed a method combining ligand- and receptor-based information. Conventional QSAR models enable one to study the interactions of multiple ligands toward a single protein target, but in order to understand the interactions between multiple ligands and multiple proteins, we have used proteochemometrics, a multivariate statistics method that aims to combine and correlate both ligand and protein descriptions with affinity to receptors. The superimposed binding sites of 50 unique kinases were described by molecular interaction fields derived from knowledge-based potentials and Schrödinger's WaterMap software. Eighty ligands were described by Mold(2), Open Babel, and Volsurf descriptors. Partial least-squares regression including cross-terms, which describe the selectivity, was used for model building. This combination of methods allows interpretation and easy visualization of the models within the context of ligand binding pockets, which can be translated readily into the design of novel inhibitors.
|
19
|
Meslamani J, Bhajun R, Martz F, Rognan D. Computational profiling of bioactive compounds using a target-dependent composite workflow. J Chem Inf Model 2013; 53:2322-33. [PMID: 23941602 DOI: 10.1021/ci400303n] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 01/08/2023]
Abstract
Computational target fishing is a chemoinformatic method aimed at determining main and secondary targets of bioactive compounds in order to explain their mechanism of action, anticipate potential side effects, or repurpose existing drugs for novel therapeutic indications. Many existing successes in this area have been based on the use of a single computational method to estimate potentially new target-ligand associations. We herewith present an automated workflow using several methods to optimally browse target-ligand space according to existing knowledge on either the ligand or the target space under investigation. The protocol uses four ligand-based (SVM classification, SVR affinity prediction, nearest neighbors interpolation, shape similarity) and two structure-based approaches (docking, protein-ligand pharmacophore match) in series, according to well-defined ligand and target property checks. The workflow was remarkably accurate (72%) in identifying the main target of 189 clinical candidates and proposed two novel off-targets which could be experimentally validated. Rolofylline, an adenosine A1 receptor antagonist, was confirmed to inhibit phosphodiesterase 5 with a moderate affinity (IC50 = 13.8 μM). More interestingly, we describe a strong binding (IC50 = 142 nM) of a claimed selective phosphodiesterase 10A inhibitor (PF-2545920) with the cysteinyl leukotriene type 1 G protein-coupled receptor.
Affiliation(s)
- Jamel Meslamani
- Laboratory for Therapeutical Innovation, UMR 7200 Université de Strasbourg/CNRS, MEDALIS Drug Discovery Center, F-67400 Illkirch, France
|
20
|
Medina-Franco JL, Aguayo-Ortiz R. Progress in the Visualization and Mining of Chemical and Target Spaces. Mol Inform 2013; 32:942-53. [DOI: 10.1002/minf.201300041] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/05/2013] [Accepted: 05/06/2013] [Indexed: 01/15/2023]
|
21
|
Pérot S, Regad L, Reynès C, Spérandio O, Miteva MA, Villoutreix BO, Camproux AC. Insights into an original pocket-ligand pair classification: a promising tool for ligand profile prediction. PLoS One 2013; 8:e63730. [PMID: 23840299 PMCID: PMC3688729 DOI: 10.1371/journal.pone.0063730] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 10/05/2012] [Accepted: 04/05/2013] [Indexed: 11/18/2022] Open
Abstract
Pockets are today a cornerstone of modern drug discovery projects and at the crossroads of several research fields, from structural biology to mathematical modeling. Being able to predict if a small molecule could bind to one or more protein targets or if a protein could bind to some given ligands is very useful for drug discovery endeavors, such as the anticipation of binding to off- and anti-targets. To date, several studies explore such questions from chemogenomic approaches to reverse docking methods. Most of these studies have been performed either from the viewpoint of ligands or targets. However, it seems valuable to use information from both ligands and target binding pockets. Hence, we present a multivariate approach relating ligand properties with protein pocket properties from the analysis of known ligand-protein interactions. We explored and optimized the pocket-ligand pair space by combining pocket and ligand descriptors using Principal Component Analysis and developed a classification engine on this paired space, revealing five main clusters of pocket-ligand pairs sharing specific and similar structural or physico-chemical properties. These pocket-ligand pair clusters highlight correspondences between pocket and ligand topological and physico-chemical properties and capture relevant information with respect to protein-ligand interactions. Based on these pocket-ligand correspondences, a protocol of prediction of clusters sharing similarity in terms of recognition characteristics is developed for a given pocket-ligand complex and achieves high performance. It is then extended to cluster prediction for a given pocket in order to acquire knowledge about its expected ligand profile or to cluster prediction for a given ligand in order to acquire knowledge about its expected pocket profile. This prediction approach shows promising results and could help predict some ligand properties critical for binding to a given pocket, and conversely, some key pocket properties for ligand binding.
Affiliation(s)
- Stéphanie Pérot
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Leslie Regad
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Christelle Reynès
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Olivier Spérandio
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Maria A. Miteva
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Bruno O. Villoutreix
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Anne-Claude Camproux
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
|
22
|
Koch U, Hamacher M, Nussbaumer P. Cheminformatics at the interface of medicinal chemistry and proteomics. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1844:156-61. [PMID: 23707564 DOI: 10.1016/j.bbapap.2013.05.010] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 10/11/2012] [Revised: 04/26/2013] [Accepted: 05/13/2013] [Indexed: 10/26/2022]
Abstract
Multiple factors have to be optimized in the course of a drug discovery project. Traditionally this includes potency on a single target, eventually specificity as well as the pharmacokinetic, physicochemical and the safety profile. Recently an additional dimension has been added by realizing that the therapeutic outcome of a drug is often determined not only by its activity on a single target but also by its activity profile across a variety of biological targets. To address the polypharmacology of drug candidates, many compounds are tested on a set of targets or in phenotypic screens, generating a tremendous amount of data. To extract useful information, computational methods at the interface of proteomics and cheminformatics are indispensable. This review will focus on some recent developments in this field. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
Affiliation(s)
- Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn-Str. 15, D-44227 Dortmund, Germany.
|
23
|
Vogt M, Bajorath J. Chemoinformatics: A view of the field and current trends in method development. Bioorg Med Chem 2012; 20:5317-23. [DOI: 10.1016/j.bmc.2012.03.030] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 01/23/2012] [Revised: 03/09/2012] [Accepted: 03/12/2012] [Indexed: 12/18/2022]
|
24
|
Schuffenhauer A. Computational methods for scaffold hopping. Wiley Interdisciplinary Reviews: Computational Molecular Science 2012. [DOI: 10.1002/wcms.1106] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/11/2022]
|
25
|
Taboureau O, Baell JB, Fernández-Recio J, Villoutreix BO. Established and emerging trends in computational drug discovery in the structural genomics era. Chem Biol 2012; 19:29-41. [PMID: 22284352 DOI: 10.1016/j.chembiol.2011.12.007] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 10/25/2011] [Revised: 12/05/2011] [Accepted: 12/08/2011] [Indexed: 12/01/2022]
Abstract
Bioinformatics and chemoinformatics approaches contribute to hit discovery, hit-to-lead optimization, safety profiling, and target identification and enhance our overall understanding of the health and disease states. A vast repertoire of computational methods has been reported and increasingly combined in order to address more and more challenging targets or complex molecular mechanisms in the context of large-scale integration of structure and bioactivity data produced by private and public drug research. This review explores some key computational methods directly linked to drug discovery and chemical biology with a special emphasis on compound collection preparation, virtual screening, protein docking, and systems pharmacology. A list of generally freely available software packages and online resources is provided, and examples of successful applications are briefly commented upon.
Affiliation(s)
- Olivier Taboureau
- Center for Biological Sequences Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
|
26
|
Sanders MPA, McGuire R, Roumen L, de Esch IJP, de Vlieg J, Klomp JPG, de Graaf C. From the protein's perspective: the benefits and challenges of protein structure-based pharmacophore modeling. MedChemComm 2012. [DOI: 10.1039/c1md00210d] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Indexed: 12/23/2022]
Abstract
Protein structure-based pharmacophore (SBP) models derive the molecular features a ligand must contain to be biologically active by conversion of protein properties to reciprocal ligand space. SBPs improve molecular understanding of ligand–protein interactions and can be used as valuable tools for hit and lead optimization, compound library design, and target hopping.
Affiliation(s)
- Marijn P. A. Sanders
- Computational Drug Discovery Group, CMBI, Radboud University Nijmegen, Nijmegen, The Netherlands
- Luc Roumen
- Division of Medicinal Chemistry, LACDR, VU University Amsterdam, Amsterdam, The Netherlands
- Iwan J. P. de Esch
- Division of Medicinal Chemistry, LACDR, VU University Amsterdam, Amsterdam, The Netherlands
- Jacob de Vlieg
- Computational Drug Discovery Group, CMBI, Radboud University Nijmegen, Nijmegen, The Netherlands
- Chris de Graaf
- Division of Medicinal Chemistry, LACDR, VU University Amsterdam, Amsterdam, The Netherlands
|
27
|
van Westen GJP, Wegner JK, Geluykens P, Kwanten L, Vereycken I, Peeters A, IJzerman AP, van Vlijmen HWT, Bender A. Which compound to select in lead optimization? Prospectively validated proteochemometric models guide preclinical development. PLoS One 2011; 6:e27518. [PMID: 22132107 PMCID: PMC3223189 DOI: 10.1371/journal.pone.0027518] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 07/21/2011] [Accepted: 10/18/2011] [Indexed: 11/19/2022] Open
Abstract
In quite a few diseases, drug resistance due to target variability poses a serious problem in pharmacotherapy. This is certainly true for HIV, and hence, it is often unknown which drug is best to use or to develop against an individual HIV strain. In this work we applied 'proteochemometric' modeling of HIV Non-Nucleoside Reverse Transcriptase (NNRTI) inhibitors to support preclinical development by predicting compound performance on multiple mutants in the lead selection stage. Proteochemometric models are based on both small molecule and target properties and can thus capture multi-target activity relationships simultaneously, the targets in this case being a set of 14 HIV Reverse Transcriptase (RT) mutants. We validated our model by experimentally confirming model predictions for 317 untested compound-mutant pairs, with a prediction error comparable with assay variability (RMSE 0.62). Furthermore, dependent on the similarity of a new mutant to the training set, we could predict with high accuracy which compound will be most effective on a sequence with a previously unknown genotype. Hence, our models allow the evaluation of compound performance on untested sequences and the selection of the most promising leads for further preclinical research. The modeling concept is likely to be applicable also to other target families with genetic variability like other viruses or bacteria, or with similar orthologs like GPCRs.
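The prediction error quoted in this abstract is a root-mean-square error (RMSE). As a reminder of how that metric is computed, here is a minimal sketch; the function name `rmse` and the activity values are hypothetical and are not data from the study.

```python
from math import sqrt

def rmse(predicted: list, measured: list) -> float:
    """Root-mean-square error between paired predicted and measured values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical activities for four compound-mutant pairs (illustration only).
predicted = [6.0, 7.5, 5.0, 8.0]
measured = [6.5, 7.0, 5.5, 7.5]
print(rmse(predicted, measured))
```

Because RMSE is expressed in the same units as the measured activity, it can be compared directly with the variability of the assay, which is the comparison the authors report.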
Affiliation(s)
- Gerard J. P. van Westen
- Division of Medicinal Chemistry, Leiden/Amsterdam Center for Drug Research, Leiden, The Netherlands
- Adriaan P. IJzerman
- Division of Medicinal Chemistry, Leiden/Amsterdam Center for Drug Research, Leiden, The Netherlands
- Herman W. T. van Vlijmen
- Division of Medicinal Chemistry, Leiden/Amsterdam Center for Drug Research, Leiden, The Netherlands
- Tibotec BVBA, Beerse, Belgium
- Andreas Bender
- Division of Medicinal Chemistry, Leiden/Amsterdam Center for Drug Research, Leiden, The Netherlands
- Unilever Centre for Molecular Science Informatics, University of Cambridge, Cambridge, United Kingdom
|
28
|
Kuhn M, Szklarczyk D, Franceschini A, von Mering C, Jensen LJ, Bork P. STITCH 3: zooming in on protein-chemical interactions. Nucleic Acids Res 2011; 40:D876-80. [PMID: 22075997 PMCID: PMC3245073 DOI: 10.1093/nar/gkr1011] [Citation(s) in RCA: 217] [Impact Index Per Article: 16.7] [Indexed: 11/30/2022] Open
Abstract
To facilitate the study of interactions between proteins and chemicals, we have created STITCH, an aggregated database of interactions connecting over 300 000 chemicals and 2.6 million proteins from 1133 organisms. Compared to the previous version, the number of chemicals with interactions and the number of high-confidence interactions both increase 4-fold. The database can be accessed interactively through a web interface, displaying interactions in an integrated network view. It is also available for computational studies through downloadable files and an API. As an extension in the current version, we offer the option to switch between two levels of detail, namely whether stereoisomers of a given compound are shown as a merged entity or as separate entities. Separate display of stereoisomers is necessary, for example, for carbohydrates and chiral drugs. Combining the isomers increases the coverage, as interaction databases and publications found through text mining will often refer to compounds without specifying the stereoisomer. The database is accessible at http://stitch.embl.de/.
Affiliation(s)
- Michael Kuhn
- Biotechnology Center, TU Dresden, 01062 Dresden, Germany.
|