1
Rodríguez-Pérez R, Bajorath J. Evolution of Support Vector Machine and Regression Modeling in Chemoinformatics and Drug Discovery. J Comput Aided Mol Des 2022; 36:355-362. [PMID: 35304657 PMCID: PMC9325859 DOI: 10.1007/s10822-022-00442-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 02/08/2022] [Accepted: 02/15/2022] [Indexed: 11/05/2022]
Abstract
The support vector machine (SVM) algorithm is one of the most widely used machine learning (ML) methods for predicting active compounds and molecular properties. In chemoinformatics and drug discovery, SVM has been a state-of-the-art ML approach for more than a decade. A unique attribute of SVM is that it operates in feature spaces of increasing dimensionality. Hence, SVM conceptually departs from the paradigm of low dimensionality that applies to many other methods for chemical space navigation. The SVM approach is applicable to compound classification and ranking, multi-class predictions, and (in algorithmically modified form) regression modeling. In the emerging era of deep learning (DL), SVM retains its relevance as one of the premier ML methods in chemoinformatics, for reasons discussed herein. We describe the SVM methodology, including its strengths and weaknesses, and discuss selected applications that have contributed to the evolution of SVM as a premier approach for compound classification, property prediction, and virtual compound screening.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115, Bonn, Germany; Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002, Basel, Switzerland.
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115, Bonn, Germany; Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002, Basel, Switzerland.
2
Binding site identification of G protein-coupled receptors through a 3D Zernike polynomials-based method: application to C. elegans olfactory receptors. J Comput Aided Mol Des 2022; 36:11-24. [PMID: 34977999 PMCID: PMC8831295 DOI: 10.1007/s10822-021-00434-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/15/2021] [Accepted: 11/18/2021] [Indexed: 11/01/2022]
Abstract
Studying the binding processes of G protein-coupled receptors (GPCRs) is of particular interest, both to better understand the molecular mechanisms that regulate signaling between the extracellular and intracellular environments and for drug design purposes. In this study, we propose a new computational approach for identifying the binding site for a specific ligand on a GPCR. The method is based on Zernike polynomials and performs the ligand-GPCR association through a shape complementarity analysis of the local molecular surfaces. The method is parameter-free and, working on hundreds of experimental GPCR-ligand complexes, it can distinguish binding pockets from randomly sampled regions on the receptor surface, obtaining an area under the ROC curve of 0.77. Given its importance both as a model organism and in terms of applications, we then investigated the olfactory receptors of C. elegans, building a list of associations between 21 GPCRs belonging to its olfactory neurons and a set of possible ligands. Thus, we can not only carry out rapid and efficient screenings of drugs proposed for GPCRs, key targets in many pathologies, but we have also laid the groundwork for computational mutagenesis processes aimed at increasing or decreasing the binding affinity between ligands and receptors.
3
Ye Q, Hsieh CY, Yang Z, Kang Y, Chen J, Cao D, He S, Hou T. A unified drug-target interaction prediction framework based on knowledge graph and recommendation system. Nat Commun 2021; 12:6775. [PMID: 34811351 PMCID: PMC8635420 DOI: 10.1038/s41467-021-27137-3] [Citation(s) in RCA: 68] [Impact Index Per Article: 22.7] [Received: 06/29/2021] [Accepted: 11/05/2021] [Indexed: 02/06/2023]
Abstract
Prediction of drug-target interactions (DTI) plays a vital role in drug development in various areas, such as virtual screening, drug repurposing and identification of potential drug side effects. Although extensive efforts have been invested in perfecting DTI prediction, existing methods still suffer from the high sparsity of DTI datasets and the cold start problem. Here, we develop KGE_NFM, a unified framework for DTI prediction that combines a knowledge graph (KG) and a recommendation system. This framework first learns a low-dimensional representation for various entities in the KG, and then integrates the multimodal information via a neural factorization machine (NFM). KGE_NFM is evaluated under three realistic scenarios, and achieves accurate and robust predictions on four benchmark datasets, especially in the scenario of the cold start for proteins. Our results indicate that KGE_NFM provides valuable insight into integrating KG and recommendation system-based techniques into a unified framework for novel DTI discovery.
Affiliation(s)
- Qing Ye
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China; College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, China.
- Chang-Yu Hsieh
- Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China.
- Ziyi Yang
- Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China.
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
- Jiming Chen
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China.
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China.
- Shibo He
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China.
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, China.
4
Recent Advances in In Silico Target Fishing. Molecules 2021; 26:5124. [PMID: 34500568 PMCID: PMC8433825 DOI: 10.3390/molecules26175124] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/30/2021] [Revised: 08/14/2021] [Accepted: 08/18/2021] [Indexed: 12/24/2022]
Abstract
In silico target fishing, whose aim is to identify possible protein targets for a query molecule, is an emerging approach used in drug discovery due to its wide variety of applications. This strategy allows the clarification of the mechanism of action and biological activities of compounds whose target is still unknown. Moreover, target fishing can be employed for the identification of off-targets of drug candidates, thus recognizing and preventing their possible adverse effects. For these reasons, target fishing has increasingly become a key approach for polypharmacology, drug repurposing, and the identification of new drug targets. While experimental target fishing can be lengthy and difficult to implement, due to the plethora of interactions that may occur between a single small molecule and different protein targets, an in silico approach can be quicker, less expensive, more efficient for specific protein structures, and thus easier to employ. Moreover, the possibility of using it in combination with docking and virtual screening studies, as well as the increasing number of recently developed web-based tools, makes target fishing a more appealing method for drug discovery. It is especially worth underlining the increasing implementation of machine learning in this field, both as a main target fishing approach and as a further development of already applied strategies. This review reports on the main in silico target fishing strategies, belonging to both ligand-based and receptor-based approaches, developed and applied in recent years, with particular attention to the different web tools freely accessible to the scientific community for performing target fishing studies.
5
Kimber TB, Chen Y, Volkamer A. Deep Learning in Virtual Screening: Recent Applications and Developments. Int J Mol Sci 2021; 22:4435. [PMID: 33922714 PMCID: PMC8123040 DOI: 10.3390/ijms22094435] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Received: 03/18/2021] [Revised: 04/13/2021] [Accepted: 04/14/2021] [Indexed: 01/03/2023]
Abstract
Drug discovery is a cost- and time-intensive process that is often assisted by computational methods, such as virtual screening, to speed up and guide the design of new compounds. For many years, machine learning methods have been successfully applied in the context of computer-aided drug discovery. Recently, thanks to the rise of novel technologies as well as the increasing amount of available chemical and bioactivity data, deep learning has had a tremendous impact on rational active compound discovery. Herein, recent applications and developments of machine learning, with a focus on deep learning, in virtual screening for active compound design are reviewed. This includes introducing different compound and protein encodings, deep learning techniques, and frequently used bioactivity and benchmark data sets for model training and testing. Finally, the present state of the art, including current challenges and emerging problems, is examined and discussed.
Affiliation(s)
- Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
6
Applicability Domain of Active Learning in Chemical Probe Identification: Convergence in Learning from Non-Specific Compounds and Decision Rule Clarification. Molecules 2019; 24:2716. [PMID: 31357419 PMCID: PMC6696588 DOI: 10.3390/molecules24152716] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/30/2019] [Revised: 07/19/2019] [Accepted: 07/24/2019] [Indexed: 12/27/2022]
Abstract
Efficient identification of chemical probes for the manipulation and understanding of biological systems demands specificity for target proteins. Computational means to optimize candidate compound selection for experimental selectivity evaluation are being sought. The active learning virtual screening method has demonstrated the ability to efficiently converge on predictive models with reduced datasets, though its applicability domain to probe identification has yet to be determined. In this article, we challenge active learning’s ability to predict inhibitory bioactivity profiles of selective compounds when learning from chemogenomic features found in non-selective ligand-target pairs. Comparison of controls versus multiple molecule representations de-convolutes factors contributing to predictive capability. Experiments using the matrix metalloproteinase family demonstrate maximum probe bioactivity prediction achieved from only approximately 20% of non-probe bioactivity; this data volume is consistent with prior chemogenomic active learning studies despite the increased difficulty from chemical biology experimental settings used here. Feature weight analyses are combined with a custom visualization to unambiguously detail how active learning arrives at classification decisions, yielding clarified expectations for chemogenomic modeling. The results influence tactical decisions for computational probe design and discovery.
7
Prediction of GPCR-Ligand Binding Using Machine Learning Algorithms. Comput Math Methods Med 2018; 2018:6565241. [PMID: 29666662 PMCID: PMC5831789 DOI: 10.1155/2018/6565241] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/04/2017] [Revised: 11/27/2017] [Accepted: 12/18/2017] [Indexed: 01/18/2023]
Abstract
We propose a novel method that predicts binding of G-protein coupled receptors (GPCRs) and ligands. The proposed method uses hub and cycle structures of ligands and amino acid motif sequences of GPCRs, rather than the 3D structure of a receptor or similarity of receptors or ligands. The experimental results show that these new features can be effective in predicting GPCR-ligand binding (average area under the curve [AUC] of 0.944), because they are thought to include hidden properties of good ligand-receptor binding. Using the proposed method, we were able to identify novel ligand-GPCR bindings, some of which are supported by several studies.
8
Liu J, Ning X. Multi-Assay-Based Compound Prioritization via Assistance Utilization: A Machine Learning Framework. J Chem Inf Model 2017; 57:484-498. [PMID: 28234477 DOI: 10.1021/acs.jcim.6b00737] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Affiliation(s)
- Junfeng Liu
- Indiana University-Purdue University, Indianapolis, 723 West Michigan St., SL 280, Indianapolis, Indiana 46202, United States
- Xia Ning
- Indiana University-Purdue University, Indianapolis, 723 West Michigan St., SL 280, Indianapolis, Indiana 46202, United States
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th St., HITS 5000, Indianapolis, Indiana 46202, United States
9
Shaikh N, Sharma M, Garg P. An improved approach for predicting drug-target interaction: proteochemometrics to molecular docking. Mol Biosyst 2016; 12:1006-14. [PMID: 26822863 DOI: 10.1039/c5mb00650c] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 01/29/2023]
Abstract
Proteochemometric (PCM) methods, which use descriptors of both interacting species, i.e. the drug and the target, are being successfully employed for the prediction of drug-target interactions (DTI). However, the unavailability of non-interacting datasets and the determination of the applicability domain (AD) of a model are main concerns in PCM modeling. In the present study, traditional PCM modeling was improved by devising novel methodologies for reliable negative dataset generation and fingerprint-based AD analysis. In addition, various types of descriptors and classifiers were evaluated for their performance. The Random Forest and Support Vector Machine models outperformed the other classifiers (accuracies >98% and >89% for 10-fold cross validation and external validation, respectively). The type of protein descriptor had a negligible effect on the developed models, encouraging the use of sequence-based descriptors over structure-based descriptors. To establish the practical utility of the built models, targets were predicted for approved anticancer drugs of natural origin. The molecular recognition interactions between the predicted drug-target pairs were quantified with the help of a reverse molecular docking approach. The majority of predicted targets are known for anticancer therapy. These results thus correlate well with the anticancer potential of the selected drugs. Interestingly, out of all predicted DTIs, thirty were found to be reported in the ChEMBL database, further validating the adopted methodology. The outcome of this study suggests that the proposed approach, involving use of the improved PCM methodology and molecular docking, can be successfully employed to elucidate the intricate mode of action of drug molecules as well as to reposition them for new therapeutic applications.
Affiliation(s)
- Naeem Shaikh
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), S. A. S. Nagar, Punjab 160062, India.
- Mahesh Sharma
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), S. A. S. Nagar, Punjab 160062, India.
- Prabha Garg
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), S. A. S. Nagar, Punjab 160062, India.
10
Öztürk H, Ozkirimli E, Özgür A. A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction. BMC Bioinformatics 2016; 17:128. [PMID: 26987649 PMCID: PMC4797122 DOI: 10.1186/s12859-016-0977-x] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Received: 03/25/2015] [Accepted: 03/03/2016] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Molecular structures can be represented as strings of special characters using SMILES. Since each molecule is represented as a string, the similarity between compounds can be computed using SMILES-based string similarity functions. Most previous studies on drug-target interaction prediction use 2D-based compound similarity kernels such as SIMCOMP. To the best of our knowledge, using SMILES-based similarity functions, which are computationally more efficient than the 2D-based kernels, has not been investigated for this task before. RESULTS: In this study, we adapt and evaluate various SMILES-based similarity methods for drug-target interaction prediction. In addition, inspired by the vector space model of Information Retrieval, we propose cosine similarity-based SMILES kernels that make use of the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting approaches. We also investigate generating composite kernels by combining our best SMILES-based similarity functions with the SIMCOMP kernel. With this study, we provide a comparison of 13 different ligand similarity functions, each of which utilizes the SMILES string representation of a molecule. CONCLUSION: The more efficient SMILES-based similarity functions performed similarly to the more complex 2D-based SIMCOMP kernel in terms of AUC-ROC scores. The TF-IDF based cosine similarity obtained a better AUC-PR score than the SIMCOMP kernel on the GPCR benchmark data set. The composite kernel of TF-IDF based cosine similarity and SIMCOMP achieved the best AUC-PR scores for all data sets.
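The TF-IDF cosine similarity idea described in this abstract can be illustrated with a minimal sketch. The abstract does not say how SMILES strings are split into terms or which IDF variant is used, so the overlapping 4-character substrings, the smoothed IDF formula, and all function names below are assumptions made for illustration only, not the authors' implementation.

```python
import math
from collections import Counter


def smiles_terms(smiles, q=4):
    """Split a SMILES string into overlapping q-character substrings.

    The choice of q=4 is an assumption; the abstract does not specify it.
    """
    if len(smiles) < q:
        return [smiles]
    return [smiles[i:i + q] for i in range(len(smiles) - q + 1)]


def tfidf_vectors(smiles_list, q=4):
    """Return one sparse TF-IDF vector (dict: term -> weight) per SMILES string."""
    docs = [Counter(smiles_terms(s, q)) for s in smiles_list]
    n_docs = len(docs)
    # Document frequency: in how many molecules each term occurs.
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumed variant) so ubiquitous terms keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(smiles_list, q=4):
    """Pairwise TF-IDF cosine similarities, usable as a precomputed kernel matrix."""
    vecs = tfidf_vectors(smiles_list, q)
    return [[cosine_similarity(a, b) for b in vecs] for a in vecs]


if __name__ == "__main__":
    # Two esters and benzene; the esters share substrings, benzene shares none.
    for row in similarity_matrix(["CC(=O)OC", "CC(=O)OCC", "c1ccccc1"]):
        print([round(x, 3) for x in row])
```

Because the vectors are built from plain string operations, no 2D graph matching is needed, which is consistent with the efficiency argument made in the abstract.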
Affiliation(s)
- Hakime Öztürk
- Department of Computer Engineering, Bogazici University, Bebek, Istanbul, 34342, Turkey.
- Elif Ozkirimli
- Department of Computer Engineering, Bogazici University, Bebek, Istanbul, 34342, Turkey.
- Arzucan Özgür
- Department of Computer Engineering, Bogazici University, Bebek, Istanbul, 34342, Turkey.
11
Balfer J, Bajorath J. Visualization and Interpretation of Support Vector Machine Activity Predictions. J Chem Inf Model 2015; 55:1136-47. [DOI: 10.1021/acs.jcim.5b00175] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 01/28/2023]
Affiliation(s)
- Jenny Balfer
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
12
Jasial S, Balfer J, Vogt M, Bajorath J. Determination of Meta-Parameters for Support Vector Machine Linear Combinations. Mol Inform 2015; 34:127-33. [DOI: 10.1002/minf.201400163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2014] [Accepted: 12/16/2014] [Indexed: 11/05/2022]
13
Dörr A, Rosenbaum L, Zell A. A ranking method for the concurrent learning of compounds with various activity profiles. J Cheminform 2015; 7:2. [PMID: 25643067 PMCID: PMC4306736 DOI: 10.1186/s13321-014-0050-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/23/2014] [Accepted: 12/11/2014] [Indexed: 11/30/2022]
Abstract
BACKGROUND: In this study, we present an SVM-based ranking algorithm for the concurrent learning of compounds with different activity profiles and their varying prioritization. To this end, a specific labeling of each compound was elaborated in order to infer virtual screening models against multiple targets. We compared the method with several state-of-the-art SVM classification techniques that are capable of inferring multi-target screening models on three chemical data sets (cytochrome P450s, dehydrogenases, and a trypsin-like protease data set) containing three different biological targets each. RESULTS: The experiments show that ranking-based algorithms achieve increased performance for single- and multi-target virtual screening. Moreover, compounds that do not completely fulfill the desired activity profile are still ranked higher than decoys or compounds with an entirely undesired profile, compared to other multi-target SVM methods. CONCLUSIONS: SVM-based ranking methods constitute a valuable approach for virtual screening in multi-target drug design. The utilization of such methods is most helpful when dealing with compounds with various activity profiles and when finding many ligands with an already perfectly matching activity profile is not to be expected.
Affiliation(s)
- Alexander Dörr
- Center for Bioinformatics Tübingen (ZBIT), University of Tuebingen, Sand 1, Tübingen, 72076 Germany
- Lars Rosenbaum
- Center for Bioinformatics Tübingen (ZBIT), University of Tuebingen, Sand 1, Tübingen, 72076 Germany
- Andreas Zell
- Center for Bioinformatics Tübingen (ZBIT), University of Tuebingen, Sand 1, Tübingen, 72076 Germany
14
Manoharan P, Chennoju K, Ghoshal N. Target specific proteochemometric model development for BACE1 – protein flexibility and structural water are critical in virtual screening. Mol Biosyst 2015; 11:1955-72. [DOI: 10.1039/c5mb00088b] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/26/2022]
Abstract
Structural water and protein plasticity are important factors for BACE1 targeted ligand virtual screening.
Affiliation(s)
- Prabu Manoharan
- Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata 700032, India
- Kiranmai Chennoju
- National Institute of Pharmaceutical Education and Research, Kolkata 700032, India
- Nanda Ghoshal
- Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata 700032, India
15
Cortés-Ciriano I, Ain QU, Subramanian V, Lenselink EB, Méndez-Lucio O, IJzerman AP, Wohlfahrt G, Prusis P, Malliavin TE, van Westen GJP, Bender A. Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects. MedChemComm 2015. [DOI: 10.1039/c4md00216d] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Indexed: 11/21/2022]
Abstract
Proteochemometric (PCM) modelling is a computational method to model the bioactivity of multiple ligands against multiple related protein targets simultaneously.
Affiliation(s)
- Isidro Cortés-Ciriano
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Qurrat Ul Ain
- Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
- Eelke B. Lenselink
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Oscar Méndez-Lucio
- Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
- Adriaan P. IJzerman
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Gerd Wohlfahrt
- Computer-Aided Drug Design, Orion Pharma, FIN-02101 Espoo, Finland
- Peteris Prusis
- Computer-Aided Drug Design, Orion Pharma, FIN-02101 Espoo, Finland
- Thérèse E. Malliavin
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Gerard J. P. van Westen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Andreas Bender
- Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
16
Korkmaz S, Zararsiz G, Goksuluk D. Drug/nondrug classification using Support Vector Machines with various feature selection strategies. Comput Methods Programs Biomed 2014; 117:51-60. [PMID: 25224081 DOI: 10.1016/j.cmpb.2014.08.009] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 05/02/2014] [Revised: 08/15/2014] [Accepted: 08/27/2014] [Indexed: 06/03/2023]
Abstract
In conjunction with advances in computer technology, virtual screening of small molecules has come into use in drug discovery. Since there are thousands of compounds in the early phase of drug discovery, a fast classification method that can distinguish between active and inactive molecules can be used for screening large compound collections. In this study, we used Support Vector Machines (SVM) for this type of classification task. SVM is a powerful classification tool that is becoming increasingly popular in various machine-learning applications. The data sets consist of 631 compounds for the training set and 216 compounds for a separate test set. In the data pre-processing step, Pearson's correlation coefficient was used as a filter to eliminate redundant features. After application of the correlation filter, a single SVM was applied to this reduced data set. Moreover, we investigated the performance of SVM with different feature selection strategies, including SVM-Recursive Feature Elimination, Wrapper Method and Subset Selection. All feature selection methods generally show better performance than a single SVM, while Subset Selection outperforms the other feature selection methods. We tested SVM as a classification tool in a real-life drug discovery problem, and our results revealed that it could be a useful method for classification tasks in the early phase of drug discovery.
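The Pearson correlation filter mentioned in this abstract can be sketched as follows. The abstract gives neither the correlation cutoff nor the order in which features are compared, so the 0.9 threshold, the greedy first-come-first-kept strategy, and the descriptor names in the example are assumptions for illustration, not the authors' procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant feature is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlation_filter(columns, threshold=0.9):
    """Greedily keep features whose |r| with every already kept feature is below threshold.

    columns: dict mapping feature name -> list of values (one per compound).
    threshold: assumed cutoff; not stated in the abstract.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical descriptor table for four compounds.
    descriptors = {
        "mol_weight": [1.0, 2.0, 3.0, 4.0],
        "heavy_atoms": [2.0, 4.0, 6.0, 8.1],  # nearly collinear with mol_weight
        "logp": [1.0, -1.0, 1.0, -1.0],
    }
    print(correlation_filter(descriptors))
```

The reduced feature set returned by such a filter would then be passed to the SVM; the later selection strategies named in the abstract (recursive feature elimination, wrapper, subset selection) are separate steps not shown here.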
Affiliation(s)
- Selcuk Korkmaz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey.
- Gokmen Zararsiz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
- Dincer Goksuluk
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
17
Sugaya N. Ligand efficiency-based support vector regression models for predicting bioactivities of ligands to drug target proteins. J Chem Inf Model 2014; 54:2751-63. [PMID: 25220713 DOI: 10.1021/ci5003262] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 02/03/2023]
Abstract
The concept of ligand efficiency (LE) indices is widely accepted throughout the drug design community and is frequently used in a retrospective manner in the process of drug development. For example, LE indices are used to investigate LE optimization processes of already-approved drugs and to re-evaluate hit compounds obtained from structure-based virtual screening methods and/or high-throughput experimental assays. However, LE indices could also be applied in a prospective manner to explore drug candidates. Here, we describe the construction of machine learning-based regression models in which LE indices are adopted as an end point and show that LE-based regression models can outperform regression models based on pIC50 values. In addition to pIC50 values traditionally used in machine learning studies based on chemogenomics data, three representative LE indices (ligand lipophilicity efficiency (LLE), binding efficiency index (BEI), and surface efficiency index (SEI)) were adopted, then used to create four types of training data. We constructed regression models by applying a support vector regression (SVR) method to the training data. In cross-validation tests of the SVR models, the LE-based SVR models showed higher correlations between the observed and predicted values than the pIC50-based models. Application tests on new data showed that, generally, the predictive performance of SVR models follows the order SEI > BEI > LLE > pIC50. Close examination of the distributions of the activity values (pIC50, LLE, BEI, and SEI) in the training and validation data implied that the performance order of the SVR models may be ascribed to the much higher diversity of the LE-based training and validation data. In the application tests, the LE-based SVR models can offer better predictive performance for compound-protein pairs with a wider range of ligand potencies than the pIC50-based models. This finding strongly suggests that LE-based SVR models are better than pIC50-based models at predicting bioactivities of compounds that could exhibit a much higher (or lower) potency.
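As a rough illustration of the end points named in this abstract, the sketch below computes pIC50 and the three LE indices using their commonly cited definitions (BEI as pIC50 per kDa of molecular weight, SEI as pIC50 per 100 Å² of polar surface area, LLE as pIC50 minus logP). The abstract does not spell out the formulas, so these definitions, the function names, and the example values are assumptions and may differ in detail from the paper.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 (negative log10 of molar IC50)."""
    return 9.0 - math.log10(ic50_nm)


def binding_efficiency_index(pic50, mol_weight_da):
    """BEI: potency per kilodalton of molecular weight (assumed definition)."""
    return pic50 / (mol_weight_da / 1000.0)


def surface_efficiency_index(pic50, polar_surface_area_a2):
    """SEI: potency per 100 square angstroms of polar surface area (assumed definition)."""
    return pic50 / (polar_surface_area_a2 / 100.0)


def ligand_lipophilicity_efficiency(pic50, logp):
    """LLE: potency corrected for lipophilicity (assumed definition)."""
    return pic50 - logp


if __name__ == "__main__":
    # Hypothetical compound: IC50 = 10 nM, MW = 400 Da, PSA = 80 A^2, logP = 3.
    pic50 = pic50_from_ic50_nm(10.0)
    print("pIC50:", pic50)
    print("BEI:", binding_efficiency_index(pic50, 400.0))
    print("SEI:", surface_efficiency_index(pic50, 80.0))
    print("LLE:", ligand_lipophilicity_efficiency(pic50, 3.0))
```

Any of these four values could serve as the regression target for an SVR model in place of raw pIC50, which is the substitution the abstract evaluates.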
Affiliation(s)
- Nobuyoshi Sugaya
- Drug Discovery Department, Research & Development Division, PharmaDesign, Inc. , Hatchobori 2-19-8, Chuo-ku, Tokyo 104-0032, Japan
|
18
|
|
19
|
Brown JB, Niijima S, Okuno Y. Compound-Protein Interaction Prediction Within Chemogenomics: Theoretical Concepts, Practical Usage, and Future Directions. Mol Inform 2013; 32:906-21. [DOI: 10.1002/minf.201300101] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 05/31/2013] [Accepted: 08/06/2013] [Indexed: 11/08/2022]
|
20
|
Sugaya N. Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach. J Chem Inf Model 2013; 53:2525-37. [PMID: 24020509 DOI: 10.1021/ci400240u] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]
Abstract
Machine learning methods based on ligand-protein interaction data in bioactivity databases are one of the current strategies for efficiently finding novel lead compounds as the first step in the drug discovery process. Although previous machine learning studies have succeeded in predicting novel ligand-protein interactions with high performance, all of the previous studies to date have been heavily dependent on the simple use of raw bioactivity data of ligand potencies measured by IC50, EC50, K(i), and K(d) deposited in databases. ChEMBL provides us with a unique opportunity to investigate whether a machine-learning-based classifier created by reflecting ligand efficiency other than the IC50, EC50, K(i), and Kd values can also offer high predictive performance. Here we report that classifiers created from training data based on ligand efficiency show higher performance than those from data based on IC50 or K(i) values. Utilizing GPCRSARfari and KinaseSARfari databases in ChEMBL, we created IC50- or K(i)-based training data and binding efficiency index (BEI) based training data then constructed classifiers using support vector machines (SVMs). The SVM classifiers from the BEI-based training data showed slightly higher area under curve (AUC), accuracy, sensitivity, and specificity in the cross-validation tests. Application of the classifiers to the validation data demonstrated that the AUCs and specificities of the BEI-based classifiers dramatically increased in comparison with the IC50- or K(i)-based classifiers. The improvement of the predictive power by the BEI-based classifiers can be attributed to (i) the more separated distributions of positives and negatives, (ii) the higher diversity of negatives in the BEI-based training data in a feature space of SVMs, and (iii) a more balanced number of positives and negatives in the BEI-based training data. 
These results strongly suggest that training data based on ligand efficiency as well as data based on classical IC50, EC50, K(d), and K(i) values are important when creating a classifier using a machine learning approach based on bioactivity data.
Affiliation(s)
- Nobuyoshi Sugaya
- Drug Discovery Department, Research & Development Division, PharmaDesign, Inc. , Hatchobori 2-19-8, Chuo-ku, Tokyo, 104-0032, Japan
|
21
|
The continuous molecular fields approach to building 3D-QSAR models. J Comput Aided Mol Des 2013; 27:427-42. [PMID: 23719959 DOI: 10.1007/s10822-013-9656-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/31/2013] [Accepted: 05/22/2013] [Indexed: 10/26/2022]
Abstract
The continuous molecular fields (CMF) approach is based on the application of continuous functions for the description of molecular fields instead of finite sets of molecular descriptors (such as interaction energies computed at grid nodes) commonly used for this purpose. These functions can be encapsulated into kernels and combined with kernel-based machine learning algorithms to provide a variety of novel methods for building classification and regression structure-activity models, visualizing chemical datasets and conducting virtual screening. In this article, the CMF approach is applied to building 3D-QSAR models for 8 datasets through the use of five types of molecular fields (the electrostatic, steric, hydrophobic, hydrogen-bond acceptor and donor ones), the linear convolution molecular kernel with the contribution of each atom approximated with a single isotropic Gaussian function, and the kernel ridge regression data analysis technique. It is shown that the CMF approach even in this simplest form provides either comparable or enhanced predictive performance in comparison with state-of-the-art 3D-QSAR methods.
|
22
|
Heikamp K, Bajorath J. Prediction of Compounds with Closely Related Activity Profiles Using Weighted Support Vector Machine Linear Combinations. J Chem Inf Model 2013; 53:791-801. [DOI: 10.1021/ci400090t] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 02/08/2023]
Affiliation(s)
- Kathrin Heikamp
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
|
23
|
Wang L, Ma C, Wipf P, Liu H, Su W, Xie XQ. TargetHunter: an in silico target identification tool for predicting therapeutic potential of small organic molecules based on chemogenomic database. AAPS J 2013; 15:395-406. [PMID: 23292636 DOI: 10.1208/s12248-012-9449-z] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Received: 08/16/2012] [Accepted: 12/10/2012] [Indexed: 02/08/2023]
Abstract
Target identification of the known bioactive compounds and novel synthetic analogs is a very important research field in medicinal chemistry, biochemistry, and pharmacology. It is also a challenging and costly step towards chemical biology and phenotypic screening. In silico identification of potential biological targets for chemical compounds offers an alternative avenue for the exploration of ligand-target interactions and biochemical mechanisms, as well as for investigation of drug repurposing. Computational target fishing mines biologically annotated chemical databases and then maps compound structures into chemogenomical space in order to predict the biological targets. We summarize the recent advances and applications in computational target fishing, such as chemical similarity searching, data mining/machine learning, panel docking, and the bioactivity spectral analysis for target identification. We then describe in detail a new web-based target prediction tool, TargetHunter (http://www.cbligand.org/TargetHunter). This web portal implements a novel in silico target prediction algorithm, the Targets Associated with its MOst SImilar Counterparts, by exploring the largest chemogenomical database, ChEMBL. Prediction accuracy reached 91.1% from the top 3 guesses on a subset of high-potency compounds from the ChEMBL database, which outperformed a published algorithm, multiple-category models. TargetHunter also features an embedded geography tool, BioassayGeoMap, developed to allow the user to easily search for potential collaborators that can experimentally validate the predicted biological target(s) or off target(s). TargetHunter therefore provides a promising alternative to bridge the knowledge gap between biology and chemistry, and significantly boost the productivity of chemogenomics researchers for in silico drug design and discovery.
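The idea of assigning a query compound the targets of its most similar annotated counterparts can be sketched as a nearest-neighbor lookup. This is an illustration only, not TargetHunter's implementation: the Tanimoto coefficient on sets of fingerprint bits is an assumed similarity measure, and the function names, fingerprints, and target labels are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_targets(query, reference, top_k=3):
    """Return the targets of the top_k most similar reference compounds.

    `reference` is a list of (fingerprint_bits, target_name) pairs.
    Targets are returned in rank order with duplicates removed.
    """
    ranked = sorted(reference, key=lambda rec: tanimoto(query, rec[0]), reverse=True)
    targets = []
    for _bits, target in ranked[:top_k]:
        if target not in targets:
            targets.append(target)
    return targets


# Toy annotated library (made-up fingerprints and target labels)
reference = [
    (frozenset({1, 2, 3, 4}), "kinase A"),
    (frozenset({1, 2, 3, 9}), "kinase A"),
    (frozenset({7, 8, 9}), "GPCR B"),
    (frozenset({2, 3, 4, 5, 6}), "protease C"),
]
query = frozenset({1, 2, 3, 4, 5})
print(predict_targets(query, reference, top_k=3))
```

Counting a prediction as correct when the true target appears among the returned names corresponds to the "top 3 guesses" style of evaluation mentioned in the abstract.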
Affiliation(s)
- Lirong Wang
- Department of Pharmaceutical Sciences, School of Pharmacy, Computational Chemical Genomics Screening Center, Pittsburgh, PA, USA
|
24
|
Wu D, Huang Q, Zhang Y, Zhang Q, Liu Q, Gao J, Cao Z, Zhu R. Screening of selective histone deacetylase inhibitors by proteochemometric modeling. BMC Bioinformatics 2012; 13:212. [PMID: 22913517 PMCID: PMC3542186 DOI: 10.1186/1471-2105-13-212] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 03/22/2012] [Accepted: 08/16/2012] [Indexed: 12/11/2022]
Abstract
BACKGROUND Histone deacetylase (HDAC) is a novel target for the treatment of cancer and it can be classified into three classes, i.e., classes I, II, and IV. The inhibitors selectively targeting individual HDAC have been proved to be the better candidate antitumor drugs. To screen selective HDAC inhibitors, several proteochemometric (PCM) models based on different combinations of three kinds of protein descriptors, two kinds of ligand descriptors and multiplication cross-terms were constructed in our study. RESULTS The results show that structure similarity descriptors are better than sequence similarity descriptors and geometry descriptors in the characterization of HDACs. Furthermore, the predictive ability was not improved by introducing the cross-terms in our models. Finally, a best PCM model based on protein structure similarity descriptors and 32-dimensional general descriptors was derived (R² = 0.9897, Q²test = 0.7542), which shows a powerful ability to screen selective HDAC inhibitors. CONCLUSIONS Our best model not only predicts the activities of inhibitors for each HDAC isoform, but also screens and distinguishes class-selective inhibitors and even more isoform-selective inhibitors; thus, it provides a potential way to discover or design novel candidate antitumor drugs with reduced side effects.
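How a proteochemometric feature vector with multiplication cross-terms might be assembled can be shown in a few lines. This is a minimal sketch under stated assumptions: the descriptor values are made up, the function name is hypothetical, and the cross-terms are taken as all pairwise ligand-protein descriptor products, which is one common construction and not necessarily the one used in the study.

```python
def pcm_features(ligand_desc, protein_desc, cross_terms=True):
    """Concatenate ligand and protein descriptors for one ligand-protein pair.

    When cross_terms is True, every pairwise product of a ligand descriptor
    and a protein descriptor is appended (multiplication cross-terms).
    """
    features = list(ligand_desc) + list(protein_desc)
    if cross_terms:
        features += [l * p for l in ligand_desc for p in protein_desc]
    return features


# Made-up descriptors: 2 ligand values, 3 protein values
ligand = [1.0, 2.0]
protein = [3.0, 4.0, 5.0]
print(pcm_features(ligand, protein))                     # 2 + 3 + 2*3 = 11 features
print(pcm_features(ligand, protein, cross_terms=False))  # 5 features
```

Because the cross-term block grows as the product of the two descriptor counts, comparing models with and without it, as the abstract reports, is a natural check on whether the added dimensions help.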
Affiliation(s)
- Dingfeng Wu
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
| | - Qi Huang
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
| | - Yida Zhang
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
| | - Qingchen Zhang
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
| | - Qi Liu
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
| | - Jun Gao
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
- School of Information Engineering, Shanghai Maritime University, Shanghai, 201306, P.R. China
| | - Zhiwei Cao
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
| | - Ruixin Zhu
- School of Life Sciences and Technology, Tongji University, Shanghai, 200092, P.R. China
- Institute for Advanced Study of Translational Medicine, Tongji University, Shanghai, 200092, P.R. China
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, Liaoning, 116600, P.R. China
|
25
|
Cheng F, Zhou Y, Li J, Li W, Liu G, Tang Y. Prediction of chemical-protein interactions: multitarget-QSAR versus computational chemogenomic methods. Mol Biosyst 2012; 8:2373-84. [PMID: 22751809 DOI: 10.1039/c2mb25110h] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.3] [Indexed: 01/18/2023]
Abstract
Elucidation of chemical-protein interactions (CPI) is the basis of target identification and drug discovery. It is time-consuming and costly to determine CPI experimentally, and computational methods will facilitate the determination of CPI. In this study, two methods, multitarget quantitative structure-activity relationship (mt-QSAR) and computational chemogenomics, were developed for CPI prediction. Two comprehensive data sets were collected from the ChEMBL database for method assessment. One data set consisted of 81 689 CPI pairs among 50 924 compounds and 136 G-protein coupled receptors (GPCRs), while the other one contained 43 965 CPI pairs among 23 376 compounds and 176 kinases. The range of the area under the receiver operating characteristic curve (AUC) for the test sets was 0.95 to 1.0 and 0.82 to 1.0 for 100 GPCR mt-QSAR models and 100 kinase mt-QSAR models, respectively. The AUCs of 5-fold cross-validation were about 0.92 for both 176 kinases and 136 GPCRs using the chemogenomic method. However, the performance of the chemogenomic method was worse than that of mt-QSAR for the external validation set. Further analysis revealed that there was a high false positive rate for the external validation set when using the chemogenomic method. In addition, we developed a web server named CPI-Predictor, which is available for free. The methods and tool have potential applications in network pharmacology and drug repositioning.
Affiliation(s)
- Feixiong Cheng
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
|
26
|
Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: Quo Vadis? J Chem Inf Model 2012; 52:1413-37. [PMID: 22582859 DOI: 10.1021/ci200409x] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Indexed: 01/19/2023]
Abstract
This paper is focused on modern approaches to machine learning, most of which are as yet used infrequently or not at all in chemoinformatics. Machine learning methods are characterized in terms of the "modes of statistical inference" and "modeling levels" nomenclature and by considering different facets of the modeling with respect to input/output matching, data types, models duality, and models inference. Particular attention is paid to new approaches and concepts that may provide efficient solutions of common problems in chemoinformatics: improvement of predictive performance of structure-property (activity) models, generation of structures possessing desirable properties, model applicability domain, modeling of properties with functional endpoints (e.g., phase diagrams and dose-response curves), and accounting for multiple molecular species (e.g., conformers or tautomers).
Affiliation(s)
- Alexandre Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France.
|
27
|
Anderson PC, De Sapio V, Turner KB, Elmer SP, Roe DC, Schoeniger JS. Identification of binding specificity-determining features in protein families. J Med Chem 2012; 55:1926-39. [PMID: 22289061 DOI: 10.1021/jm200979x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
Abstract
We present a new approach for identifying features of ligand-protein binding interfaces that predict binding selectivity and demonstrate its effectiveness for predicting kinase inhibitor specificity. We analyzed a large set of human kinases and kinase inhibitors using clustering of experimentally determined inhibition constants (to define specificity classes of kinases and inhibitors) and virtual ligand docking (to extract structural and chemical features of the ligand-protein binding interfaces). We then used statistical methods to identify features characteristic of each class. Machine learning was employed to determine which combinations of characteristic features were predictive of class membership and to predict binding specificities and affinities of new compounds. Experiments showed predictions were 70% accurate. These results show that our method can automatically pinpoint on the three-dimensional binding interfaces pharmacophore-like features that act as "selectivity filters". The method is not restricted to kinases, requires no prior hypotheses about specific interactions, and can be applied to any protein families for which sets of structures and ligand binding data are available.
Affiliation(s)
- Peter C Anderson
- Sandia National Laboratories, Box 969, MS 9291, Livermore, California 94551, USA
|
28
|
Wang F, Liu D, Wang H, Luo C, Zheng M, Liu H, Zhu W, Luo X, Zhang J, Jiang H. Computational Screening for Active Compounds Targeting Protein Sequences: Methodology and Experimental Validation. J Chem Inf Model 2011; 51:2821-8. [DOI: 10.1021/ci200264h] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 11/30/2022]
Affiliation(s)
- Fei Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 555 Zu Chong Zhi Road, Shanghai, 201203, China
| | - Dongxiang Liu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 555 Zu Chong Zhi Road, Shanghai, 201203, China
| | - Heyao Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 555 Zu Chong Zhi Road, Shanghai, 201203, China
| | - Cheng Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 555 Zu Chong Zhi Road, Shanghai, 201203, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 555 Zu Chong Zhi Road, Shanghai, 201203, China
| | - Hong Liu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 555 Zu Chong Zhi Road, Shanghai, 201203, China
| | - Weiliang Zhu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 555 Zu Chong Zhi Road, Shanghai, 201203, China
| | - Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 555 Zu Chong Zhi Road, Shanghai, 201203, China
| | - Jian Zhang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao-Tong University, School of Medicine, Shanghai, 200025, China
| | - Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 555 Zu Chong Zhi Road, Shanghai, 201203, China
- School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
|
29
|
Meslamani J, Rognan D. Enhancing the Accuracy of Chemogenomic Models with a Three-Dimensional Binding Site Kernel. J Chem Inf Model 2011; 51:1593-603. [DOI: 10.1021/ci200166t] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
Affiliation(s)
- Jamel Meslamani
- Structural Chemogenomics, Laboratory of Therapeutical Innovation, UMR 7200 CNRS, University of Strasbourg, F-67400 Illkirch, France
| | - Didier Rognan
- Structural Chemogenomics, Laboratory of Therapeutical Innovation, UMR 7200 CNRS, University of Strasbourg, F-67400 Illkirch, France
|
30
|
González-Díaz H, Prado-Prado F, Sobarzo-Sánchez E, Haddad M, Maurel Chevalley S, Valentin A, Quetin-Leclercq J, Dea-Ayuela MA, Teresa Gomez-Muños M, Munteanu CR, José Torres-Labandeira J, García-Mera X, Tapia RA, Ubeira FM. NL MIND-BEST: A web server for ligands and proteins discovery—Theoretic-experimental study of proteins of Giardia lamblia and new compounds active against Plasmodium falciparum. J Theor Biol 2011; 276:229-49. [DOI: 10.1016/j.jtbi.2011.01.010] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2010] [Revised: 12/02/2010] [Accepted: 01/10/2011] [Indexed: 10/18/2022]
|
31
|
Ranu S, Singh AK. Novel Method for Pharmacophore Analysis by Examining the Joint Pharmacophore Space. J Chem Inf Model 2011; 51:1106-21. [DOI: 10.1021/ci100503y] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
Affiliation(s)
- Sayan Ranu
- Department of Computer Science, University of California, Santa Barbara, Santa Barbara, California, United States
| | - Ambuj K. Singh
- Department of Computer Science, University of California, Santa Barbara, Santa Barbara, California, United States
|
32
|
Varnek A, Baskin II. Chemoinformatics as a Theoretical Chemistry Discipline. Mol Inform 2011; 30:20-32. [PMID: 27467875 DOI: 10.1002/minf.201000100] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Received: 08/31/2010] [Accepted: 01/14/2011] [Indexed: 01/29/2023]
Abstract
Here, chemoinformatics is considered as a theoretical chemistry discipline complementary to quantum chemistry and force-field molecular modeling. These three fields are compared with respect to molecular representation, inference mechanisms, basic concepts and application areas. A chemical space, a fundamental concept of chemoinformatics, is considered with respect to complex relations between chemical objects (graphs or descriptor vectors). Statistical Learning Theory, one of the main mathematical approaches in structure-property modeling, is briefly reviewed. Links between chemoinformatics and its "sister" fields - machine learning, chemometrics and bioinformatics are discussed.
Affiliation(s)
- Alexandre Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France.
| | - Igor I Baskin
- Department of Chemistry, Moscow State University, Moscow 119991, Russia
|
33
|
van Westen GJP, Wegner JK, IJzerman AP, van Vlijmen HWT, Bender A. Proteochemometric modeling as a tool to design selective compounds and for extrapolating to novel targets. MedChemComm 2011. [DOI: 10.1039/c0md00165a] [Citation(s) in RCA: 123] [Impact Index Per Article: 9.5] [Indexed: 12/19/2022]
Abstract
Proteochemometric modeling is founded on the principles of QSAR but is able to benefit from additional information in model training due to the inclusion of target information.
Affiliation(s)
- Gerard J. P. van Westen
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
| | | | - Adriaan P. IJzerman
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
| | - Herman W. T. van Vlijmen
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
- Tibotec BVBA
| | - A. Bender
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
- Unilever Centre for Molecular Science Informatics
|
34
|
Ning X, Karypis G. In silico structure-activity-relationship (SAR) models from machine learning: a review. Drug Dev Res 2010. [DOI: 10.1002/ddr.20410] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/11/2022]
|
35
|
Li N, Thompson S, Jiang H, Lieberman PM, Luo C. Development of drugs for Epstein-Barr virus using high-throughput in silico virtual screening. Expert Opin Drug Discov 2010; 5:1189-203. [PMID: 22822721 PMCID: PMC3816986 DOI: 10.1517/17460441.2010.524640] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Abstract
IMPORTANCE OF THE FIELD Epstein-Barr virus (EBV) is a ubiquitous human herpesvirus that is causally associated with endemic forms of Burkitt's lymphoma, nasopharyngeal carcinoma and lymphoproliferative disease in immunosuppressed individuals. On a global scale, EBV infects > 90% of the adult population and is responsible for ∼ 1% of all human cancers. To date, there is no efficacious drug or therapy for the treatment of EBV infection and EBV-related diseases. AREAS COVERED IN THIS REVIEW In this review, we discuss the existing anti-EBV inhibitors and those under development. We discuss the value of different molecular targets, including EBV lytic DNA replication enzymes as well as proteins that are expressed exclusively during latent infection, such as EBV nuclear antigen 1 (EBNA-1) and latent membrane protein 1. As the atomic structure of the EBNA-1 DNA binding domain has been described, it is an attractive target for in silico methods of drug design and small molecule screening. We discuss the use of computational methods that can greatly facilitate the development of novel inhibitors and how in silico screening methods can be applied to target proteins with known structures, such as EBNA-1, to treat EBV infection and disease. WHAT THE READER WILL GAIN The reader is familiarized with the problems in targeting of EBV for inhibition by small molecules and how computational methods can greatly facilitate this process. TAKE HOME MESSAGE Despite the impressive efficacy of nucleoside analogs for the treatment of herpesvirus lytic infection, there remain few effective treatments for latent infections. As EBV latent infection persists within and contributes to the formation of EBV-associated cancers, targeting EBV latent proteins is an unmet medical need. High-throughput in silico screening can accelerate the process of drug discovery for novel and selective agents that inhibit EBV latent infection and associated disease.
Affiliation(s)
- Ning Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | | | - Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | | | - Cheng Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Center for Systems Biology, Soochow University, Jiangsu 215006, China
|
36
|
Wassermann AM, Heikamp K, Bajorath J. Potency-Directed Similarity Searching Using Support Vector Machines. Chem Biol Drug Des 2010; 77:30-8. [DOI: 10.1111/j.1747-0285.2010.01059.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
|
37
|
Agarwal S, Dugar D, Sengupta S. Ranking chemical structures for drug discovery: a new machine learning approach. J Chem Inf Model 2010; 50:716-31. [PMID: 20387860 DOI: 10.1021/ci9003865] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.2] [Indexed: 01/09/2023]
Abstract
With chemical libraries increasingly containing millions of compounds or more, there is a fast-growing need for computational methods that can rank or prioritize compounds for screening. Machine learning methods have shown considerable promise for this task; indeed, classification methods such as support vector machines (SVMs), together with their variants, have been used in virtual screening to distinguish active compounds from inactive ones, while regression methods such as partial least-squares (PLS) and support vector regression (SVR) have been used in quantitative structure-activity relationship (QSAR) analysis for predicting biological activities of compounds. Recently, a new class of machine learning methods - namely, ranking methods, which are designed to directly optimize ranking performance - have been developed for ranking tasks such as web search that arise in information retrieval (IR) and other applications. Here we report the application of these new ranking methods in machine learning to the task of ranking chemical structures. Our experiments show that the new ranking methods give better ranking performance than both classification based methods in virtual screening and regression methods in QSAR analysis. We also make some interesting connections between ranking performance measures used in cheminformatics and those used in IR studies.
Affiliation(s)
- Shivani Agarwal
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
|
38
|
Strömbergsson H, Lapins M, Kleywegt GJ, Wikberg JES. Towards Proteome-Wide Interaction Models Using the Proteochemometrics Approach. Mol Inform 2010; 29:499-508. [PMID: 27463328 DOI: 10.1002/minf.201000052] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/05/2010] [Accepted: 05/25/2010] [Indexed: 02/02/2023]
Abstract
A proteochemometrics model was induced from all interaction data in the BindingDB database, comprising in all 7078 protein-ligand complexes with representatives from all major drug target categories. Proteins were represented by alignment-independent sequence descriptors holding information on properties such as hydrophobicity, charge, and secondary structure. Ligands were represented by commonly used QSAR descriptors. The inhibition constant (pKi) values of protein-ligand complexes were discretized into "high" and "low" interaction activity. Different machine-learning techniques were used to induce models relating protein and ligand properties to the interaction activity. The best was decision trees, which gave an accuracy of 80% and an area under the ROC curve of 0.81. The tree pointed to the protein and ligand properties that are relevant for the interaction. As the approach requires neither alignments nor knowledge of protein 3D structures, virtually all available protein-ligand interaction data could be utilized, thus opening a way to completely general interaction models that may span entire proteomes.
Affiliation(s)
- Helena Strömbergsson
  - The Linnaeus Centre for Bioinformatics, Department of Cell and Molecular Biology, Biomedical Centre, Box 598, SE-751 24, Uppsala, Sweden.
- Maris Lapins
  - Department of Pharmaceutical Pharmacology, Biomedical Centre, Box 591, SE-751 24 Uppsala, Sweden
- Gerard J Kleywegt
  - Department of Cell and Molecular Biology, Biomedical Centre, Box 596, SE-751 24, Uppsala, Sweden
- Jarl E S Wikberg
  - Department of Pharmaceutical Pharmacology, Biomedical Centre, Box 591, SE-751 24 Uppsala, Sweden
|
39
|
Geppert H, Vogt M, Bajorath J. Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J Chem Inf Model 2010; 50:205-16. [PMID: 20088575 DOI: 10.1021/ci900419k]
Affiliation(s)
- Hanna Geppert
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany
|
40
|
Ning X, Rangwala H, Karypis G. Multi-Assay-Based Structure−Activity Relationship Models: Improving Structure−Activity Relationship Models by Incorporating Activity Information from Related Targets. J Chem Inf Model 2009; 49:2444-56. [DOI: 10.1021/ci900182q]
Affiliation(s)
- Xia Ning
  - Department of Computer Science and Computer Engineering, University of Minnesota, 4-192 EE/CS Building, 200 Union Street SE, Minneapolis, Minnesota 55455 and Department of Computer Science, George Mason University, 4400 University Drive MSN 4A5, Fairfax, Virginia 22030
- Huzefa Rangwala
  - Department of Computer Science and Computer Engineering, University of Minnesota, 4-192 EE/CS Building, 200 Union Street SE, Minneapolis, Minnesota 55455 and Department of Computer Science, George Mason University, 4400 University Drive MSN 4A5, Fairfax, Virginia 22030
- George Karypis
  - Department of Computer Science and Computer Engineering, University of Minnesota, 4-192 EE/CS Building, 200 Union Street SE, Minneapolis, Minnesota 55455 and Department of Computer Science, George Mason University, 4400 University Drive MSN 4A5, Fairfax, Virginia 22030
|
41
|
Wassermann AM, Geppert H, Bajorath J. Ligand Prediction for Orphan Targets Using Support Vector Machines and Various Target-Ligand Kernels Is Dominated by Nearest Neighbor Effects. J Chem Inf Model 2009; 49:2155-67. [DOI: 10.1021/ci9002624]
Affiliation(s)
- Anne Mai Wassermann
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstrasse 2, D-53113 Bonn, Germany
- Hanna Geppert
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstrasse 2, D-53113 Bonn, Germany
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstrasse 2, D-53113 Bonn, Germany
|