1
|
Rodríguez-Pérez R, Bajorath J. Evolution of Support Vector Machine and Regression Modeling in Chemoinformatics and Drug Discovery. J Comput Aided Mol Des 2022; 36:355-362. [PMID: 35304657 PMCID: PMC9325859 DOI: 10.1007/s10822-022-00442-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Accepted: 02/15/2022] [Indexed: 11/05/2022]
Abstract
The support vector machine (SVM) algorithm is one of the most widely used machine learning (ML) methods for predicting active compounds and molecular properties. In chemoinformatics and drug discovery, SVM has been a state-of-the-art ML approach for more than a decade. A unique attribute of SVM is that it operates in feature spaces of increasing dimensionality. Hence, SVM conceptually departs from the paradigm of low dimensionality that applies to many other methods for chemical space navigation. The SVM approach is applicable to compound classification, and ranking, multi-class predictions, and –in algorithmically modified form– regression modeling. In the emerging era of deep learning (DL), SVM retains its relevance as one of the premier ML methods in chemoinformatics, for reasons discussed herein. We describe the SVM methodology including strengths and weaknesses and discuss selected applications that have contributed to the evolution of SVM as a premier approach for compound classification, property predictions, and virtual compound screening.
Collapse
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115, Bonn, Germany.,Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002, Basel, Switzerland
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115, Bonn, Germany. .,Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002, Basel, Switzerland.
| |
Collapse
|
2
|
Škuta C, Cortés-Ciriano I, Dehaen W, Kříž P, van Westen GJP, Tetko IV, Bender A, Svozil D. QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping. J Cheminform 2020; 12:39. [PMID: 33431038 PMCID: PMC7260783 DOI: 10.1186/s13321-020-00443-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2019] [Accepted: 05/16/2020] [Indexed: 02/11/2023] Open
Abstract
An affinity fingerprint is the vector consisting of compound’s affinity or potency against the reference panel of protein targets. Here, we present the QAFFP fingerprint, 440 elements long in silico QSAR-based affinity fingerprint, components of which are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024 bits long Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~ 0.65 and ~ 0.70 for similarity searching depending on data sets, and ~ 0.85 for classification) and EF5 (~ 4.67 and ~ 5.82 for similarity searching depending on data sets, and ~ 2.10 for classification), comparable to that of the Morgan2 fingerprint (similarity searching AUC of ~ 0.57 and ~ 0.66, and EF5 of ~ 4.09 and ~ 6.41, depending on data sets, classification AUC of ~ 0.87, and EF5 of ~ 2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping as it is able to retrieve 1146 out of existing 1749 scaffolds, while the Morgan2 fingerprint reveals only 864 scaffolds.![]()
Collapse
Affiliation(s)
- C Škuta
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
| | - I Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - W Dehaen
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic.,CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
| | - P Kříž
- Department of Mathematics, Faculty of Chemical Engineering, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
| | - G J P van Westen
- Computational Drug Discovery, Drug Discovery and Safety, LACDR, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - I V Tetko
- Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH) and BIGCHEM GmbH, Ingolstaedter Landstrasse 1, 85764, Neuherberg, Germany
| | - A Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - D Svozil
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic. .,CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic.
| |
Collapse
|
3
|
Abstract
Pharmacological science is trying to establish the link between chemicals, targets, and disease-related phenotypes. A plethora of chemical proteomics and structural data have been generated, thanks to the target-based approach that has dominated drug discovery at the turn of the century. There is an invaluable source of information for in silico target profiling. Prediction is based on the principle of chemical similarity (similar drugs bind similar targets) or on first principles from the biophysics of molecular interactions. In the first case, compound comparison is made through ligand-based chemical similarity search or through classifier-based machine learning approach. The 3D techniques are based on 3D structural descriptors or energy-based scoring scheme to infer a binding affinity of a compound with its putative target. More recently, a new approach based on compound set metric has been proposed in which a query compound is compared with a whole of compounds associated with a target or a family of targets. This chapter reviews the different techniques of in silico target profiling and their main applications such as inference of unwanted targets, drug repurposing, or compound prioritization after phenotypic-based screening campaigns.
Collapse
|
4
|
Fang J, Pang X, Wu P, Yan R, Gao L, Li C, Lian W, Wang Q, Liu AL, Du GH. Molecular Modeling on Berberine Derivatives toward BuChE: An Integrated Study with Quantitative Structure-Activity Relationships Models, Molecular Docking, and Molecular Dynamics Simulations. Chem Biol Drug Des 2016; 87:649-63. [PMID: 26648584 DOI: 10.1111/cbdd.12700] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2015] [Revised: 09/21/2015] [Accepted: 11/20/2015] [Indexed: 12/31/2022]
Abstract
A dataset of 67 berberine derivatives for the inhibition of butyrylcholinesterase (BuChE) was studied based on the combination of quantitative structure-activity relationships models, molecular docking, and molecular dynamics methods. First, a series of berberine derivatives were reported, and their inhibitory activities toward butyrylcholinesterase (BuChE) were evaluated. By 2D- quantitative structure-activity relationships studies, the best model built by partial least-square had a conventional correlation coefficient of the training set (R(2)) of 0.883, a cross-validation correlation coefficient (Qcv2) of 0.777, and a conventional correlation coefficient of the test set (Rpred2) of 0.775. The model was also confirmed by Y-randomization examination. In addition, the molecular docking and molecular dynamics simulation were performed to better elucidate the inhibitory mechanism of three typical berberine derivatives (berberine, C2, and C55) toward BuChE. The predicted binding free energy results were consistent with the experimental data and showed that the van der Waals energy term (ΔEvdw) difference played the most important role in differentiating the activity among the three inhibitors (berberine, C2, and C55). The developed quantitative structure-activity relationships models provide details on the fine relationship linking structure and activity and offer clues for structural modifications, and the molecular simulation helps to understand the inhibitory mechanism of the three typical inhibitors. In conclusion, the results of this study provide useful clues for new drug design and discovery of BuChE inhibitors from berberine derivatives.
Collapse
Affiliation(s)
- Jiansong Fang
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China.,Institute of Clinical Pharmacology, Guangzhou University of Traditional Chinese Medicine, Guangzhou, 510006, China
| | - Xiaocong Pang
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China
| | - Ping Wu
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China
| | - Rong Yan
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China
| | - Li Gao
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China
| | - Chao Li
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China
| | - Wenwen Lian
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China
| | - Qi Wang
- Institute of Clinical Pharmacology, Guangzhou University of Traditional Chinese Medicine, Guangzhou, 510006, China
| | - Ai-lin Liu
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China.,Beijing Key Laboratory of Drug Target and Screening Research, Beijing, 100050, China.,State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Beijing, 100050, China
| | - Guan-hua Du
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China.,Beijing Key Laboratory of Drug Target and Screening Research, Beijing, 100050, China.,State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Beijing, 100050, China
| |
Collapse
|