1. Unsupervised Representation Learning for Proteochemometric Modeling. Int J Mol Sci 2021; 22:ijms222312882. [PMID: 34884688; PMCID: PMC8657702; DOI: 10.3390/ijms222312882] [Received: 10/15/2021] [Revised: 11/25/2021] [Accepted: 11/26/2021]
Abstract
In silico protein–ligand binding prediction is an ongoing area of research in computational chemistry and machine learning based drug discovery, as an accurate predictive model could greatly reduce the time and resources necessary for the detection and prioritization of possible drug candidates. Proteochemometric modeling (PCM) attempts to create an accurate model of the protein–ligand interaction space by combining explicit protein and ligand descriptors. This requires the creation of information-rich, uniform and computer interpretable representations of proteins and ligands. Previous studies in PCM modeling rely on pre-defined, handcrafted feature extraction methods, and many methods use protein descriptors that require alignment or are otherwise specific to a particular group of related proteins. However, recent advances in representation learning have shown that unsupervised machine learning can be used to generate embeddings that outperform complex, human-engineered representations. Several different embedding methods for proteins and molecules have been developed based on various language-modeling methods. Here, we demonstrate the utility of these unsupervised representations and compare three protein embeddings and two compound embeddings in a fair manner. We evaluate performance on various splits of a benchmark dataset, as well as on an internal dataset of protein–ligand binding activities and find that unsupervised-learned representations significantly outperform handcrafted representations.
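The abstract describes PCM as combining an explicit protein descriptor with an explicit ligand descriptor for each protein–ligand pair. The sketch below shows one common way to do that, simple concatenation of the two vectors, assuming the embeddings have already been computed by some pretrained model. The lookup tables `protein_embeddings` and `ligand_embeddings`, the IDs, and the toy vector values are hypothetical placeholders; the abstract does not state how the paper itself combines protein and compound embeddings.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical, precomputed embeddings (toy values for illustration only).
protein_embeddings: Dict[str, List[float]] = {
    "P1": [0.1, 0.4, -0.2],
    "P2": [0.3, -0.1, 0.5],
}
ligand_embeddings: Dict[str, List[float]] = {
    "L1": [1.0, 0.0],
    "L2": [0.2, 0.8],
}


def pcm_features(pairs: Sequence[Tuple[str, str]]) -> List[List[float]]:
    """Return one feature row per (protein_id, ligand_id) pair, built by
    concatenating the protein embedding with the ligand embedding."""
    return [protein_embeddings[prot_id] + ligand_embeddings[lig_id]
            for prot_id, lig_id in pairs]


X = pcm_features([("P1", "L1"), ("P2", "L1"), ("P2", "L2")])
print(len(X), len(X[0]))  # 3 5
```

Because every row carries both a target part and a compound part, a single regressor trained on `X` can be fit across many targets at once, which is what distinguishes PCM from fitting a separate QSAR model per target.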

2. Kumar SP, Patel CN, Rawal RM, Pandya HA. Energetic contributions of amino acid residues and its cross-talk to delineate ligand-binding mechanism. Proteins 2020; 88:1207-1225. [DOI: 10.1002/prot.25894] [Received: 10/18/2019] [Revised: 02/20/2020] [Accepted: 04/03/2020]
Affiliation(s)
- Chirag N. Patel
  - Department of Botany, Bioinformatics, and Climate Change Impacts Management, University School of Sciences, Gujarat University, Ahmedabad, India
- Rakesh M. Rawal
  - Department of Life Sciences, University School of Sciences, Gujarat University, Ahmedabad, India
- Himanshu A. Pandya
  - Department of Life Sciences, University School of Sciences, Gujarat University, Ahmedabad, India
  - Department of Botany, Bioinformatics, and Climate Change Impacts Management, University School of Sciences, Gujarat University, Ahmedabad, India

3. Gagic Z, Ruzic D, Djokovic N, Djikic T, Nikolic K. In silico Methods for Design of Kinase Inhibitors as Anticancer Drugs. Front Chem 2020; 7:873. [PMID: 31970149; PMCID: PMC6960140; DOI: 10.3389/fchem.2019.00873] [Received: 09/28/2019] [Accepted: 12/04/2019]
Abstract
Rational drug design implies usage of molecular modeling techniques such as pharmacophore modeling, molecular dynamics, virtual screening, and molecular docking to explain the activity of biomolecules, define molecular determinants for interaction with the drug target, and design more efficient drug candidates. Kinases play an essential role in cell function and therefore are extensively studied targets in drug design and discovery. Kinase inhibitors are clinically very important and widely used antineoplastic drugs. In this review, computational methods used in rational drug design of kinase inhibitors are discussed and compared, considering some representative case studies.
Affiliation(s)
- Zarko Gagic
  - Department of Pharmaceutical Chemistry, Faculty of Medicine, University of Banja Luka, Banja Luka, Bosnia and Herzegovina
- Dusan Ruzic
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia
- Nemanja Djokovic
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia
- Teodora Djikic
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia
- Katarina Nikolic
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia

4. Whitehead TM, Irwin BWJ, Hunt P, Segall MD, Conduit GJ. Imputation of Assay Bioactivity Data Using Deep Learning. J Chem Inf Model 2019; 59:1197-1204. [PMID: 30753070; DOI: 10.1021/acs.jcim.8b00768]
Abstract
We describe a novel deep learning neural network method and its application to impute assay pIC50 values. Unlike conventional machine learning approaches, this method is trained on sparse bioactivity data as input, typical of that found in public and commercial databases, enabling it to learn directly from correlations between activities measured in different assays. In two case studies on public domain data sets we show that the neural network method outperforms traditional quantitative structure-activity relationship (QSAR) models and other leading approaches. Furthermore, by focusing on only the most confident predictions the accuracy is increased to R2 > 0.9 using our method, as compared to R2 = 0.44 when reporting all predictions.
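The last claim in this abstract, that accuracy rises when only the most confident predictions are reported, comes down to simple bookkeeping once each prediction carries an uncertainty estimate. The sketch below assumes such a per-prediction uncertainty (`sigma`) is available; the pIC50 values and uncertainties are invented for illustration, and the abstract does not describe how the network in the paper produces its confidence estimates. The helper names `r_squared` and `most_confident` are hypothetical.

```python
from typing import List, Sequence, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, R2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def most_confident(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    sigma: Sequence[float],
    n_keep: int,
) -> Tuple[List[float], List[float]]:
    """Keep the n_keep predictions with the smallest uncertainty sigma."""
    order = sorted(range(len(sigma)), key=lambda i: sigma[i])[:n_keep]
    return [y_true[i] for i in order], [y_pred[i] for i in order]


# Invented pIC50 measurements, predictions, and prediction uncertainties.
y_true = [5.0, 6.0, 7.0, 8.0, 6.5, 7.5]
y_pred = [5.1, 5.9, 7.2, 7.9, 7.8, 6.0]
sigma = [0.1, 0.2, 0.15, 0.1, 0.9, 1.2]

r2_all = r_squared(y_true, y_pred)
r2_top = r_squared(*most_confident(y_true, y_pred, sigma, n_keep=4))
print(f"all: {r2_all:.2f}, most confident 4: {r2_top:.2f}")  # all: 0.31, most confident 4: 0.99
```

The trade-off is coverage: the higher R2 applies only to the subset of compounds for which a prediction is reported.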
Affiliation(s)
- T M Whitehead
  - Intellegens, Eagle Labs, Chesterton Road, Cambridge CB4 3AZ, United Kingdom
- B W J Irwin
  - Optibrium, F5-6 Blenheim House, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9PB, United Kingdom
- P Hunt
  - Optibrium, F5-6 Blenheim House, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9PB, United Kingdom
- M D Segall
  - Optibrium, F5-6 Blenheim House, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9PB, United Kingdom
- G J Conduit
  - Intellegens, Eagle Labs, Chesterton Road, Cambridge CB4 3AZ, United Kingdom
  - Cavendish Laboratory, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom

5. Giblin KA, Hughes SJ, Boyd H, Hansson P, Bender A. Prospectively Validated Proteochemometric Models for the Prediction of Small-Molecule Binding to Bromodomain Proteins. J Chem Inf Model 2018; 58:1870-1888. [DOI: 10.1021/acs.jcim.8b00400]
Affiliation(s)
- Kathryn A. Giblin
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
- Samantha J. Hughes
  - Computational Chemistry, Oncology, IMED Biotech Unit, AstraZeneca, Cambridge CB10 1XL, U.K.
- Helen Boyd
  - Discovery Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg 431 50 SE, Sweden
- Pia Hansson
  - Discovery Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg 431 50 SE, Sweden
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.

6. Martin EJ, Polyakov VR, Tian L, Perez RC. Profile-QSAR 2.0: Kinase Virtual Screening Accuracy Comparable to Four-Concentration IC50s for Realistically Novel Compounds. J Chem Inf Model 2017. [PMID: 28651433; DOI: 10.1021/acs.jcim.7b00166]
Abstract
While conventional random forest regression (RFR) virtual screening models appear to have excellent accuracy on random held-out test sets, they prove lacking in actual practice. Analysis of 18 historical virtual screens showed that random test sets are far more similar to their training sets than are the compounds project teams actually order. A new, cluster-based "realistic" training/test set split, which mirrors the chemical novelty of real-life virtual screens, recapitulates the poor predictive power of RFR models in real projects. The original Profile-QSAR (pQSAR) method greatly broadened the domain of applicability over conventional models by using as independent variables a profile of activity predictions from all historical assays in a large protein family. However, the accuracy still fell short of experiment on realistic test sets. The improved "pQSAR 2.0" method replaces probabilities of activity from naïve Bayes categorical models at several thresholds with predicted IC50s from RFR models. Unexpectedly, the high accuracy also requires removing the RFR model for the actual assay of interest from the independent variable profile. With these improvements, pQSAR 2.0 activity predictions are now statistically comparable to medium-throughput four-concentration IC50 measurements even on the realistic test set. Beyond the yes/no activity predictions from a typical high-throughput screen (HTS) or conventional virtual screen, these semiquantitative IC50 predictions allow for predicted potency, ligand efficiency, lipophilic efficiency, and selectivity against antitargets, greatly facilitating hitlist triaging and enabling virtual screening panels such as toxicity panels and overall promiscuity predictions.
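The abstract contrasts random held-out test sets with a cluster-based "realistic" split that keeps test compounds chemically novel relative to the training set. The abstract does not give the clustering procedure, so the sketch below is an assumption throughout: compounds are represented as sets of hypothetical fingerprint bit indices, grouped by greedy leader clustering on Tanimoto similarity (a generic choice, not necessarily the one used for pQSAR 2.0), and whole clusters are then assigned to the test set. The function names, the 0.5 threshold, and the rule of filling the test set from the last-formed clusters are all illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def leader_clusters(fps: Dict[str, Fingerprint], threshold: float) -> List[List[str]]:
    """Greedy leader clustering: a compound joins the first cluster whose
    leader (first member) is at least `threshold` similar, else starts a new one."""
    clusters: List[List[str]] = []
    for name, fp in fps.items():
        for cluster in clusters:
            if tanimoto(fps[cluster[0]], fp) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def cluster_split(
    fps: Dict[str, Fingerprint], threshold: float = 0.5, test_fraction: float = 0.3
) -> Tuple[List[str], List[str]]:
    """Assign whole clusters to train or test so that no cluster is split."""
    target = test_fraction * len(fps)
    train: List[str] = []
    test: List[str] = []
    # Illustrative rule: fill the test set from the last-formed clusters backwards.
    for cluster in reversed(leader_clusters(fps, threshold)):
        if len(test) < target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


fps = {
    "c1": frozenset({1, 2, 3, 4}),
    "c2": frozenset({1, 2, 3, 5}),  # Tanimoto 0.6 to c1
    "c3": frozenset({1, 2, 4, 5}),  # Tanimoto 0.6 to c1
    "c4": frozenset({10, 11, 12}),
    "c5": frozenset({10, 11, 13}),  # Tanimoto 0.5 to c4
    "c6": frozenset({20, 21, 22, 23}),
}
train, test = cluster_split(fps)
print(sorted(train), sorted(test))  # ['c1', 'c2', 'c3'] ['c4', 'c5', 'c6']
```

A random split could easily place `c2` in the test set while its close analogue `c1` stays in training, which is the kind of train/test similarity the abstract says makes random test sets look deceptively easy; keeping clusters intact prevents that for compounds grouped together.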
Affiliation(s)
- Eric J Martin
  - Novartis Institutes for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608-2916, United States
- Valery R Polyakov
  - Novartis Institutes for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608-2916, United States
- Li Tian
  - Novartis Institutes for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608-2916, United States
- Rolando C Perez
  - Novartis Institutes for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608-2916, United States

7. Subramanian V, Ain QU, Henno H, Pietilä LO, Fuchs JE, Prusis P, Bender A, Wohlfahrt G. 3D proteochemometrics: using three-dimensional information of proteins and ligands to address aspects of the selectivity of serine proteases. MedChemComm 2017; 8:1037-1045. [PMID: 30108817; PMCID: PMC6072133; DOI: 10.1039/c6md00701e] [Received: 12/16/2016] [Accepted: 03/14/2017]
Abstract
The high similarity between certain sub-pockets of serine proteases may lead to low selectivity of protease inhibitors. Therefore the application of proteochemometrics (PCM), which quantifies the relationship between protein/ligand descriptors and affinity for multiple ligands and targets simultaneously, is useful to understand and improve the selectivity profiles of potential inhibitors. In this study, protein field-based PCM that uses knowledge-based and WaterMap derived fields to describe proteins in combination with 2D (RDKit and MOE fingerprints) and 3D (4 point pharmacophoric fingerprints and GRIND) ligand descriptors was used to model the bioactivities of 24 homologous serine proteases and 5863 inhibitors in an integrated fashion. Of the multiple field-based PCM models generated based on different ligand descriptors, RDKit fingerprints showed the best performance in terms of external prediction, with a test-set R2 of 0.72 and RMSEP of 0.81. Further, visual interpretation of the models highlights sub-pocket specific regions that influence affinity and selectivity of serine protease inhibitors.
Affiliation(s)
- Vigneshwari Subramanian
  - Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki, 00014 Helsinki, Finland
  - Computer-Aided Drug Design, Orion Pharma, Orionintie 1, 02101 Espoo, Finland
- Qurrat Ul Ain
  - Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW Cambridge, UK
- Helena Henno
  - Computer-Aided Drug Design, Orion Pharma, Orionintie 1, 02101 Espoo, Finland
- Lars-Olof Pietilä
  - Computer-Aided Drug Design, Orion Pharma, Orionintie 1, 02101 Espoo, Finland
- Julian E Fuchs
  - Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW Cambridge, UK
  - Institute of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innrain 82, 6020 Innsbruck, Austria
- Peteris Prusis
  - Computer-Aided Drug Design, Orion Pharma, Orionintie 1, 02101 Espoo, Finland
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW Cambridge, UK
- Gerd Wohlfahrt
  - Computer-Aided Drug Design, Orion Pharma, Orionintie 1, 02101 Espoo, Finland