1
Daniluk P, Oleniecki T, Lesyng B. DAMA: a method for computing multiple alignments of protein structures using local structure descriptors. Bioinformatics 2021; 38:80-85. [PMID: 34396393 PMCID: PMC8696102 DOI: 10.1093/bioinformatics/btab571] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/24/2020] [Revised: 05/31/2021] [Accepted: 08/12/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: The well-known fact that protein structures are more conserved than their sequences forms the basis of several areas of computational structural biology. Methods based on structure analysis provide more complete information on residue conservation in evolutionary processes. This is crucial for determining evolutionary relationships between proteins and for identifying recurrent structural patterns present in biomolecules involved in similar functions. However, algorithmic structural alignment is much more difficult than multiple sequence alignment. This study is devoted to the development and applications of DAMA, a novel and effective environment capable of computing and analyzing multiple structure alignments.
RESULTS: DAMA is based on local structural similarities, using local 3D structure descriptors, and thus accounts for the nearest-neighbor molecular environments of aligned residues. It is constrained neither by protein topology nor by global structure. DAMA is an extension of our previous study (DEDAL), which demonstrated the applicability of local descriptors to pairwise alignment problems. Since the multiple alignment problem is NP-complete, an effective heuristic approach has been developed without imposing any artificial constraints. The alignment algorithm searches for the largest consistent ensemble of similar descriptors. The new method is capable of capturing most of the biologically significant similarities present in canonical test sets and is discriminatory enough to prevent the emergence of larger, but meaningless, solutions. Tests performed on the test sets, including protein kinases, demonstrate DAMA's capability of identifying equivalent residues, which should be very useful in discovering the biological nature of protein similarity. Performance profiles show the advantage of DAMA over other methods, in particular when using a strict similarity measure QC, which is the ratio of correctly aligned columns, and when applying the methods to more difficult cases.
AVAILABILITY AND IMPLEMENTATION: DAMA is available online at http://dworkowa.imdik.pan.pl/EP/DAMA. Linux binaries of the software are available upon request.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
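The abstract describes QC only as "the ratio of correctly aligned columns". As a rough illustration, the sketch below assumes that a column counts as correct when it appears unchanged in the predicted alignment, and that alignments are stored as tuples of residue indices. The function name `column_score` and the toy data are hypothetical and not taken from DAMA; the paper's exact definition may differ.

```python
def column_score(reference, predicted):
    """Fraction of reference columns reproduced exactly in the predicted alignment.

    Each column is a tuple holding one residue index per structure.
    This is an assumed reading of "ratio of correctly aligned columns".
    """
    if not reference:
        raise ValueError("reference alignment has no columns")
    predicted_columns = set(predicted)
    matched = sum(1 for column in reference if column in predicted_columns)
    return matched / len(reference)


# Toy alignment of three structures; the third column differs in one residue.
reference = [(0, 0, 1), (1, 1, 2), (2, 3, 3), (3, 4, 4)]
predicted = [(0, 0, 1), (1, 1, 2), (2, 2, 3), (3, 4, 4)]
qc = column_score(reference, predicted)
print(qc)
```

A column-level score of this kind is strict because a single misplaced residue invalidates the whole column, which is consistent with the abstract calling QC a strict measure.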
Affiliation(s)
- Paweł Daniluk: Bioinformatics Laboratory, Mossakowski Medical Research Centre, Polish Academy of Sciences, 02-106 Warsaw, Poland
- Tymoteusz Oleniecki: College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, 02-089 Warsaw, Poland
2
Nazarshodeh E, Sheikhpour R, Gharaghani S, Sarram MA. A novel proteochemometrics model for predicting the inhibition of nine carbonic anhydrase isoforms based on supervised Laplacian score and k-nearest neighbour regression. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2018; 29:419-437. [PMID: 29882433 DOI: 10.1080/1062936x.2018.1447995] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/19/2017] [Accepted: 02/28/2018] [Indexed: 06/08/2023]
Abstract
Carbonic anhydrases (CAs) are essential enzymes in biological processes. The activity of compounds towards CA isoforms can be predicted by computational techniques in order to discover novel therapeutic inhibitors. Studies such as quantitative structure-activity relationships (QSARs), molecular docking and pharmacophore modelling have been carried out to design potent inhibitors. Unfortunately, QSAR does not consider the information of target space in the model. We successfully developed an in silico proteochemometrics model that simultaneously uses target and ligand descriptors to predict the activities of CA inhibitors. Herein, a strong predictive model was built for the prediction of protein-ligand binding affinity between nine human CA isoforms and 549 ligands. We applied descriptors obtained from the PROFEAT webserver for the proteins. Ligands were encoded by descriptors from PaDEL-Descriptor software. Supervised Laplacian score (SLS) and particle swarm optimization were used for feature selection. Models were derived using k-nearest neighbour (KNN) regression and a kernel smoother model. The predictive ability of the models was evaluated by an external validation test. Statistical results (Q²ext = 0.7806, r²test = 0.7811 and RMSEtest = 0.5549) showed that the model generated using SLS and KNN regression outperformed the other models. Consequently, the selectivity of compounds towards these enzymes can be predicted prior to synthesis.
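To make the evaluation in this abstract concrete, here is a minimal sketch of KNN regression scored on an external test set with RMSE and an external Q². It is not the authors' pipeline: the toy one-dimensional data, the function names, and the Q² variant (prediction error relative to deviation from the training-set mean) are all assumptions chosen for illustration.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to the query."""
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def rmse(y_true, y_pred):
    """Root mean squared error of the predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def q2_ext(y_true, y_pred, train_mean):
    """External Q2: one common variant, using the training-set mean as baseline."""
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    total = sum((t - train_mean) ** 2 for t in y_true)
    return 1.0 - press / total


# Hypothetical descriptors (one feature per sample) and activities.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
train_y = [0.0, 1.0, 2.0, 3.0, 4.0]
test_x = [(1.5,), (2.5,)]
test_y = [1.0, 3.0]

predictions = [knn_predict(train_x, train_y, x, k=2) for x in test_x]
train_mean = sum(train_y) / len(train_y)
print(predictions, rmse(test_y, predictions), q2_ext(test_y, predictions, train_mean))
```

Several definitions of external Q² are in use, so values computed this way are only comparable with reported ones when the same baseline is used.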
Affiliation(s)
- E Nazarshodeh: Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- R Sheikhpour: Department of Computer Engineering, Yazd University, Yazd, Iran
- S Gharaghani: Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- M A Sarram: Department of Computer Engineering, Yazd University, Yazd, Iran
3
Rasti B, Namazi M, Karimi-Jafari MH, Ghasemi JB. Proteochemometric Modeling of the Interaction Space of Carbonic Anhydrase and its Inhibitors: An Assessment of Structure-based and Sequence-based Descriptors. Mol Inform 2016; 36. [PMID: 27860295 DOI: 10.1002/minf.201600102] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/19/2015] [Accepted: 10/26/2016] [Indexed: 11/08/2022]
Abstract
Due to its physiological and clinical roles, carbonic anhydrase (CA) is one of the most interesting case studies. There are different classes of CA inhibitors including sulfonamides, polyamines, coumarins and dithiocarbamates (DTCs). However, many of them are barely selective for a specific isoform. Therefore, finding highly selective inhibitors for different isoforms of CA is still an ongoing project. Proteochemometrics modeling (PCM) is able to model the bioactivity of multiple compounds against different isoforms of a protein. Therefore, it is highly applicable when investigating the selectivity of different ligands towards different receptors. Given these facts, we applied PCM to investigate the interaction space and structural properties that lead to the selective inhibition of CA isoforms by some dithiocarbamates. Our models have provided interesting structural information that can be considered to design compounds capable of inhibiting different isoforms of CA in an improved selective manner. Validity and predictivity of the models were confirmed by both internal and external validation methods, while a Y-scrambling approach was applied to assess the robustness of the models. To prove the reliability and the applicability of our findings, we showed how ligand-receptor selectivity can be affected by removing any of these critical findings from the modeling process.
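The Y-scrambling check mentioned in this abstract can be sketched as follows: refit the model on randomly permuted activities and confirm that the fit quality does not exceed that obtained with the true activities. The single-descriptor least-squares model, the toy data and all names below are illustrative assumptions, not the authors' PCM models.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Coefficient of determination of the fitted line on its own data."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def y_scramble(xs, ys, rounds=20, seed=0):
    """Refit on shuffled responses; returns one R^2 per scrambling round."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        scores.append(r_squared(xs, shuffled))
    return scores


# Hypothetical descriptor values and perfectly correlated activities.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
true_r2 = r_squared(xs, ys)
scrambled_r2 = y_scramble(xs, ys)
print(true_r2, max(scrambled_r2))
```

If the scrambled scores approach the true score, the original fit is likely to reflect chance correlation rather than a real structure-activity relationship.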
Affiliation(s)
- Behnam Rasti: Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Mohsen Namazi: Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- M H Karimi-Jafari: Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Jahan B Ghasemi: Department of Analytical Chemistry, School of Chemistry, College of Science, University of Tehran, Tehran, Iran
4
Qiu T, Qiu J, Feng J, Wu D, Yang Y, Tang K, Cao Z, Zhu R. The recent progress in proteochemometric modelling: focusing on target descriptors, cross-term descriptors and application scope. Brief Bioinform 2016; 18:125-136. [PMID: 26873661 DOI: 10.1093/bib/bbw004] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 10/26/2015] [Revised: 12/09/2015] [Indexed: 12/17/2022] Open
Abstract
As an extension of the conventional quantitative structure activity relationship models, proteochemometric (PCM) modelling is a computational method that can predict the bioactivity relations between multiple ligands and multiple targets. Traditional PCM modelling includes three essential elements: descriptors (including target descriptors, ligand descriptors and cross-term descriptors), bioactivity data and appropriate learning functions that link the descriptors to the bioactivity data. Since its appearance, PCM modelling has developed rapidly over the past decade by taking advantage of the progress of different descriptors and machine learning techniques, along with the increasing amounts of available bioactivity data. Specifically, the new emerging target descriptors and cross-term descriptors not only significantly increased the performance of PCM modelling but also expanded its application scope from traditional protein-ligand interaction to more abundant interactions, including protein-peptide, protein-DNA and even protein-protein interactions. In this review, target descriptors and cross-term descriptors, as well as the corresponding application scope, are intensively summarized. Additionally, we look forward to seeing PCM modelling extend into new application scopes, such as Target-Catalyst-Ligand systems, with the further development of descriptors, machine learning techniques and increasing amounts of available bioactivity data.
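The three descriptor blocks named in this abstract can be illustrated with a small sketch that concatenates ligand descriptors, target descriptors and cross-terms into one feature vector. Cross-terms are taken here as pairwise products of ligand and target descriptors, which is one common construction and an assumption of this example; the function name `pcm_features` and the numbers are hypothetical.

```python
from itertools import product


def pcm_features(ligand, target):
    """Concatenate ligand, target and cross-term descriptors.

    Cross-terms are modelled as every ligand descriptor multiplied by
    every target descriptor (an assumed, commonly used construction).
    """
    cross_terms = [l * t for l, t in product(ligand, target)]
    return list(ligand) + list(target) + cross_terms


# Two hypothetical ligand descriptors and three hypothetical target descriptors.
features = pcm_features([1.0, 2.0], [3.0, 4.0, 5.0])
print(features)
```

The cross-term block grows with the product of the two descriptor counts, which is one reason feature selection is frequently paired with PCM models.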
5
Gardiner EJ, Gillet VJ. Perspectives on Knowledge Discovery Algorithms Recently Introduced in Chemoinformatics: Rough Set Theory, Association Rule Mining, Emerging Patterns, and Formal Concept Analysis. J Chem Inf Model 2015; 55:1781-803. [DOI: 10.1021/acs.jcim.5b00198] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 01/22/2023]
Affiliation(s)
- Eleanor J. Gardiner: Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom
- Valerie J. Gillet: Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom
6
Khaliq Z, Leijon M, Belák S, Komorowski J. A complete map of potential pathogenicity markers of avian influenza virus subtype H5 predicted from 11 expressed proteins. BMC Microbiol 2015; 15:128. [PMID: 26112351 PMCID: PMC4482282 DOI: 10.1186/s12866-015-0465-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 02/25/2015] [Accepted: 06/12/2015] [Indexed: 01/18/2023] Open
Abstract
Background: Polybasic cleavage sites of the hemagglutinin (HA) proteins are considered to be the most important determinants indicating virulence of the avian influenza viruses (AIV). However, evidence is accumulating that these sites alone are not sufficient to establish high pathogenicity. Other sites that contribute to pathogenicity must exist, located on the HA protein outside the cleavage site or on the other proteins expressed by AIV.
Results: We employed rule-based computational modeling to construct a map, with high statistical significance, of amino acid (AA) residues associated with pathogenicity in 11 proteins of the H5 type viruses. We found potential markers of pathogenicity in all of the 11 proteins expressed by the H5 type of AIV. AA mutations S-43HA1-D, D-83HA1-A in HA; S-269-D, E-41-H in NA; S-48-N, K-212-N in NS1; V-166-A in M1; G-14-E in M2; K-77-R, S-377-N in NP; and Q-48-P in PB1-F2 were identified as having the potential to shift the pathogenicity from low to high. Our results suggest that low pathogenicity is common to most of the subtypes of the H5 AIV, while high pathogenicity is specific to each subtype. The models were developed using public data and validated on new, unseen sequences.
Conclusions: Our models explicitly define a viral genetic background required for the virus to be highly pathogenic and thus confirm the hypothesis of the presence of pathogenicity markers beyond the cleavage site.
Electronic supplementary material: The online version of this article (doi:10.1186/s12866-015-0465-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Zeeshan Khaliq: Department of Cell and Molecular Biology, Computational and Systems Biology, Science for Life Laboratory, Uppsala University, SE-751 24, Uppsala, Sweden.
- Mikael Leijon: Department of Virology, Parasitology and Immunobiology (VIP), National Veterinary Institute (SVA), Uppsala, Sweden; OIE Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine, Ulls väg 2B and 26, SE-756 89, Uppsala, Sweden.
- Sándor Belák: OIE Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine, Ulls väg 2B and 26, SE-756 89, Uppsala, Sweden; Department of Biomedical Sciences and Veterinary Public Health (BVF), Swedish University of Agricultural Sciences (SLU), Uppsala, Sweden.
- Jan Komorowski: Department of Cell and Molecular Biology, Computational and Systems Biology, Science for Life Laboratory, Uppsala University, SE-751 24, Uppsala, Sweden; Institute of Computer Science, Polish Academy of Sciences, 01-248, Warszawa, Poland.
7
Ain QU, Méndez-Lucio O, Ciriano IC, Malliavin T, van Westen GJP, Bender A. Modelling ligand selectivity of serine proteases using integrative proteochemometric approaches improves model performance and allows the multi-target dependent interpretation of features. Integr Biol (Camb) 2015; 6:1023-33. [PMID: 25255469 DOI: 10.1039/c4ib00175c] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Serine proteases, implicated in important physiological functions, have a high intra-family similarity, which leads to unwanted off-target effects of inhibitors with insufficient selectivity. However, the availability of sequence and structure data has now made it possible to develop approaches to design pharmacological agents that can discriminate successfully between their related binding sites. In this study, we have quantified the relationship between 12,625 distinct protease inhibitors and their bioactivity against 67 targets of the serine protease family (20,213 data points) in an integrative manner, using proteochemometric modelling (PCM). The benchmarking of 21 different target descriptors motivated the usage of specific binding pocket amino acid descriptors, which helped in the identification of active site residues and selective compound chemotypes affecting compound affinity and selectivity. PCM models performed better than alternative approaches (models trained using exclusively compound descriptors on all available data, QSAR) employed for comparison, with R²/RMSE values of 0.64 ± 0.23/0.66 ± 0.20 vs. 0.35 ± 0.27/1.05 ± 0.27 log units, respectively. Moreover, the interpretation of the PCM model singled out various chemical substructures responsible for bioactivity and selectivity towards particular proteases (thrombin, trypsin and coagulation factor 10) in agreement with the literature. For instance, absence of a tertiary sulphonamide was identified to be responsible for decreased selective activity (by on average 0.27 ± 0.65 pChEMBL units) on FA10. Among the binding pocket residues, the amino acids (arginine, leucine and tyrosine) at positions 35, 39, 60, 93, 140 and 207 were observed as key contributing residues for selective affinity on these three targets.
Affiliation(s)
- Qurrat U Ain: Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW, University of Cambridge, UK.
8
Cortés-Ciriano I, Ain QU, Subramanian V, Lenselink EB, Méndez-Lucio O, IJzerman AP, Wohlfahrt G, Prusis P, Malliavin TE, van Westen GJP, Bender A. Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects. MEDCHEMCOMM 2015. [DOI: 10.1039/c4md00216d] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Indexed: 11/21/2022]
Abstract
Proteochemometric (PCM) modelling is a computational method to model the bioactivity of multiple ligands against multiple related protein targets simultaneously.
Affiliation(s)
- Isidro Cortés-Ciriano: Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Qurrat Ul Ain: Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
- Eelke B. Lenselink: Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Oscar Méndez-Lucio: Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
- Adriaan P. IJzerman: Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Gerd Wohlfahrt: Computer-Aided Drug Design, Orion Pharma, FIN-02101 Espoo, Finland
- Peteris Prusis: Computer-Aided Drug Design, Orion Pharma, FIN-02101 Espoo, Finland
- Thérèse E. Malliavin: Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Gerard J. P. van Westen: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Andreas Bender: Unilever Centre for Molecular Informatics, Department of Chemistry, CB2 1EW Cambridge, UK
9
Dąbrowski MJ, Bornelöv S, Kruczyk M, Baltzer N, Komorowski J. 'True' null allele detection in microsatellite loci: a comparison of methods, assessment of difficulties and survey of possible improvements. Mol Ecol Resour 2014; 15:477-88. [PMID: 25187238 DOI: 10.1111/1755-0998.12326] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 04/01/2014] [Revised: 08/19/2014] [Accepted: 08/21/2014] [Indexed: 02/04/2023]
Abstract
Null alleles are alleles that for various reasons fail to amplify in a PCR assay. The presence of null alleles in microsatellite data is known to bias the genetic parameter estimates. Thus, efficient detection of null alleles is crucial, but the methods available for indirect null allele detection return inconsistent results. Here, our aim was to compare different methods for null allele detection, to explain their respective performance and to provide improvements. We applied several approaches to identify the 'true' null alleles based on the predictions made by five different methods, used either individually or in combination. First, we introduced simulated 'true' null alleles into 240 population data sets and applied the methods to measure their success in detecting the simulated null alleles. The single best-performing method was ML-NullFreq_frequency. Furthermore, we applied different noise reduction approaches to improve the results. For instance, by combining the results of several methods, we obtained more reliable results than using a single one. Rule-based classification was applied to identify population properties linked to the false discovery rate. Rules obtained from the classifier described which population genetic estimates and loci characteristics were linked to the success of each method. We have shown that by simulating 'true' null alleles into a population data set, we may define a null allele frequency threshold, related to a desired true or false discovery rate. Moreover, using such simulated data sets, the expected null allele homozygote frequency may be estimated independently of the equilibrium state of the population.
Affiliation(s)
- M J Dąbrowski: Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Box 596, 751 24, Uppsala, Sweden; Museum and Institute of Zoology, Polish Academy of Sciences, Wilcza 64, 00-679, Warsaw, Poland
10
Bornelöv S, Marillet S, Komorowski J. Ciruvis: a web-based tool for rule networks and interaction detection using rule-based classifiers. BMC Bioinformatics 2014; 15:139. [PMID: 24886370 PMCID: PMC4030460 DOI: 10.1186/1471-2105-15-139] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 11/11/2013] [Accepted: 04/07/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The use of classification algorithms is becoming increasingly important for the field of computational biology. However, not only the quality of the classification, but also its biological interpretation is important. This interpretation may be eased if interacting elements can be identified and visualized, something that requires appropriate tools and methods. RESULTS We developed a new approach to detecting interactions in complex systems based on classification. Using rule-based classifiers, we previously proposed a rule network visualization strategy that may be applied as a heuristic for finding interactions. We now complement this work with Ciruvis, a web-based tool for the construction of rule networks from classifiers made of IF-THEN rules. Simulated and biological data served as an illustration of how the tool may be used to visualize and interpret classifiers. Furthermore, we used the rule networks to identify feature interactions, compared them to alternative methods, and computationally validated the findings. CONCLUSIONS Rule networks enable a fast method for model visualization and provide an exploratory heuristic to interaction detection. The tool is made freely available on the web and may thus be used to aid and improve rule-based classification.
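The idea of building a rule network from IF-THEN rules can be sketched as counting how often pairs of conditions occur together in the same rule. The sketch below weights each pair by the rule's support; that weighting, the data layout and the names are assumptions made for illustration, and Ciruvis itself may score connections differently.

```python
from collections import Counter
from itertools import combinations


def rule_network(rules):
    """Build weighted edges between conditions that co-occur in a rule.

    `rules` is a list of (conditions, support) pairs, where `conditions`
    is the set of IF-part conditions of one rule. Edge weights are summed
    rule supports (an assumed scoring scheme).
    """
    edges = Counter()
    for conditions, support in rules:
        for a, b in combinations(sorted(conditions), 2):
            edges[(a, b)] += support
    return edges


# Hypothetical rules that all predict the same class.
rules = [
    ({"geneA=high", "geneB=low"}, 10),
    ({"geneA=high", "geneB=low", "geneC=high"}, 5),
    ({"geneC=high", "geneB=low"}, 3),
]
network = rule_network(rules)
for edge, weight in network.most_common():
    print(edge, weight)
```

Strongly weighted edges point to condition pairs that appear together repeatedly, which is the sense in which a rule network can serve as an exploratory heuristic for interactions.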
Affiliation(s)
- Jan Komorowski: Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, 751 24 Uppsala, Sweden.
11
Subramanian V, Prusis P, Pietilä LO, Xhaard H, Wohlfahrt G. Visually interpretable models of kinase selectivity related features derived from field-based proteochemometrics. J Chem Inf Model 2013; 53:3021-30. [PMID: 24116714 DOI: 10.1021/ci400369z] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 01/16/2023]
Abstract
Achieving selectivity for small organic molecules toward biological targets is a main focus of drug discovery but has proven difficult, for example, for kinases because of the high similarity of their ATP binding pockets. To support the design of more selective inhibitors with fewer side effects or with altered target profiles for improved efficacy, we developed a method combining ligand- and receptor-based information. Conventional QSAR models enable one to study the interactions of multiple ligands with a single protein target, but in order to understand the interactions between multiple ligands and multiple proteins, we have used proteochemometrics, a multivariate statistics method that aims to combine and correlate both ligand and protein descriptions with affinity to receptors. The superimposed binding sites of 50 unique kinases were described by molecular interaction fields derived from knowledge-based potentials and Schrödinger's WaterMap software. Eighty ligands were described by Mold2, Open Babel, and Volsurf descriptors. Partial least-squares regression including cross-terms, which describe the selectivity, was used for model building. This combination of methods allows interpretation and easy visualization of the models within the context of ligand binding pockets, which can be translated readily into the design of novel inhibitors.
12
Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets. J Cheminform 2013; 5:42. [PMID: 24059743 PMCID: PMC4015169 DOI: 10.1186/1758-2946-5-42] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 06/20/2013] [Accepted: 09/18/2013] [Indexed: 11/10/2022] Open
Abstract
Background: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants); in addition, we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks which comprise Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data, and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants.
Results: The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases found to be larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than utilizing individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-Scales (3) combined with an average Z-Scale value for each target, while ProtFP (PCA8), ST-Scales, and ProtFP (Feature) rank last.
Conclusions: While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still, on average, surprisingly similar. Still, combining sets describing complementary information leads to a small but consistent improvement in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, underlining that choosing an appropriate descriptor set is of fundamental importance for bioactivity modeling, from both the ligand and the protein side.
13
van Westen GJ, Swier RF, Wegner JK, Ijzerman AP, van Vlijmen HW, Bender A. Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets. J Cheminform 2013; 5:41. [PMID: 24059694 PMCID: PMC3848949 DOI: 10.1186/1758-2946-5-41] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 06/20/2013] [Accepted: 09/18/2013] [Indexed: 11/10/2022] Open
Abstract
Background: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 different protein descriptor sets have been compared with respect to their behavior in perceiving similarities between amino acids. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI and BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants). We investigate to what extent descriptor sets show collinear as well as orthogonal behavior via principal component analysis (PCA).
Results: In describing amino acid similarities, MS-WHIM, T-scales and ST-scales show related behavior, as do the VHSE, FASGAI, and ProtFP (PCA3) descriptor sets. Conversely, the ProtFP (PCA5), ProtFP (PCA8), Z-Scales (Binned), and BLOSUM descriptor sets show behavior that is distinct from one another as well as from both of the clusters above. Generally, the use of more principal components (>3 per amino acid, per descriptor) leads to significant differences in the way amino acids are described, even though the later principal components capture less variation per component of the original input data.
Conclusion: In this work a comparison is provided of how similarly (and differently) currently available amino acid descriptor sets behave when converting structure to property space. The results obtained enable molecular modelers to select suitable amino acid descriptor sets for structure-activity analyses, e.g. those showing complementary behavior.
Affiliation(s)
- Gerard JP van Westen: Division of Medicinal Chemistry, Leiden / Amsterdam Center for Drug Research, Einsteinweg 55, 2333 CC Leiden, The Netherlands.
14
Pérot S, Regad L, Reynès C, Spérandio O, Miteva MA, Villoutreix BO, Camproux AC. Insights into an original pocket-ligand pair classification: a promising tool for ligand profile prediction. PLoS One 2013; 8:e63730. [PMID: 23840299 PMCID: PMC3688729 DOI: 10.1371/journal.pone.0063730] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 10/05/2012] [Accepted: 04/05/2013] [Indexed: 11/18/2022] Open
Abstract
Pockets are today a cornerstone of modern drug discovery projects and sit at the crossroads of several research fields, from structural biology to mathematical modeling. Being able to predict whether a small molecule could bind to one or more protein targets, or whether a protein could bind to some given ligands, is very useful for drug discovery endeavors and for anticipating binding to off- and anti-targets. To date, several studies have explored such questions with methods ranging from chemogenomic approaches to reverse docking. Most of these studies have been performed either from the viewpoint of ligands or targets. However, it seems valuable to use information from both ligands and target binding pockets. Hence, we present a multivariate approach relating ligand properties with protein pocket properties from the analysis of known ligand-protein interactions. We explored and optimized the pocket-ligand pair space by combining pocket and ligand descriptors using Principal Component Analysis and developed a classification engine on this paired space, revealing five main clusters of pocket-ligand pairs sharing specific and similar structural or physico-chemical properties. These pocket-ligand pair clusters highlight correspondences between pocket and ligand topological and physico-chemical properties and capture relevant information with respect to protein-ligand interactions. Based on these pocket-ligand correspondences, a protocol for predicting clusters that share similar recognition characteristics is developed for a given pocket-ligand complex and performs well. It is then extended to cluster prediction for a given pocket in order to acquire knowledge about its expected ligand profile, or to cluster prediction for a given ligand in order to acquire knowledge about its expected pocket profile. This prediction approach shows promising results and could help predict some ligand properties critical for binding to a given pocket and, conversely, some key pocket properties for ligand binding.
Affiliation(s)
- Stéphanie Pérot
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Leslie Regad
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Christelle Reynès
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Olivier Spérandio
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Maria A. Miteva
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Bruno O. Villoutreix
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
- Anne-Claude Camproux
- INSERM, UMRS 973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMRS 973, MTi, Paris, France
|
15
|
Koch U, Hamacher M, Nussbaumer P. Cheminformatics at the interface of medicinal chemistry and proteomics. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1844:156-61. [PMID: 23707564 DOI: 10.1016/j.bbapap.2013.05.010] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 10/11/2012] [Revised: 04/26/2013] [Accepted: 05/13/2013] [Indexed: 10/26/2022]
Abstract
Multiple factors have to be optimized in the course of a drug discovery project. Traditionally this includes potency on a single target, eventually specificity as well as the pharmacokinetic, physicochemical and the safety profile. Recently an additional dimension has been added by realizing that the therapeutic outcome of a drug is often determined not only by its activity on a single target but also by its activity profile across a variety of biological targets. To address the polypharmacology of drug candidates many compounds are tested on a set of targets or in phenotypic screens generating a tremendous amount of data. To extract useful information computational methods at the interface of proteomics and cheminformatics are indispensable. This review will focus on some recent developments in this field. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
Affiliation(s)
- Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn-Str. 15, D-44227 Dortmund, Germany.
|
16
|
Flower DR, Perrie Y. Identification of Candidate Vaccine Antigens In Silico. Immunomic Discovery of Adjuvants and Candidate Subunit Vaccines 2013. [PMCID: PMC7120937 DOI: 10.1007/978-1-4614-5070-2_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
The identification of immunogenic whole-protein antigens is fundamental to the successful discovery of candidate subunit vaccines and their rapid, effective, and efficient transformation into clinically useful, commercially successful vaccine formulations. In the wider context of the experimental discovery of vaccine antigens, with particular reference to reverse vaccinology, this chapter adumbrates the principal computational approaches currently deployed in the hunt for novel antigens: genome-level prediction of antigens, antigen identification through the use of protein sequence alignment-based approaches, antigen detection through the use of subcellular location prediction, and the use of alignment-independent approaches to antigen discovery. Reference is also made to the recent emergence of various expert systems for protein antigen identification.
Affiliation(s)
- Darren R. Flower
- Aston Pharmacy School, School of Life and Health Sciences, University of Aston, Aston Triangle, Birmingham, B4 7ET United Kingdom
- Yvonne Perrie
- Aston Pharmacy School, School of Life and Health Sciences, Aston University, Aston Triangle, Birmingham, B4 7ET United Kingdom
|
17
|
Proteochemometric modeling of the bioactivity spectra of HIV-1 protease inhibitors by introducing protein-ligand interaction fingerprint. PLoS One 2012; 7:e41698. [PMID: 22848570 PMCID: PMC3407198 DOI: 10.1371/journal.pone.0041698] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 04/05/2012] [Accepted: 06/25/2012] [Indexed: 01/01/2023] Open
Abstract
HIV-1 protease is one of the main therapeutic targets in HIV. However, a major problem in the treatment of HIV is the rapid emergence of drug-resistant strains. It would be particularly helpful to the clinical therapy of AIDS if a single method could predict the antiviral activity of compounds against different variants. In our study, proteochemometric (PCM) models were created to study the bioactivity spectra of 92 chemical compounds with 47 unique HIV-1 protease variants. In contrast to other PCM models, which used Multiplication of Ligands and Proteins Descriptors (MLPD) as the cross-term, one new cross-term, i.e. Protein-Ligand Interaction Fingerprint (PLIF), was introduced in our modeling. With different combinations of ligand descriptors, protein descriptors and cross-terms, nine PCM models were obtained, and six of them achieved good predictive abilities (Q²(test) > 0.7). These results showed that the performance of PCM models could be improved when ligand and protein descriptors were complemented by the newly introduced cross-term PLIF. Compared with the conventional cross-term MLPD, the newly introduced PLIF had a better predictive ability. Furthermore, our best model (GD & P & PLIF: Q²(test) = 0.8271) could identify inhibitors with broad antiviral activity. In conclusion, our study indicates that proteochemometric modeling with PLIF as the cross-term is a potentially useful way to address the HIV-1 drug-resistance problem.
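As a rough illustration of the conventional cross-term named in this abstract, the sketch below builds an MLPD-style feature vector by multiplying every ligand descriptor with every protein descriptor and appending the products to the two descriptor blocks. The function names, the toy descriptor values and the use of a full pairwise product are assumptions made for illustration only; the abstract does not state how the authors computed MLPD, and PLIF is not reproduced here.

```python
from itertools import product


def mlpd_cross_terms(ligand_desc, protein_desc):
    """Hypothetical helper: all pairwise products ligand_i * protein_j."""
    return [l * p for l, p in product(ligand_desc, protein_desc)]


def pcm_features(ligand_desc, protein_desc):
    """Concatenate the ligand block, the protein block and the cross-term block."""
    cross = mlpd_cross_terms(ligand_desc, protein_desc)
    return list(ligand_desc) + list(protein_desc) + cross


# Toy descriptor values (not taken from the paper).
ligand = [0.5, 2.0]
protein = [1.0, -1.0, 3.0]

features = pcm_features(ligand, protein)
print(len(features))  # 2 ligand + 3 protein + 2*3 cross-terms
```

With m ligand descriptors and n protein descriptors this representation has m + n + m·n columns, which is one reason cross-term blocks are often reduced or replaced by more compact interaction encodings.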
|
18
|
Daniluk P, Lesyng B. A novel method to compare protein structures using local descriptors. BMC Bioinformatics 2011; 12:344. [PMID: 21849047 PMCID: PMC3179968 DOI: 10.1186/1471-2105-12-344] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 04/02/2011] [Accepted: 08/17/2011] [Indexed: 11/15/2022] Open
Abstract
Background Protein structure comparison is one of the most widely performed tasks in bioinformatics. However, currently used methods have problems with the so-called "difficult similarities", including considerable shifts and distortions of structure, sequential swaps and circular permutations. There is a demand for efficient and automated systems capable of overcoming these difficulties, which may lead to the discovery of previously unknown structural relationships. Results We present a novel method for protein structure comparison based on the formalism of local descriptors of protein structure - DEscriptor Defined Alignment (DEDAL). Local similarities identified by pairs of similar descriptors are extended into global structural alignments. We demonstrate the method's capability by aligning structures in difficult benchmark sets: curated alignments in the SISYPHUS database, as well as SISY and RIPC sets, including non-sequential and non-rigid-body alignments. On the most difficult RIPC set of sequence alignment pairs the method achieves an accuracy of 77% (the second best method tested achieves 60% accuracy). Conclusions DEDAL is fast enough to be used in whole proteome applications, and by lowering the threshold of detectable structure similarity it may shed additional light on molecular evolution processes. It is well suited to improving automatic classification of structure domains, helping analyze protein fold space, or to improving protein classification schemes. DEDAL is available online at http://bioexploratorium.pl/EP/DEDAL.
Affiliation(s)
- Paweł Daniluk
- Faculty of Physics, Department of Biophysics and CoE BioExploratorium, University of Warsaw, Żwirki i Wigury 93, Warsaw, Poland
|
19
|
Xie L, Xie L, Bourne PE. Structure-based systems biology for analyzing off-target binding. Curr Opin Struct Biol 2011; 21:189-99. [PMID: 21292475 PMCID: PMC3070778 DOI: 10.1016/j.sbi.2011.01.004] [Citation(s) in RCA: 110] [Impact Index Per Article: 8.5] [Received: 11/10/2010] [Revised: 01/11/2011] [Accepted: 01/13/2011] [Indexed: 12/24/2022]
Abstract
Here off-target binding implies the binding of a small molecule of therapeutic interest to a protein target other than the primary target for which it was intended. Increasingly such off-targeting appears to be the norm rather than the exception, rational drug design notwithstanding, and can lead to detrimental side-effects, or opportunities to reposition a therapeutic agent to treat a different condition. Not surprisingly, there is significant interest in determining a priori what off-targets exist on a proteome-wide scale. Beyond determining putative off-targets is the need to understand the impact of such binding on the complete biological system, with the ultimate goal of being able to predict the phenotypic outcome. While a very ambitious goal, some progress is being made.
Affiliation(s)
- Lei Xie
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego MC9743, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Department of Computer Science, Hunter College, the City University of New York, 695 Park Avenue, New York City, NY 10065, USA
- Li Xie
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego MC9743, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Philip E. Bourne
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego MC9743, 9500 Gilman Drive, La Jolla, CA 92093, USA
|
20
|
van Westen GJP, Wegner JK, IJzerman AP, van Vlijmen HWT, Bender A. Proteochemometric modeling as a tool to design selective compounds and for extrapolating to novel targets. MedChemComm 2011. [DOI: 10.1039/c0md00165a] [Citation(s) in RCA: 123] [Impact Index Per Article: 9.5] [Indexed: 12/19/2022]
Abstract
Proteochemometric modeling is founded on the principles of QSAR but is able to benefit from additional information in model training due to the inclusion of target information.
Affiliation(s)
- Gerard J. P. van Westen
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
- Adriaan P. IJzerman
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
- Herman W. T. van Vlijmen
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
- Tibotec BVBA
- A. Bender
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
- Unilever Centre for Molecular Science Informatics
|
21
|
Flower DR, Macdonald IK, Ramakrishnan K, Davies MN, Doytchinova IA. Computer aided selection of candidate vaccine antigens. Immunome Res 2010; 6 Suppl 2:S1. [PMID: 21067543 PMCID: PMC2981880 DOI: 10.1186/1745-7580-6-s2-s1] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.0] [Indexed: 02/05/2023] Open
Abstract
Immunoinformatics is an emergent branch of informatics science that long ago pullulated from the tree of knowledge that is bioinformatics. It is a discipline which applies informatic techniques to problems of the immune system. To a great extent, immunoinformatics is typified by epitope prediction methods. It has found disappointingly limited use in the design and discovery of new vaccines, which is an area where proper computational support is generally lacking. Most extant vaccines are not based around isolated epitopes but rather correspond to chemically-treated or attenuated whole pathogens, to individual proteins extracted from whole pathogens, or to complex carbohydrates. In this chapter we attempt to review what progress there has been in an as-yet-underexplored area of immunoinformatics: the computational discovery of whole protein antigens. The effective development of antigen prediction methods would significantly reduce the laboratory resource required to identify pathogenic proteins as candidate subunit vaccines. We begin our review by placing antigen prediction firmly into context, exploring the role of reverse vaccinology in the design and discovery of vaccines. We also highlight several competing yet ultimately complementary methodological approaches: sub-cellular location prediction, identifying antigens using sequence similarity, and the use of sophisticated statistical approaches for predicting the probability of antigen characteristics. We end by exploring how a systems immunomics approach to the prediction of immunogenicity would prove helpful in the prediction of antigens.
Affiliation(s)
- Darren R Flower
- School of Life and Health Sciences, University of Aston, Aston Triangle, Birmingham, B4 7ET, UK.
|
22
|
Strömbergsson H, Lapins M, Kleywegt GJ, Wikberg JES. Towards Proteome-Wide Interaction Models Using the Proteochemometrics Approach. Mol Inform 2010; 29:499-508. [PMID: 27463328 DOI: 10.1002/minf.201000052] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/05/2010] [Accepted: 05/25/2010] [Indexed: 02/02/2023]
Abstract
A proteochemometrics model was induced from all interaction data in the BindingDB database, comprising 7078 protein-ligand complexes in all, with representatives from all major drug target categories. Proteins were represented by alignment-independent sequence descriptors holding information on properties such as hydrophobicity, charge, and secondary structure. Ligands were represented by commonly used QSAR descriptors. The inhibition constant (pKi) values of protein-ligand complexes were discretized into "high" and "low" interaction activity. Different machine-learning techniques were used to induce models relating protein and ligand properties to the interaction activity. The best was decision trees, which gave an accuracy of 80% and an area under the ROC curve of 0.81. The tree pointed to the protein and ligand properties that are relevant for the interaction. As the approach requires neither alignments nor knowledge of protein 3D structures, virtually all available protein-ligand interaction data could be utilized, thus opening a way to completely general interaction models that may span entire proteomes.
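The evaluation described in this abstract (binarizing pKi values, then reporting accuracy and area under the ROC curve) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the threshold of 7.0, the toy pKi values and the classifier scores are invented for illustration, and the AUC is computed with the generic pairwise-ranking definition rather than any procedure reported by the authors.

```python
def discretize(pki_values, threshold):
    """Label a complex 1 ("high") if pKi >= threshold, else 0 ("low")."""
    return [1 if v >= threshold else 0 for v in pki_values]


def accuracy(labels, predictions):
    """Fraction of complexes whose predicted class matches the label."""
    hits = sum(1 for y, p in zip(labels, predictions) if y == p)
    return hits / len(labels)


def roc_auc(labels, scores):
    """Probability that a random "high" complex outscores a random "low" one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((1.0 if p > n else (0.5 if p == n else 0.0))
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data; the 7.0 cut-off is an assumption, not a value from the abstract.
pki = [4.2, 8.1, 6.5, 9.0, 5.0, 7.3]
labels = discretize(pki, threshold=7.0)
scores = [0.2, 0.9, 0.6, 0.8, 0.4, 0.5]      # hypothetical classifier outputs
predicted = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(labels, predicted)
auc = roc_auc(labels, scores)
print(acc, auc)
```

Accuracy depends on the score cut-off used to assign classes, whereas the AUC summarizes ranking quality over all cut-offs, which is presumably why the abstract reports both.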
Affiliation(s)
- Helena Strömbergsson
- The Linnaeus Centre for Bioinformatics, Department of Cell and Molecular Biology, Biomedical Centre, Box 598, SE-751 24, Uppsala, Sweden.
- Maris Lapins
- Department of Pharmaceutical Pharmacology, Biomedical Centre, Box 591, SE-751 24 Uppsala, Sweden
- Gerard J Kleywegt
- Department of Cell and Molecular Biology, Biomedical Centre, Box 596, SE-751 24, Uppsala, Sweden
- Jarl E S Wikberg
- Department of Pharmaceutical Pharmacology, Biomedical Centre, Box 591, SE-751 24 Uppsala, Sweden
|
23
|
Fernandez M, Ahmad S, Sarai A. Proteochemometric Recognition of Stable Kinase Inhibition Complexes Using Topological Autocorrelation and Support Vector Machines. J Chem Inf Model 2010; 50:1179-88. [DOI: 10.1021/ci1000532] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 12/17/2022]
Affiliation(s)
- Michael Fernandez
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502 Japan, and National Institute of Biomedical Innovation, 7-6-8, Saito-Asagi, Ibaraki-shi, Osaka 5670085, Japan
- Shandar Ahmad
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502 Japan, and National Institute of Biomedical Innovation, 7-6-8, Saito-Asagi, Ibaraki-shi, Osaka 5670085, Japan
- Akinori Sarai
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502 Japan, and National Institute of Biomedical Innovation, 7-6-8, Saito-Asagi, Ibaraki-shi, Osaka 5670085, Japan
|
24
|
Hvidsten TR, Kryshtafovych A, Fidelis K. Local descriptors of protein structure: a systematic analysis of the sequence-structure relationship in proteins using short- and long-range interactions. Proteins 2009; 75:870-84. [PMID: 19025980 DOI: 10.1002/prot.22296] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Local protein structure representations that incorporate long-range contacts between residues are often considered in protein structure comparison but have found relatively little use in structure prediction where assembly from single backbone fragments dominates. Here, we introduce the concept of local descriptors of protein structure to characterize local neighborhoods of amino acids including short- and long-range interactions. We build a library of recurring local descriptors and show that this library is general enough to allow assembly of unseen protein structures. The library could on average re-assemble 83% of 119 unseen structures, and showed little or no performance decrease between homologous targets and targets with folds not represented among domains used to build it. We then systematically evaluate the descriptor library to establish the level of the sequence signal in sets of protein fragments of similar geometrical conformation. In particular, we test whether that signal is strong enough to facilitate correct assignment and alignment of these local geometries to new sequences. We use the signal to assign descriptors to a test set of 479 sequences with less than 40% sequence identity to any domain used to build the library, and show that on average more than 50% of the backbone fragments constituting descriptors can be correctly aligned. We also use the assigned descriptors to infer SCOP folds, and show that correct predictions can be made in many of the 151 cases where PSI-BLAST was unable to detect significant sequence similarity to proteins in the library. Although the combinatorial problem of simultaneously aligning several fragments to sequence is a major bottleneck compared with single fragment methods, the advantage of the current approach is that correct alignments imply correct long range distance constraints. 
The lack of these constraints is most likely the major reason why structure prediction methods fail to consistently produce adequate models when good templates are unavailable or undetectable. Thus, we believe that the current study offers new and valuable insight into the prediction of sequence-structure relationships in proteins.
|
25
|
Abstract
BACKGROUND Chemogenomics is an emerging inter-disciplinary approach to drug discovery that combines traditional ligand-based approaches with biological information on drug targets and lies at the interface of chemistry, biology and informatics. The ultimate goal in chemogenomics is to understand molecular recognition between all possible ligands and all possible drug targets. Protein and ligand space have previously been studied as separate entities, but chemogenomics studies deal with large datasets that cover parts of the joint protein-ligand space. Since drug discovery has traditionally focused on ligand optimization, the chemical space has been studied extensively. The protein space has been studied to some extent, typically for the purpose of classification of proteins into functional and structural classes. Since chemogenomics deals not only with ligands but also with the macromolecules the ligands interact with, it is of interest to find means to explore, compare and visualize protein-ligand subspaces. RESULTS Two chemogenomics protein-ligand interaction datasets were prepared for this study. The first dataset covers the known structural protein-ligand space, and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). The second dataset contains all approved drugs and drug targets stored in the DrugBank database, and represents the approved drug-drug target space. To capture biological and physicochemical features of the chemogenomics datasets, sequence-based descriptors were computed for the proteins, and 0, 1 and 2 dimensional descriptors for the ligands. Principal component analysis (PCA) was used to analyze the multidimensional data and to create global models of protein-ligand space. The nearest neighbour method, computed using the principal components, was used to obtain a measure of overlap between the datasets. 
CONCLUSION In this study, we present an approach to visualize protein-ligand spaces from a chemogenomics perspective, where both ligand and protein features are taken into account. The method can be applied to any protein-ligand interaction dataset. Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.
Affiliation(s)
- Helena Strömbergsson
- Department of Cell and Molecular Biology/The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden
- Gerard J Kleywegt
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
|
26
|
Nigsch F, Macaluso NJM, Mitchell JBO, Zmuidinavicius D. Computational toxicology: an overview of the sources of data and of modelling methods. Expert Opin Drug Metab Toxicol 2009; 5:1-14. [PMID: 19236225 DOI: 10.1517/17425250802660467] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Indexed: 01/09/2023]
Abstract
BACKGROUND Toxicology has the goal of ensuring the safety of humans, animals and the environment. Computational toxicology is an area of active development and great potential. There are tangible reasons for the emerging interest in this discipline from academia, industry, regulatory bodies and governments. RESULTS Pharmaceuticals, personal health care products, nutritional ingredients and products of the chemical industries are all potential hazards and need to be assessed. Toxicological tests for these products are costly, frequently use laboratory animals and are time-consuming. This delays end-user access to improved products or, conversely, the timely withdrawal of dangerous substances from the market. The aim of computational toxicology is to accelerate the assessment of potentially dangerous substances through in silico models. CONCLUSIONS In this review, we provide an overview of the development of models for computational toxicology. Addressing the significant divide between the experimental and computational worlds (believed to be a prime hindrance to computational toxicology), we briefly consider the fundamental issue of toxicological data and the assays they stem from. Different kinds of models that can be built using such data are presented: computational filters, models for specific toxicological endpoints and tools for the generation of testable hypotheses.
Affiliation(s)
- Florian Nigsch
- Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK.
|
27
|
Drug discovery using chemical systems biology: identification of the protein-ligand binding network to explain the side effects of CETP inhibitors. PLoS Comput Biol 2009; 5:e1000387. [PMID: 19436720 PMCID: PMC2676506 DOI: 10.1371/journal.pcbi.1000387] [Citation(s) in RCA: 185] [Impact Index Per Article: 12.3] [Received: 01/22/2009] [Accepted: 04/13/2009] [Indexed: 01/11/2023] Open
Abstract
Systematic identification of protein-drug interaction networks is crucial to correlate complex modes of drug action to clinical indications. We introduce a novel computational strategy to identify protein-ligand binding profiles on a genome-wide scale and apply it to elucidating the molecular mechanisms associated with the adverse drug effects of Cholesteryl Ester Transfer Protein (CETP) inhibitors. CETP inhibitors are a new class of preventive therapies for the treatment of cardiovascular disease. However, clinical studies indicated that one CETP inhibitor, Torcetrapib, has deadly off-target effects as a result of hypertension, and hence it has been withdrawn from phase III clinical trials. We have identified a panel of off-targets for Torcetrapib and other CETP inhibitors from the human structural genome and map those targets to biological pathways via the literature. The predicted protein-ligand network is consistent with experimental results from multiple sources and reveals that the side-effect of CETP inhibitors is modulated through the combinatorial control of multiple interconnected pathways. Given that combinatorial control is a common phenomenon observed in many biological processes, our findings suggest that adverse drug effects might be minimized by fine-tuning multiple off-target interactions using single or multiple therapies. This work extends the scope of chemogenomics approaches and exemplifies the role that systems biology has in the future of drug discovery. Both the cost to launch a new drug and the attrition rate during the late stage of the drug discovery and development process are increasing. Torcetrapib is a case in point, having been withdrawn from phase III clinical trials after 15 years of development and an estimated cost of US $800 M. Torcetrapib represents a new class of therapies for the treatment of cardiovascular disease; however, clinical studies indicated that Torcetrapib has deadly side-effects as a result of hypertension. 
To understand the origins of these adverse drug reactions from Torcetrapib and other related drugs undergoing clinical trials, we introduce a systematic strategy to identify off-targets in the human structural proteome and investigate the roles of these off-targets in impacting human physiology and pathology using biochemical pathway analysis. Our findings suggest that potential side-effects of a new drug can be identified at an early stage of the development cycle and be minimized by fine-tuning multiple off-target interactions. The hope is that this can reduce both the cost of drug development and the mortality rates during clinical trials.
|
28
|
Bender A, Mikhailov D, Glick M, Scheiber J, Davies JW, Cleaver S, Marshall S, Tallarico JA, Harrington E, Cornella-Taracido I, Jenkins JL. Use of Ligand Based Models for Protein Domains To Predict Novel Molecular Targets and Applications To Triage Affinity Chromatography Data. J Proteome Res 2009; 8:2575-85. [DOI: 10.1021/pr900107z] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 12/29/2022]
Affiliation(s)
- Andreas Bender
- Center for Proteomic Chemistry, Lead Discovery Informatics, Developmental and Molecular Pathways, and Global Discovery Chemistry, Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139
- Dmitri Mikhailov
- Center for Proteomic Chemistry, Lead Discovery Informatics, Developmental and Molecular Pathways, and Global Discovery Chemistry, Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139
- Meir Glick
- Center for Proteomic Chemistry, Lead Discovery Informatics, Developmental and Molecular Pathways, and Global Discovery Chemistry, Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139
- Josef Scheiber
- Center for Proteomic Chemistry, Lead Discovery Informatics, Developmental and Molecular Pathways, and Global Discovery Chemistry, Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139
- John W. Davies
- Center for Proteomic Chemistry, Lead Discovery Informatics, Developmental and Molecular Pathways, and Global Discovery Chemistry, Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139
- Stephen Cleaver
- Center for Proteomic Chemistry, Lead Discovery Informatics, Developmental and Molecular Pathways, and Global Discovery Chemistry, Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139
- Stephen Marshall
- Center for Proteomic Chemistry, Lead Discovery Informatics, Developmental and Molecular Pathways, and Global Discovery Chemistry, Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139
- John A. Tallarico
- Center for Proteomic Chemistry, Lead Discovery Informatics, Developmental and Molecular Pathways, and Global Discovery Chemistry, Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139
- Edmund Harrington
- Center for Proteomic Chemistry, Lead Discovery Informatics, Developmental and Molecular Pathways, and Global Discovery Chemistry, Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139
- Ivan Cornella-Taracido
- Center for Proteomic Chemistry, Lead Discovery Informatics, Developmental and Molecular Pathways, and Global Discovery Chemistry, Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139
- Jeremy L. Jenkins
- Center for Proteomic Chemistry, Lead Discovery Informatics, Developmental and Molecular Pathways, and Global Discovery Chemistry, Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139
|
29
|
Strömbergsson H, Daniluk P, Kryshtafovych A, Fidelis K, Wikberg JES, Kleywegt GJ, Hvidsten TR. Interaction model based on local protein substructures generalizes to the entire structural enzyme-ligand space. J Chem Inf Model 2008; 48:2278-88. [PMID: 18937438 DOI: 10.1021/ci800200e] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
Chemogenomics is a new strategy in in silico drug discovery, where the ultimate goal is to understand molecular recognition for all molecules interacting with all proteins in the proteome. To study such cross interactions, methods that can generalize over proteins that vary greatly in sequence, structure, and function are needed. We present a general quantitative approach to protein-ligand binding affinity prediction that spans the entire structural enzyme-ligand space. The model was trained on a data set composed of all available enzymes cocrystallized with druglike ligands, taken from four publicly available interaction databases, for which a crystal structure is available. Each enzyme was characterized by a set of local descriptors of protein structure that describe the binding site of the cocrystallized ligand. The ligands in the training set were described by traditional QSAR descriptors. To evaluate the model, a comprehensive test set consisting of enzyme structures and ligands was manually curated. The test set contained enzyme-ligand complexes for which no crystal structures were available, and thus the binding modes were unknown. The test set enzymes were therefore characterized by matching their entire structures to the local descriptor library constructed from the training set. Both the training and the test set contained enzyme-ligand complexes from all major enzyme classes, and the enzymes spanned a large range of sequences and folds. The experimental binding affinities (pKi) ranged from 0.5 to 11.9 (0.7-11.0 in the test set). The induced model predicted the binding affinities of the external test set enzyme-ligand complexes with an r² of 0.53 and an RMSEP of 1.5. This demonstrates that the use of local descriptors makes it possible to create rough predictive models that can generalize over a wide range of protein targets.
Affiliation(s)
- Helena Strömbergsson
- The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden; Department of Biophysics, Faculty of Physics, University of Warsaw, Warsaw, Poland
|
30
|
Gao HW, Xu Q, Chen L, Wang SL, Wang Y, Wu LL, Yuan Y. Potential protein toxicity of synthetic pigments: binding of poncean S to human serum albumin. Biophys J 2007; 94:906-17. [PMID: 17905844 PMCID: PMC2186231 DOI: 10.1529/biophysj.107.120865] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 12/28/2022] Open
Abstract
Using various methods, e.g., spectrophotometry, circular dichroism, and isothermal titration calorimetry, the interaction of poncean S (PS) with human serum albumin (HSA) was characterized at pH 1.81, 3.56, and 7.40 using the spectral correction technique, and Langmuir and Temkin isothermal models. The consistency among results concerning, e.g., binding number, binding energy, and type of binding, showed that ion pair electrostatic attraction fixed the position of PS in HSA and subsequently induced a combination of multiple noncovalent bonds such as H-bonds, hydrophobic interactions, and van der Waals forces. Ion pair attraction and H-bonds produced a stable PS-HSA complex and led to a marked change in the secondary structure of HSA in acidic media. The PS-HSA binding pattern and the process of change in HSA conformation were also investigated. The potentially toxic effect of PS on the transport function of HSA in a normal physiological environment was analyzed. This work provides a useful experimental strategy for studying the interaction of organic substances with biomacromolecules, helping us to understand the activity or mechanism of toxicity of an organic compound.
Affiliation(s)
- Hong-Wen Gao
- State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Shanghai 200092, PR China.
|