1
Mansoldo FRP, Carta F, Angeli A, Cardoso VDS, Supuran CT, Vermelho AB. Chagas Disease: Perspectives on the Past and Present and Challenges in Drug Discovery. Molecules 2020; 25:E5483. [PMID: 33238613 PMCID: PMC7700143 DOI: 10.3390/molecules25225483] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/06/2020] [Revised: 11/19/2020] [Accepted: 11/20/2020] [Indexed: 12/20/2022]
Abstract
Chagas disease still has no effective treatment option for all of its phases despite being discovered more than 100 years ago. The development of commercial drugs has been stagnating since the 1960s, a fact that sheds light on the question of how drug discovery research has progressed and taken advantage of technological advances. Could it be that technological advances have not yet been sufficient to resolve this issue or is there a lack of protocol, validation and standardization of the data generated by different research teams? This work presents an overview of commercial drugs and those that have been evaluated in studies and clinical trials so far. A brief review is made of recent target-based and phenotypic studies based on the search for molecules with anti-Trypanosoma cruzi action. It also discusses how proteochemometric (PCM) modeling and microcrystal electron diffraction (MicroED) can help in the case of the lack of a 3D protein structure; more specifically, Trypanosoma cruzi carbonic anhydrase.
Affiliation(s)
- Felipe Raposo Passos Mansoldo
  - BIOINOVAR-Biocatalysis, Bioproducts and Bioenergy, Institute of Microbiology Paulo de Góes, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro 21941-902, Brazil; (F.R.P.M.); (V.d.S.C.)
- Fabrizio Carta
  - Neurofarba Department, Università degli Studi di Firenze, Sezione di Scienze Farmaceutiche, Via Ugo Schiff 6, 50019 Sesto Fiorentino (Florence), Italy; (F.C.); (A.A.)
- Andrea Angeli
  - Neurofarba Department, Università degli Studi di Firenze, Sezione di Scienze Farmaceutiche, Via Ugo Schiff 6, 50019 Sesto Fiorentino (Florence), Italy; (F.C.); (A.A.)
  - Centre of Advanced Research in Bionanoconjugates and Biopolymers Department, “Petru Poni” Institute of Macromolecular Chemistry, 700487 Iasi, Romania
- Veronica da Silva Cardoso
  - BIOINOVAR-Biocatalysis, Bioproducts and Bioenergy, Institute of Microbiology Paulo de Góes, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro 21941-902, Brazil; (F.R.P.M.); (V.d.S.C.)
- Claudiu T. Supuran
  - Neurofarba Department, Università degli Studi di Firenze, Sezione di Scienze Farmaceutiche, Via Ugo Schiff 6, 50019 Sesto Fiorentino (Florence), Italy; (F.C.); (A.A.)
- Alane Beatriz Vermelho
  - BIOINOVAR-Biocatalysis, Bioproducts and Bioenergy, Institute of Microbiology Paulo de Góes, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro 21941-902, Brazil; (F.R.P.M.); (V.d.S.C.)
2
Moumbock AF, Li J, Mishra P, Gao M, Günther S. Current computational methods for predicting protein interactions of natural products. Comput Struct Biotechnol J 2019; 17:1367-1376. [PMID: 31762960 PMCID: PMC6861622 DOI: 10.1016/j.csbj.2019.08.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 03/19/2019] [Revised: 08/09/2019] [Accepted: 08/23/2019] [Indexed: 01/08/2023]
Abstract
Natural products (NPs) are an indispensable source of drugs and they have a better coverage of the pharmacological space than synthetic compounds, owing to their high structural diversity. The prediction of their interaction profiles with druggable protein targets remains a major challenge in modern drug discovery. Experimental (off-)target predictions of NPs are cost- and time-consuming, whereas computational methods, on the other hand, are much faster and cheaper. As a result, computational predictions are preferentially used in the first instance for NP profiling, prior to experimental validations. This review covers recent advances in computational approaches which have been developed to aid the annotation of unknown drug-target interactions (DTIs), by focusing on three broad classes, namely: ligand-based, target-based, and target-ligand-based (hybrid) approaches. Computational DTI prediction methods have the potential to significantly advance the discovery and development of novel selective drugs exhibiting minimal side effects. We highlight some inherent caveats of these methods which must be overcome to enable them to realize their full potential, and a future outlook is given.
Affiliation(s)
- Stefan Günther
  - Institute of Pharmaceutical Sciences, Research Group Pharmaceutical Bioinformatics, Albert-Ludwigs-Universität Freiburg, Germany
3
Structural and conformational determinants of macrocycle cell permeability. Nat Chem Biol 2016; 12:1065-1074. [DOI: 10.1038/nchembio.2203] [Citation(s) in RCA: 119] [Impact Index Per Article: 14.9] [Received: 10/24/2015] [Accepted: 08/04/2016] [Indexed: 12/31/2022]
4
Qiu T, Qiu J, Feng J, Wu D, Yang Y, Tang K, Cao Z, Zhu R. The recent progress in proteochemometric modelling: focusing on target descriptors, cross-term descriptors and application scope. Brief Bioinform 2016; 18:125-136. [PMID: 26873661 DOI: 10.1093/bib/bbw004] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 10/26/2015] [Revised: 12/09/2015] [Indexed: 12/17/2022]
Abstract
As an extension of the conventional quantitative structure activity relationship models, proteochemometric (PCM) modelling is a computational method that can predict the bioactivity relations between multiple ligands and multiple targets. Traditional PCM modelling includes three essential elements: descriptors (including target descriptors, ligand descriptors and cross-term descriptors), bioactivity data and appropriate learning functions that link the descriptors to the bioactivity data. Since its appearance, PCM modelling has developed rapidly over the past decade by taking advantage of the progress of different descriptors and machine learning techniques, along with the increasing amounts of available bioactivity data. Specifically, the new emerging target descriptors and cross-term descriptors not only significantly increased the performance of PCM modelling but also expanded its application scope from traditional protein-ligand interaction to more abundant interactions, including protein-peptide, protein-DNA and even protein-protein interactions. In this review, target descriptors and cross-term descriptors, as well as the corresponding application scope, are intensively summarized. Additionally, we look forward to seeing PCM modelling extend into new application scopes, such as Target-Catalyst-Ligand systems, with the further development of descriptors, machine learning techniques and increasing amounts of available bioactivity data.
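As a rough illustration of the three descriptor blocks named in the abstract above, the following minimal Python sketch builds one feature vector for a single ligand-target pair. The function name `pcm_features`, the descriptor values, and the choice of pairwise products as cross-terms are all assumptions made for illustration; the review itself surveys many other descriptor types.

```python
from itertools import product


def pcm_features(ligand_desc, target_desc):
    """Concatenate ligand, target, and cross-term descriptors for one pair.

    Cross-terms are taken here as pairwise products of ligand and target
    descriptors. This is one common choice and an assumption of this sketch.
    """
    cross_terms = [l * t for l, t in product(ligand_desc, target_desc)]
    return list(ligand_desc) + list(target_desc) + cross_terms


# Hypothetical, made-up descriptor values for illustration only.
ligand = [0.5, 2.0]          # two scaled ligand descriptors
target = [1.0, -1.0, 3.0]    # three sequence-derived target descriptors
features = pcm_features(ligand, target)
```

With 2 ligand and 3 target descriptors the vector has 2 + 3 + 2 × 3 = 11 entries. A learning function would then be trained on such vectors against measured bioactivity values.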
5
Baumann D, Baumann K. Reliable estimation of prediction errors for QSAR models under model uncertainty using double cross-validation. J Cheminform 2014; 6:47. [PMID: 25506400 PMCID: PMC4260165 DOI: 10.1186/s13321-014-0047-1] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 07/08/2014] [Accepted: 10/30/2014] [Indexed: 01/17/2023]
Abstract
Background: Generally, QSAR modelling requires both model selection and validation since there is no a priori knowledge about the optimal QSAR model. Prediction errors (PE) are frequently used to select and to assess the models under study. Reliable estimation of prediction errors is challenging, especially under model uncertainty, and requires independent test objects. These test objects must not be involved in model building or in model selection. Double cross-validation, sometimes also termed nested cross-validation, offers an attractive possibility to generate test data and to select QSAR models since it uses the data very efficiently. Nevertheless, there is a controversy in the literature with respect to the reliability of double cross-validation under model uncertainty. Moreover, systematic studies investigating the adequate parameterization of double cross-validation are still missing. Here, the cross-validation design in the inner loop and the influence of the test set size in the outer loop are systematically studied for regression models in combination with variable selection. Methods: Simulated and real data are analysed with double cross-validation to identify important factors for the resulting model quality. For the simulated data, a bias-variance decomposition is provided. Results: The prediction errors of QSAR/QSPR regression models in combination with variable selection depend to a large degree on the parameterization of double cross-validation. While the parameters for the inner loop of double cross-validation mainly influence bias and variance of the resulting models, the parameters for the outer loop mainly influence the variability of the resulting prediction error estimate. Conclusions: Double cross-validation reliably and unbiasedly estimates prediction errors under model uncertainty for regression models. As compared to a single test set, double cross-validation provided a more realistic picture of model quality and should be preferred over a single test set.
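The inner and outer loops described in this abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' protocol: the one-parameter ridge model, the helper names (`fit_ridge_slope`, `select_penalty`, `double_cross_validation`), the fold counts, and the synthetic data are all assumptions chosen to keep the example short.

```python
import random


def fit_ridge_slope(xs, ys, penalty):
    """Fit y ~ slope * x with an L2 penalty and return the slope."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


def mse(slope, xs, ys):
    """Mean squared prediction error of the fitted slope."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_folds(indices, k):
    """Split a list of indices into k interleaved folds."""
    return [indices[i::k] for i in range(k)]


def select_penalty(xs, ys, penalties, k):
    """Inner loop: pick the penalty with the lowest cross-validated MSE."""
    idx = list(range(len(xs)))
    best_penalty, best_err = None, float("inf")
    for penalty in penalties:
        errs = []
        for fold in k_folds(idx, k):
            held = set(fold)
            train = [i for i in idx if i not in held]
            slope = fit_ridge_slope([xs[i] for i in train],
                                    [ys[i] for i in train], penalty)
            errs.append(mse(slope, [xs[i] for i in fold],
                            [ys[i] for i in fold]))
        err = sum(errs) / len(errs)
        if err < best_err:
            best_penalty, best_err = penalty, err
    return best_penalty


def double_cross_validation(xs, ys, penalties, outer_k=5, inner_k=4):
    """Outer loop: estimate prediction error on folds never seen during
    model building or model selection."""
    idx = list(range(len(xs)))
    outer_errors = []
    for test_fold in k_folds(idx, outer_k):
        held = set(test_fold)
        train = [i for i in idx if i not in held]
        tr_x = [xs[i] for i in train]
        tr_y = [ys[i] for i in train]
        penalty = select_penalty(tr_x, tr_y, penalties, inner_k)
        slope = fit_ridge_slope(tr_x, tr_y, penalty)
        outer_errors.append(mse(slope, [xs[i] for i in test_fold],
                                [ys[i] for i in test_fold]))
    return sum(outer_errors) / len(outer_errors)


# Synthetic data (an assumption): y = 2x plus small Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]
estimate = double_cross_validation(xs, ys, penalties=[0.0, 0.1, 1.0, 10.0])
```

The key design point is that `select_penalty` only ever receives the outer training portion, so the outer test objects stay independent of both fitting and selection.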
Affiliation(s)
- Désirée Baumann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, D-38106 Braunschweig, Germany
- Knut Baumann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, D-38106 Braunschweig, Germany
6
Nabu S, Nantasenamat C, Owasirikul W, Lawung R, Isarankura-Na-Ayudhya C, Lapins M, Wikberg JES, Prachayasittikul V. Proteochemometric model for predicting the inhibition of penicillin-binding proteins. J Comput Aided Mol Des 2014; 29:127-41. [DOI: 10.1007/s10822-014-9809-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/14/2014] [Accepted: 10/21/2014] [Indexed: 12/17/2022]
7
Mateus A, Matsson P, Artursson P. Rapid Measurement of Intracellular Unbound Drug Concentrations. Mol Pharm 2013; 10:2467-78. [DOI: 10.1021/mp4000822] [Citation(s) in RCA: 119] [Impact Index Per Article: 10.8] [Indexed: 01/04/2023]
Affiliation(s)
- André Mateus
  - Department of Pharmacy, Uppsala University, Box 580, SE-751 23 Uppsala, Sweden
  - Research Institute for Medicines and Pharmaceutical Sciences (iMed.UL), Faculty of Pharmacy, University of Lisbon, 1649-003 Lisbon, Portugal
- Pär Matsson
  - Department of Pharmacy, Uppsala University, Box 580, SE-751 23 Uppsala, Sweden
  - Uppsala University Drug Optimization and Pharmaceutical Profiling Platform (UDOPP), a node of the Chemical Biology Consortium Sweden (CBCS), Department of Pharmacy, Uppsala University, 751 23 Uppsala, Sweden
- Per Artursson
  - Department of Pharmacy, Uppsala University, Box 580, SE-751 23 Uppsala, Sweden
  - Uppsala University Drug Optimization and Pharmaceutical Profiling Platform (UDOPP), a node of the Chemical Biology Consortium Sweden (CBCS), Department of Pharmacy, Uppsala University, 751 23 Uppsala, Sweden
8
Gao J, Huang Q, Wu D, Zhang Q, Zhang Y, Chen T, Liu Q, Zhu R, Cao Z, He Y. Study on human GPCR–inhibitor interactions by proteochemometric modeling. Gene 2013; 518:124-31. [DOI: 10.1016/j.gene.2012.11.061] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 10/02/2012] [Accepted: 11/27/2012] [Indexed: 11/15/2022]
9
Eklund M, Norinder U, Boyer S, Carlsson L. Benchmarking Variable Selection in QSAR. Mol Inform 2012; 31:173-9. [DOI: 10.1002/minf.201100142] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 10/21/2011] [Accepted: 12/28/2011] [Indexed: 11/07/2022]
10
Mining of miRNAs and potential targets from gene oriented clusters of transcripts sequences of the anti-malarial plant, Artemisia annua. Biotechnol Lett 2011; 34:737-45. [DOI: 10.1007/s10529-011-0808-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 07/01/2011] [Accepted: 11/21/2011] [Indexed: 12/11/2022]
11
Kido Y, Matsson P, Giacomini KM. Profiling of a prescription drug library for potential renal drug-drug interactions mediated by the organic cation transporter 2. J Med Chem 2011; 54:4548-58. [PMID: 21599003 DOI: 10.1021/jm2001629] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.9] [Indexed: 12/30/2022]
Abstract
Drug-drug interactions (DDIs) are major causes of serious adverse drug reactions. Most DDIs have a pharmacokinetic basis in which one drug reduces the elimination of a second drug, leading to potentially toxic drug levels. As a major organ of drug elimination, the kidney represents an important site for DDIs. Here, we screened a prescription drug library against the renal organic cation transporter OCT2/SLC22A2, which mediates the first step in the renal secretion of many cationic drugs. Of the 910 compounds screened, 244 inhibited OCT2. Computational analyses revealed key properties of inhibitors versus noninhibitors, which included overall molecular charge. Four of six potential clinical inhibitors were transporter-selective in follow-up screens against additional transporters: OCT1/SLC22A1, MATE1/SLC47A1, and MATE2-K/SLC47A2. Two compounds showed different kinetics of interaction with the common polymorphism OCT2-A270S, suggesting a role of genetics in modulating renal DDIs.
Affiliation(s)
- Yasuto Kido
  - Department of Bioengineering and Therapeutic Sciences, University of California-San Francisco, San Francisco, California 94143, United States
12
Spjuth O, Eklund M, Lapins M, Junaid M, Wikberg JES. Services for prediction of drug susceptibility for HIV proteases and reverse transcriptases at the HIV drug research centre. Bioinformatics 2011; 27:1719-20. [PMID: 21493651 DOI: 10.1093/bioinformatics/btr192] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
Abstract
SUMMARY: The HIV Drug Research Centre (HIVDRC) has established Web services for prediction of drug susceptibility for HIV proteases and reverse transcriptases. The services are based on two proteochemometric models, which accept a protease or reverse transcriptase sequence in amino acid form and output the predicted drug susceptibility values. The predictions are based on a comprehensive analysis where all the relevant inhibitors are included, resulting in models with excellent predictive capabilities. AVAILABILITY AND IMPLEMENTATION: The services are implemented as interoperable Web services (REST and XMPP), with supporting web pages to allow for individual analyses. A set of plugins was also developed that makes the services available from the Bioclipse workbench for life science. Services are available at http://www.hivdrc.org/services.
Affiliation(s)
- Ola Spjuth
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.
13
Wang L, Chu F. Extracting very simple diagnostic rules from microarray data. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2010; 2010:807-10. [PMID: 21096115 DOI: 10.1109/iembs.2010.5626565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
We present an approach to deriving very simple classification rules from microarray data by first selecting very small gene subsets that can ensure highly accurate classification of cancers. Finding such minimum gene subsets can greatly reduce the computational load and "noise" arising from irrelevant genes. The derived simple classification rules allow for accurate diagnosis without the need for any classifiers. This work can simplify gene expression tests by including only a very small number of genes rather than thousands or tens of thousands of genes, which can significantly bring down the cost of cancer testing. These studies also call for further investigations into possible biological relationships between this small number of genes and cancer development and treatment. For example, we report the following simple, and yet 100% accurate, diagnostic rules involving only 2 genes to separate the 3 types of lymphoma patients: the patient has diffuse large B-cell lymphoma (DLBCL), if and only if the expression level of gene GENE1622X is greater than -0.75; the patient has chronic lymphocytic leukaemia (CLL), if and only if the expression level of gene GENE540X is less than -1; and the patient has follicular lymphoma (FL) otherwise, i.e., if and only if the expression level of gene GENE1622X is less than -0.75 and the expression level of gene GENE540X is greater than -1.
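The two-gene rule quoted in this abstract translates directly into a small decision function. The sketch below is illustrative only: the function name `classify_lymphoma` is hypothetical, and two details are assumptions the abstract does not settle. The DLBCL test is applied first if both conditions were to hold, and values exactly at a threshold fall into the "otherwise" (FL) branch.

```python
def classify_lymphoma(gene1622x, gene540x):
    """Apply the two-gene thresholds quoted in the abstract.

    Assumptions of this sketch: DLBCL is checked before CLL, and
    expression levels exactly equal to a threshold are treated as FL.
    """
    if gene1622x > -0.75:
        return "DLBCL"   # diffuse large B-cell lymphoma
    if gene540x < -1.0:
        return "CLL"     # chronic lymphocytic leukaemia
    return "FL"          # follicular lymphoma


# Made-up expression levels, not patient data.
labels = [
    classify_lymphoma(0.0, 0.0),
    classify_lymphoma(-1.0, -2.0),
    classify_lymphoma(-1.0, 0.0),
]
```

The inputs are assumed to be on the same normalized expression scale as the thresholds reported by the authors.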
Affiliation(s)
- Lipo Wang
  - School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798.
14
Lapins M, Wikberg JE. Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques. BMC Bioinformatics 2010; 11:339. [PMID: 20569422 PMCID: PMC2910025 DOI: 10.1186/1471-2105-11-339] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 11/17/2009] [Accepted: 06/22/2010] [Indexed: 01/03/2023]
Abstract
BACKGROUND: Protein kinases play crucial roles in cell growth, differentiation, and apoptosis. Abnormal function of protein kinases can lead to many serious diseases, such as cancer. Kinase inhibitors have potential for treatment of these diseases. However, current inhibitors interact with a broad variety of kinases and interfere with multiple vital cellular processes, which causes toxic effects. Bioinformatics approaches that can predict inhibitor-kinase interactions from the chemical properties of the inhibitors and the kinase macromolecules might aid in the design of more selective therapeutic agents that show better efficacy and lower toxicity. RESULTS: We applied proteochemometric modelling to correlate the properties of 317 wild-type and mutated kinases and 38 inhibitors (12,046 inhibitor-kinase combinations) to the respective combination's interaction dissociation constant (Kd). We compared six approaches for description of protein kinases and several linear and non-linear correlation methods. The best performing models encoded kinase sequences with amino acid physico-chemical z-scale descriptors and used support vector machines or partial least-squares projections to latent structures for the correlations. Modelling performance was estimated by double cross-validation. The best models showed high predictive ability: the squared correlation coefficient for new kinase-inhibitor pairs ranged from P2 = 0.67 to 0.73, and for new kinases it ranged from P2kin = 0.65 to 0.70. Models could also separate interacting from non-interacting inhibitor-kinase pairs with high sensitivity and specificity, with areas under the ROC curves of AUC = 0.92-0.93. We also investigated the relationship between the number of protein kinases in the dataset and the modelling results. Using only 10% of all data, a valid model was still obtained, with P2 = 0.47, P2kin = 0.42 and AUC = 0.83. CONCLUSIONS: Our results strongly support the applicability of proteochemometrics for kinome-wide interaction modelling. Proteochemometrics might be used to speed up identification and optimization of protein kinase targeted and multi-targeted inhibitors.
Affiliation(s)
- Maris Lapins
  - Department of Pharmaceutical Pharmacology, Uppsala University, Sweden
15
Van Westen GJP, Wegner JK, IJzerman AP, Van Vlijmen HWT, Bender A. Molecular bioactivity extrapolation to novel targets by support vector machines. J Cheminform 2010. [PMCID: PMC2867134 DOI: 10.1186/1758-2946-2-s1-o3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
16
Hammerling U, Tallsjö A, Grafström R, Ilbäck NG. Comparative Hazard Characterization in Food Toxicology. Crit Rev Food Sci Nutr 2009; 49:626-69. [DOI: 10.1080/10408390802145617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/22/2023]
17
Jacob L, Hoffmann B, Stoven V, Vert JP. Virtual screening of GPCRs: an in silico chemogenomics approach. BMC Bioinformatics 2008; 9:363. [PMID: 18775075 PMCID: PMC2553090 DOI: 10.1186/1471-2105-9-363] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Received: 04/03/2008] [Accepted: 09/06/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The G-protein coupled receptor (GPCR) superfamily is currently the largest class of therapeutic targets. In silico prediction of interactions between GPCRs and small molecules in the transmembrane ligand-binding site is therefore a crucial step in the drug discovery process, which remains a daunting task due to the difficulty of characterizing the 3D structure of most GPCRs, and to the limited number of known ligands for some members of the superfamily. Chemogenomics, which attempts to characterize interactions between all members of a target class and all small molecules simultaneously, has recently been proposed as an interesting alternative to traditional docking or ligand-based virtual screening strategies. RESULTS: We show that interaction prediction in the chemogenomics framework outperforms state-of-the-art individual ligand-based methods in accuracy, both for receptors with known ligands and for receptors without known ligands. This is done with no knowledge of the receptor 3D structure. In particular, we are able to predict ligands of orphan GPCRs with an estimated accuracy of 78.1%. CONCLUSION: We propose new methods for in silico chemogenomics and validate them on the virtual screening of GPCRs. The methods represent an extension of a recently proposed machine learning strategy, based on support vector machines (SVM), which provides a flexible framework to incorporate various information sources on the biological space of targets and on the chemical space of small molecules. We investigate the use of 2D and 3D descriptors for small molecules, and test a variety of descriptors for GPCRs. We show that incorporating information about the known hierarchical classification of the target family and about key residues in their inferred binding pockets significantly improves the prediction accuracy of our model.
Affiliation(s)
- Laurent Jacob
  - Mines ParisTech, Centre for Computational Biology, 35 rue Saint-Honoré, F-77305, Fontainebleau, France.
18
Eklund M, Spjuth O, Wikberg JE. The C1C2: a framework for simultaneous model selection and assessment. BMC Bioinformatics 2008; 9:360. [PMID: 18761753 PMCID: PMC2556350 DOI: 10.1186/1471-2105-9-360] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/07/2008] [Accepted: 09/02/2008] [Indexed: 11/12/2022]
Abstract
Background: There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework relies on a partitioning of the data in order to separate model choice from model assessment in terms of used data. Since the number of conceivable models in general is vast, it was also of interest to investigate the employment of two automatic search methods, a genetic algorithm and a brute-force method, for model choice. As a demonstration, the C1C2 was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables, thus reducing the model choice problem to a matter of variable selection and choice of penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C1C2 were compared to those obtained by employing repeated K-fold cross-validation for choosing and assessing a model. Results: The C1C2 framework performed well at finding the true model in terms of choosing the correct variable subset and producing reasonable choices for the penalizing parameter, even in situations when the independent variables were highly correlated and when the number of observations was less than the number of variables. The C1C2 framework was also found to give accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of generalization error estimates. Using the genetic algorithm worsened the model choice but not the generalization error estimates, compared to using the brute-force method. The results obtained with repeated K-fold cross-validation were similar to those produced by the C1C2 in terms of model choice; however, a lower accuracy of the generalization error estimates was observed. Conclusion: The C1C2 framework was demonstrated to work well for finding the true model within a penalized linear model class and accurately assessing its generalization error, even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and model assumption deviations. A complete separation of the model choice and the model assessment in terms of data used for each task improves the estimates of the generalization error.
Affiliation(s)
- Martin Eklund
  - Department of Pharmaceutical Pharmacology, Uppsala University, Box 591, BMC, SE-751 24 Uppsala, Sweden.
19
Davies MN, Secker A, Halling-Brown M, Moss DS, Freitas AA, Timmis J, Clark E, Flower DR. GPCRTree: online hierarchical classification of GPCR function. BMC Res Notes 2008; 1:67. [PMID: 18717986 PMCID: PMC2547103 DOI: 10.1186/1756-0500-1-67] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 08/08/2008] [Accepted: 08/21/2008] [Indexed: 11/25/2022]
Abstract
Background: G protein-coupled receptors (GPCRs) play important physiological roles transducing extracellular signals into intracellular responses. Approximately 50% of all marketed drugs target a GPCR. There remains considerable interest in effectively predicting the function of a GPCR from its primary sequence. Findings: Using techniques drawn from data mining and proteochemometrics, an alignment-free approach to GPCR classification has been devised. It uses a simple representation of a protein's physical properties. GPCRTree, a publicly available internet server, implements an algorithm that classifies GPCRs at the class, sub-family and sub-subfamily level. Conclusion: A selective top-down classifier was developed which assigns sequences within a GPCR hierarchy. Compared to other publicly available GPCR prediction servers, GPCRTree is considerably more accurate at every level of classification. The server has been available online since March 2008.
Affiliation(s)
- Matthew N Davies
  - The Jenner Institute, University of Oxford, Compton, Newbury, Berkshire, RG20 7NN, UK.
20
Lapins M, Eklund M, Spjuth O, Prusis P, Wikberg JES. Proteochemometric modeling of HIV protease susceptibility. BMC Bioinformatics 2008; 9:181. [PMID: 18402661 PMCID: PMC2375133 DOI: 10.1186/1471-2105-9-181] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Received: 12/21/2007] [Accepted: 04/10/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND: A major obstacle in treatment of HIV is the ability of the virus to mutate rapidly into drug-resistant variants. A method for predicting the susceptibility of mutated HIV strains to antiviral agents would provide substantial clinical benefit as well as facilitate the development of new candidate drugs. Therefore, we used proteochemometrics to model the susceptibility of HIV to protease inhibitors in current use, utilizing descriptions of the physico-chemical properties of mutated HIV proteases and 3D structural property descriptions for the protease inhibitors. The descriptions were correlated to the susceptibility data of 828 unique HIV protease variants for seven protease inhibitors in current use; the data set comprised 4792 protease-inhibitor combinations. RESULTS: The model provided excellent predictability (R2 = 0.92, Q2 = 0.87) and identified general and specific features of drug resistance. The model's predictive ability was verified by external prediction in which the susceptibilities to each one of the seven inhibitors were omitted from the data set, one inhibitor at a time, and the data for the six remaining compounds were used to create new models. This analysis showed that the overall predictive ability for the omitted inhibitors was Q2 inhibitors = 0.72. CONCLUSION: Our results show that a proteochemometric approach can provide generalized susceptibility predictions for new inhibitors. Our proteochemometric model can directly analyze inhibitor-protease interactions and facilitate treatment selection based on viral genotype. The model is available for public use, and is located at the HIV Drug Research Centre.
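The leave-one-inhibitor-out check described in this abstract combines a grouped data split with a predictive squared correlation coefficient. The sketch below shows both pieces in plain Python. The helper names are hypothetical, and the Q2 formula used (one minus the ratio of the squared prediction errors to the total sum of squares about the observed mean) is one common definition, assumed here because the abstract does not spell out the exact variant.

```python
def q2(observed, predicted):
    """Predictive squared correlation coefficient: 1 - PRESS / SS_total.

    SS_total is taken about the mean of the observed values passed in,
    which is an assumption of this sketch.
    """
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / total


def leave_one_group_out(groups):
    """Yield (group, train_indices, test_indices), omitting one group
    (for example one inhibitor) at a time."""
    for group in sorted(set(groups)):
        test = [i for i, g in enumerate(groups) if g == group]
        train = [i for i, g in enumerate(groups) if g != group]
        yield group, train, test


# Hypothetical inhibitor labels for five protease-inhibitor combinations.
inhibitor_of_row = ["A", "A", "B", "C", "C"]
splits = list(leave_one_group_out(inhibitor_of_row))
```

In use, a model would be refitted on each `train` index set and its predictions for the matching `test` rows pooled before calling `q2`.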
Affiliation(s)
- Maris Lapins
- Department of Pharmaceutical Pharmacology, Uppsala University, SE-751 24, Sweden.
|
21
|
Kontijevskis A, Wikberg JES, Komorowski J. Computational proteomics analysis of HIV-1 protease interactome. Proteins 2007; 68:305-12. [PMID: 17427231 DOI: 10.1002/prot.21415] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]
Abstract
HIV-1 protease is a small homodimeric enzyme that ensures maturation of HIV virions by cleaving the viral precursor Gag and Gag-Pol polyproteins into structural and functional elements. The cleavage sites in the viral polyproteins share neither sequence homology nor binding motif and the specificity of the HIV-1 protease is therefore only partially understood. Using an extensive data set collected from 16 years of HIV proteome research we have here created a general and predictive rule-based model for HIV-1 protease specificity based on rough sets. We demonstrate that HIV-1 protease specificity is much more complex than previously anticipated and cannot be defined based solely on the amino acids at the substrate's scissile bond or on any other single substrate amino acid position. Our results show that the combination of at least three particular amino acids is needed in the substrate for a cleavage event to occur. Only by combining and analyzing massive amounts of HIV proteome data was it possible to discover these novel and general patterns of physico-chemical substrate cleavage determinants. Our study is an example of how computational biology methods can advance the understanding of the viral interactomes.
|
22
|
Strachan RT, Ferrara G, Roth BL. Screening the receptorome: an efficient approach for drug discovery and target validation. Drug Discov Today 2006; 11:708-16. [PMID: 16846798 DOI: 10.1016/j.drudis.2006.06.012] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 01/11/2006] [Revised: 06/02/2006] [Accepted: 06/16/2006] [Indexed: 11/18/2022]
Abstract
The receptorome, comprising at least 5% of the human genome, encodes receptors that mediate the physiological, pathological and therapeutic responses to a vast number of exogenous and endogenous ligands. Not surprisingly, the majority of approved medications target members of the receptorome. Several in silico and physical screening approaches have been devised to mine the receptorome efficiently for the discovery and validation of molecular targets for therapeutic drug discovery. Receptorome screening has also been used to discover, and thereby avoid, the molecular targets responsible for serious and unforeseen drug side effects.
Affiliation(s)
- Ryan T Strachan
- Department of Biochemistry, Comprehensive Cancer Center and NIMH Psychoactive Drug Screening Program, Case Western Reserve University Medical School, Cleveland, OH 44106, USA
|
23
|
Abstract
The term 'receptorome' is now being used to describe receptors, ion channels and transporters in the human genome that are potential drug targets. These proteins comprise a considerable fraction of the human genome, and include the G protein-coupled receptors, which are the targets for many medications. In this review, we summarize recent advances in the field, including the concept that the ultimate goal of drug discovery may not be the development of highly selective single-target drugs, the idea that potential side-effects can also be the goal of multi-target drug screening, and a discussion of the application of computational screening and public domain databases available to interested investigators.
Affiliation(s)
- Wesley K Kroeze
- Department of Biochemistry, Case Western Reserve University Medical School, Cleveland, OH 44106, USA
|
24
|
Soeria-Atmadja D, Wallman M, Björklund AK, Isaksson A, Hammerling U, Gustafsson MG. External cross-validation for unbiased evaluation of protein family detectors: application to allergens. Proteins 2006; 61:918-25. [PMID: 16231294 DOI: 10.1002/prot.20656] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Key issues in protein science and computational biology are the design and evaluation of algorithms aimed at detection of proteins that belong to a specific family, as defined by structural, evolutionary, or functional criteria. In this context, several validation techniques are often used to compare different parameter settings of the detector, and to subsequently select the setting that yields the smallest error rate estimate. A frequently overlooked problem associated with this approach is that this smallest error rate estimate may have a large optimistic bias. Based on computer simulations, we show that a detector's error rate estimate can be overly optimistic and propose a method to obtain unbiased performance estimates of a detector design procedure. The method is founded on an external 10-fold cross-validation (CV) loop that embeds an internal validation procedure used for parameter selection in detector design. The designed detectors generated in each of the 10 iterations are evaluated on held-out examples exclusively available in the external CV iterations. Notably, the average of these 10 performance estimates is not associated with a final detector, but rather with the average performance of the design procedure used. We apply the external CV loop to the particular problem of detecting potentially allergenic proteins, using a previously reported design procedure. Unbiased performance estimates of the allergen detector design procedure are presented together with information about which algorithms and parameter settings are most frequently selected.
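The external cross-validation loop described in the Soeria-Atmadja et al. abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the 1-D k-nearest-neighbour "detector", the fold assignment by index, the candidate values of k, and all function names are hypothetical. What it does show is the structure the abstract describes: parameter selection runs only on the training part of each external fold, and the held-out part is used once, for evaluation.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > len(nearest) else 0


def error_rate(train, test, k):
    """Fraction of test examples the k-NN detector misclassifies."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in test)
    return wrong / len(test)


def folds(data, n_folds):
    """Yield (train, test) splits; example i is assigned to fold i % n_folds."""
    for f in range(n_folds):
        test = [d for i, d in enumerate(data) if i % n_folds == f]
        train = [d for i, d in enumerate(data) if i % n_folds != f]
        yield train, test


def select_k(data, candidates, n_folds):
    """Internal validation: choose the k with the lowest mean CV error."""
    def cv_error(k):
        errors = [error_rate(tr, te, k) for tr, te in folds(data, n_folds)]
        return sum(errors) / len(errors)
    return min(candidates, key=cv_error)


def external_cv(data, candidates, outer_folds=10, inner_folds=5):
    """Estimate the error of the whole design procedure, not of one detector."""
    errors, chosen = [], []
    for train, test in folds(data, outer_folds):
        k = select_k(train, candidates, inner_folds)  # sees training data only
        chosen.append(k)
        errors.append(error_rate(train, test, k))     # held-out examples
    return sum(errors) / len(errors), chosen


# Hypothetical toy data: label 1 when the feature value is at least 20.
toy = [(float(i), 1 if i >= 20 else 0) for i in range(40)]
mean_error, chosen_k = external_cv(toy, candidates=[1, 3, 5])
```

The returned `mean_error` averages over the external folds and so characterizes the design procedure as a whole, while `chosen_k` records which setting each external iteration selected, mirroring the abstract's report of the most frequently selected settings.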
|
25
|
Roth BL. Receptor systems: will mining the receptorome yield novel targets for pharmacotherapy? Pharmacol Ther 2006; 108:59-64. [PMID: 16083965 DOI: 10.1016/j.pharmthera.2005.06.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 06/22/2005] [Accepted: 06/23/2005] [Indexed: 10/25/2022]
Abstract
We have recently defined the receptorome as 'that part of the proteome encoding receptors'. In this article, I provide a general overview of the members of the receptorome as well as methods used to screen the receptorome, both in silico and physically. Case histories of receptorome-based discovery efforts are then highlighted and the relevance of this approach to the discovery and validation of molecular targets for drug abuse treatment is emphasized.
Affiliation(s)
- Bryan L Roth
- Department of Biochemistry, Case Western Reserve University Medical School, Cleveland, OH 44106, USA.
|
26
|
Gustafsson MG. Independent Component Analysis Yields Chemically Interpretable Latent Variables in Multivariate Regression. J Chem Inf Model 2005; 45:1244-55. [PMID: 16180901 DOI: 10.1021/ci050146n] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Indexed: 11/28/2022]
Abstract
This work shows that independent component analysis (ICA) can be used to obtain statistically independent and, therefore, chemically interpretable latent variables (LVs) in multivariate regression. Two novel algorithms based on ICA are introduced and compared with two classical methods on simulated data: principal component regression and partial least-squares regression. All methods compared yield accurate predictions, but only those based on ICA yield LVs that are chemically interpretable. Practical limitations of ICA-based regression with respect to the underlying assumptions, sample size, and measurement noise are discussed and illustrated by means of simulations.
Affiliation(s)
- Mats G Gustafsson
- Uppsala University, Department of Engineering Sciences, Box 528, 751 20 Uppsala, Sweden.
|