Reference Citation Analysis

For: Freyhult E, Prusis P, Lapinsh M, Wikberg JES, Moulton V, Gustafsson MG. Unbiased descriptor and parameter selection confirms the potential of proteochemometric modelling. BMC Bioinformatics 2005;6:50. [PMID: 15760465 PMCID: PMC555743 DOI: 10.1186/1471-2105-6-50] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Received: 09/20/2004] [Accepted: 03/10/2005] [Indexed: 12/05/2022]

Cited by Other Article(s)

Mansoldo FRP, Carta F, Angeli A, Cardoso VDS, Supuran CT, Vermelho AB. Chagas Disease: Perspectives on the Past and Present and Challenges in Drug Discovery. Molecules 2020;25:E5483. [PMID: 33238613 PMCID: PMC7700143 DOI: 10.3390/molecules25225483] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/06/2020] [Revised: 11/19/2020] [Accepted: 11/20/2020] [Indexed: 12/20/2022]

Moumbock AF, Li J, Mishra P, Gao M, Günther S. Current computational methods for predicting protein interactions of natural products. Comput Struct Biotechnol J 2019;17:1367-1376. [PMID: 31762960 PMCID: PMC6861622 DOI: 10.1016/j.csbj.2019.08.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 03/19/2019] [Revised: 08/09/2019] [Accepted: 08/23/2019] [Indexed: 01/08/2023]

Structural and conformational determinants of macrocycle cell permeability. Nat Chem Biol 2016;12:1065-1074. [DOI: 10.1038/nchembio.2203] [Citation(s) in RCA: 119] [Impact Index Per Article: 14.9] [Received: 10/24/2015] [Accepted: 08/04/2016] [Indexed: 12/31/2022]

Qiu T, Qiu J, Feng J, Wu D, Yang Y, Tang K, Cao Z, Zhu R. The recent progress in proteochemometric modelling: focusing on target descriptors, cross-term descriptors and application scope. Brief Bioinform 2016;18:125-136. [PMID: 26873661 DOI: 10.1093/bib/bbw004] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 10/26/2015] [Revised: 12/09/2015] [Indexed: 12/17/2022]

Baumann D, Baumann K. Reliable estimation of prediction errors for QSAR models under model uncertainty using double cross-validation. J Cheminform 2014;6:47. [PMID: 25506400 PMCID: PMC4260165 DOI: 10.1186/s13321-014-0047-1] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 07/08/2014] [Accepted: 10/30/2014] [Indexed: 01/17/2023]

Abstract

Background

Generally, QSAR modelling requires both model selection and validation, since there is no a priori knowledge about the optimal QSAR model. Prediction errors (PE) are frequently used to select and to assess the models under study. Reliable estimation of prediction errors is challenging, especially under model uncertainty, and requires independent test objects. These test objects must be involved neither in model building nor in model selection. Double cross-validation, sometimes also termed nested cross-validation, offers an attractive way to generate test data and to select QSAR models, since it uses the data very efficiently. Nevertheless, the literature is divided on the reliability of double cross-validation under model uncertainty. Moreover, systematic studies of how double cross-validation should be parameterized are still missing. Here, the cross-validation design in the inner loop and the influence of the test set size in the outer loop are systematically studied for regression models in combination with variable selection.

Methods

Simulated and real data are analysed with double cross-validation to identify important factors for the resulting model quality. For the simulated data, a bias-variance decomposition is provided.

Results

The prediction errors of QSAR/QSPR regression models in combination with variable selection depend to a large degree on the parameterization of double cross-validation. While the parameters for the inner loop of double cross-validation mainly influence bias and variance of the resulting models, the parameters for the outer loop mainly influence the variability of the resulting prediction error estimate.

Conclusions

Double cross-validation provides reliable and unbiased estimates of prediction errors under model uncertainty for regression models. Compared with a single test set, double cross-validation gave a more realistic picture of model quality and should therefore be preferred.

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-014-0047-1) contains supplementary material, which is available to authorized users.
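The inner/outer loop structure described in this abstract can be sketched as follows. This is a minimal, hypothetical illustration in plain Python and not the authors' implementation. Variable selection is reduced to picking the single best predictor column for a simple least-squares line. The function names (`double_cv`, `cv_error`, `k_folds`, `fit_line`) and the default fold counts are assumptions. The inner loop chooses the variable using only the outer-loop training rows. The outer loop then scores that choice on rows that took no part in model building or selection.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def k_folds(n, k, rng):
    """Split positions 0..n-1 into k shuffled, roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(X, y, rows, var, folds):
    """Mean squared CV error on `rows` for a line that uses column `var` only."""
    errs = []
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in range(len(rows)) if i not in held_out]
        a, b = fit_line([X[r][var] for r in train], [y[r] for r in train])
        errs.extend((y[rows[i]] - (a + b * X[rows[i]][var])) ** 2 for i in fold)
    return mean(errs)


def double_cv(X, y, k_outer=5, k_inner=5, seed=0):
    """Outer loop estimates the prediction error; inner loop selects the variable."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    outer_errs, chosen = [], []
    for fold in k_folds(n, k_outer, rng):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        # Inner loop: every candidate variable is scored on the same inner folds,
        # which are built from the outer training rows only.
        inner = k_folds(len(train), k_inner, rng)
        best = min(range(p), key=lambda v: cv_error(X, y, train, v, inner))
        chosen.append(best)
        # Refit the selected model on all outer training rows ...
        a, b = fit_line([X[r][best] for r in train], [y[r] for r in train])
        # ... and score it on outer test rows never used for building or selecting it.
        outer_errs.extend((y[r] - (a + b * X[r][best])) ** 2 for r in fold)
    return mean(outer_errs), chosen
```

The outer test folds partition the data, so each object contributes exactly one prediction error to the final estimate. Changing `k_outer` changes the test set size. According to the abstract, this mainly affects the variability of the resulting estimate.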

Nabu S, Nantasenamat C, Owasirikul W, Lawung R, Isarankura-Na-Ayudhya C, Lapins M, Wikberg JES, Prachayasittikul V. Proteochemometric model for predicting the inhibition of penicillin-binding proteins. J Comput Aided Mol Des 2014;29:127-41. [DOI: 10.1007/s10822-014-9809-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/14/2014] [Accepted: 10/21/2014] [Indexed: 12/17/2022]

Mateus A, Matsson P, Artursson P. Rapid Measurement of Intracellular Unbound Drug Concentrations. Mol Pharm 2013;10:2467-78. [DOI: 10.1021/mp4000822] [Citation(s) in RCA: 119] [Impact Index Per Article: 10.8] [Indexed: 01/04/2023]

Gao J, Huang Q, Wu D, Zhang Q, Zhang Y, Chen T, Liu Q, Zhu R, Cao Z, He Y. Study on human GPCR–inhibitor interactions by proteochemometric modeling. Gene 2013;518:124-31. [DOI: 10.1016/j.gene.2012.11.061] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 10/02/2012] [Accepted: 11/27/2012] [Indexed: 11/15/2022]

Eklund M, Norinder U, Boyer S, Carlsson L. Benchmarking Variable Selection in QSAR. Mol Inform 2012;31:173-9. [DOI: 10.1002/minf.201100142] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 10/21/2011] [Accepted: 12/28/2011] [Indexed: 11/07/2022]

Mining of miRNAs and potential targets from gene oriented clusters of transcripts sequences of the anti-malarial plant, Artemisia annua. Biotechnol Lett 2011;34:737-45. [DOI: 10.1007/s10529-011-0808-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 07/01/2011] [Accepted: 11/21/2011] [Indexed: 12/11/2022]

Kido Y, Matsson P, Giacomini KM. Profiling of a prescription drug library for potential renal drug-drug interactions mediated by the organic cation transporter 2. J Med Chem 2011;54:4548-58. [PMID: 21599003 DOI: 10.1021/jm2001629] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.9] [Indexed: 12/30/2022]

Spjuth O, Eklund M, Lapins M, Junaid M, Wikberg JES. Services for prediction of drug susceptibility for HIV proteases and reverse transcriptases at the HIV drug research centre. Bioinformatics 2011;27:1719-20. [PMID: 21493651 DOI: 10.1093/bioinformatics/btr192] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]

Wang L, Chu F. Extracting very simple diagnostic rules from microarray data. Annu Int Conf IEEE Eng Med Biol Soc 2010;2010:807-10. [PMID: 21096115 DOI: 10.1109/iembs.2010.5626565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]

Lapins M, Wikberg JE. Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques. BMC Bioinformatics 2010;11:339. [PMID: 20569422 PMCID: PMC2910025 DOI: 10.1186/1471-2105-11-339] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 11/17/2009] [Accepted: 06/22/2010] [Indexed: 01/03/2023]

Abstract

BACKGROUND

Protein kinases play crucial roles in cell growth, differentiation, and apoptosis. Abnormal function of protein kinases can lead to many serious diseases, such as cancer. Kinase inhibitors have potential for the treatment of these diseases. However, current inhibitors interact with a broad variety of kinases and interfere with multiple vital cellular processes, which causes toxic effects. Bioinformatics approaches that can predict inhibitor-kinase interactions from the chemical properties of the inhibitors and the kinase macromolecules might aid in the design of more selective therapeutic agents that show better efficacy and lower toxicity.

RESULTS

We applied proteochemometric modelling to correlate the properties of 317 wild-type and mutated kinases and 38 inhibitors (12,046 inhibitor-kinase combinations) with the interaction dissociation constant (Kd) of each combination. We compared six approaches for describing protein kinases, together with several linear and non-linear correlation methods. The best-performing models encoded kinase sequences with amino acid physico-chemical z-scale descriptors and used support vector machines or partial least-squares projections to latent structures for the correlations. Modelling performance was estimated by double cross-validation. The best models showed high predictive ability. The squared correlation coefficient for new kinase-inhibitor pairs ranged from P² = 0.67 to 0.73, and for new kinases from P²kin = 0.65 to 0.70. The models could also separate interacting from non-interacting inhibitor-kinase pairs with high sensitivity and specificity, with areas under the ROC curve of AUC = 0.92-0.93. We also investigated the relationship between the number of protein kinases in the dataset and the modelling results. Even with only 10% of all data, a valid model was still obtained (P² = 0.47, P²kin = 0.42, AUC = 0.83).

CONCLUSIONS

Our results strongly support the applicability of proteochemometrics for kinome-wide interaction modelling. Proteochemometrics might be used to speed up the identification and optimization of inhibitors that target single or multiple protein kinases.
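The performance figures quoted above are computed from predictions for held-out data. The sketch below makes two assumptions. First, it uses the common chemometric definition P² = 1 - PRESS/SS, evaluated on predictions for objects that were excluded from model fitting. Second, it computes the ROC AUC with the rank-based (Mann-Whitney) formula. Whether the paper uses exactly these definitions is not stated in the abstract. The function names are hypothetical.

```python
def p_squared(y_obs, y_pred):
    """Assumed definition: P^2 = 1 - PRESS/SS over predictions for held-out objects."""
    mean_obs = sum(y_obs) / len(y_obs)
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))  # predictive residual sum of squares
    ss = sum((o - mean_obs) ** 2 for o in y_obs)               # total sum of squares
    return 1.0 - press / ss


def roc_auc(scores, labels):
    """AUC as the probability that a random interacting pair outscores a
    random non-interacting pair; ties count one half (Mann-Whitney form)."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The same `p_squared` function could presumably produce a P²kin-style figure for new kinases. In that case it would be applied to predictions from cross-validation folds that hold out whole kinases rather than individual kinase-inhibitor pairs.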

Van Westen GJP, Wegner JK, IJzerman AP, Van Vlijmen HWT, Bender A. Molecular bioactivity extrapolation to novel targets by support vector machines. J Cheminform 2010. [PMCID: PMC2867134 DOI: 10.1186/1758-2946-2-s1-o3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]

Hammerling U, Tallsjö A, Grafström R, Ilbäck NG. Comparative Hazard Characterization in Food Toxicology. Crit Rev Food Sci Nutr 2009;49:626-69. [DOI: 10.1080/10408390802145617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/22/2023]

Jacob L, Hoffmann B, Stoven V, Vert JP. Virtual screening of GPCRs: an in silico chemogenomics approach. BMC Bioinformatics 2008;9:363. [PMID: 18775075 PMCID: PMC2553090 DOI: 10.1186/1471-2105-9-363] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Received: 04/03/2008] [Accepted: 09/06/2008] [Indexed: 11/10/2022]

Abstract

BACKGROUND

The G-protein coupled receptor (GPCR) superfamily is currently the largest class of therapeutic targets. In silico prediction of interactions between GPCRs and small molecules in the transmembrane ligand-binding site is therefore a crucial step in the drug discovery process. It remains a daunting task, both because the 3D structure of most GPCRs is difficult to characterize and because only a limited number of ligands is known for some members of the superfamily. Chemogenomics, which attempts to characterize interactions between all members of a target class and all small molecules simultaneously, has recently been proposed as an interesting alternative to traditional docking or ligand-based virtual screening strategies.

RESULTS

We show that interaction prediction in the chemogenomics framework is more accurate than state-of-the-art individual ligand-based methods, both for receptors with known ligands and for receptors without known ligands. This is achieved without any knowledge of the receptor 3D structure. In particular, we are able to predict ligands of orphan GPCRs with an estimated accuracy of 78.1%.

CONCLUSION

We propose new methods for in silico chemogenomics and validate them on the virtual screening of GPCRs. The methods represent an extension of a recently proposed machine learning strategy, based on support vector machines (SVM), which provides a flexible framework to incorporate various information sources on the biological space of targets and on the chemical space of small molecules. We investigate the use of 2D and 3D descriptors for small molecules, and test a variety of descriptors for GPCRs. We show that incorporating information about the known hierarchical classification of the target family and about key residues in their inferred binding pockets significantly improves the prediction accuracy of our model.
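The flexible framework for incorporating several information sources rests on scoring (target, ligand) pairs with a kernel. A common construction in this line of work multiplies a target kernel by a ligand kernel; this is assumed here rather than taken from the abstract. The sketch below uses two component kernels. The first is a hierarchy kernel that counts the ancestors two receptors share in a classification tree. The second is a Tanimoto kernel on ligand fingerprints, represented as sets of on-bits. The tree paths, bit sets and function names are hypothetical, and the sketch does not train the SVM itself. Any kernel SVM that accepts a precomputed Gram matrix could consume the output.

```python
from typing import FrozenSet, List, Sequence, Tuple

# (target classification path from the root, ligand fingerprint on-bits)
Pair = Tuple[Sequence[str], FrozenSet[int]]


def hierarchy_kernel(path_a: Sequence[str], path_b: Sequence[str]) -> float:
    """Number of shared ancestors: length of the common prefix of two root-to-leaf paths."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return float(shared)


def tanimoto_kernel(bits_a: FrozenSet[int], bits_b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0


def pair_kernel(p: Pair, q: Pair) -> float:
    """Product of a target kernel and a ligand kernel (a product of valid kernels is a valid kernel)."""
    return hierarchy_kernel(p[0], q[0]) * tanimoto_kernel(p[1], q[1])


def gram_matrix(pairs: List[Pair]) -> List[List[float]]:
    """Precomputed kernel matrix over all (target, ligand) training pairs."""
    return [[pair_kernel(p, q) for q in pairs] for p in pairs]
```

In this sketch the target kernel is nonzero for receptors that share ancestors. A receptor with no known ligands can therefore still borrow information from its relatives in the classification tree. This illustrates how a chemogenomics model can score ligands for orphan GPCRs.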

Eklund M, Spjuth O, Wikberg JE. The C1C2: a framework for simultaneous model selection and assessment. BMC Bioinformatics 2008;9:360. [PMID: 18761753 PMCID: PMC2556350 DOI: 10.1186/1471-2105-9-360] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/07/2008] [Accepted: 09/02/2008] [Indexed: 11/12/2022]

Abstract

Background

There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C¹C², for simultaneous model selection and assessment. The framework relies on partitioning the data so that model choice and model assessment use separate data. Since the number of conceivable models is in general vast, it was also of interest to investigate two automatic search methods for model choice: a genetic algorithm and a brute-force method. As a demonstration, the C¹C² was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables. This reduced the model choice problem to a matter of variable selection and choice of the penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C¹C² were compared with those obtained by employing repeated K-fold cross-validation for choosing and assessing a model.

Results

The C¹C² framework performed well at finding the true model, both in choosing the correct variable subset and in producing reasonable choices for the penalizing parameter. This held even when the independent variables were highly correlated and when the number of observations was smaller than the number of variables. The C¹C² framework was also found to give accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of the generalization error estimates. Compared with the brute-force method, the genetic algorithm worsened the model choice but not the generalization error estimates. Repeated K-fold cross-validation produced model choices similar to those of the C¹C². However, its generalization error estimates were less accurate.

Conclusion

The C¹C² framework was shown to work well both for finding the true model within a penalized linear model class and for accurately assessing its generalization error. This held even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and deviations from the model assumptions. Completely separating the data used for model choice from the data used for model assessment improves the estimates of the generalization error.
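The conclusion states that the data used for model choice should be completely separate from the data used for model assessment. This principle can be illustrated with a brute-force search over variable subsets and ridge penalties. The sketch below is not the C¹C² algorithm, and it makes several assumptions. It uses a single random three-way split into fit, choice and assessment rows. Ridge regression serves as the penalized linear model. All function names and defaults are hypothetical.

```python
import itertools
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, cols, lam):
    """Ridge regression on columns `cols`; centering leaves the intercept unpenalized."""
    n = len(y)
    means = [sum(row[c] for row in X) / n for c in cols]
    y_mean = sum(y) / n
    Z = [[row[c] - m for c, m in zip(cols, means)] for row in X]
    p = len(cols)
    A = [[sum(z[a] * z[b] for z in Z) + (lam if a == b else 0.0) for b in range(p)]
         for a in range(p)]
    rhs = [sum(z[a] * (yi - y_mean) for z, yi in zip(Z, y)) for a in range(p)]
    beta = solve(A, rhs)
    return y_mean - sum(bj * m for bj, m in zip(beta, means)), beta


def mse(X, y, cols, model):
    """Mean squared error of a fitted (intercept, coefficients) model."""
    b0, beta = model
    preds = [b0 + sum(bj * row[c] for bj, c in zip(beta, cols)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def choose_then_assess(X, y, max_vars=2, lambdas=(0.01, 0.1, 1.0, 10.0), seed=0):
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    third = len(idx) // 3

    def rows(part):
        return [X[i] for i in part], [y[i] for i in part]

    (Xf, yf), (Xc, yc) = rows(idx[:third]), rows(idx[third:2 * third])
    Xa, ya = rows(idx[2 * third:])
    # Model choice: brute force over every subset up to max_vars and every penalty,
    # fitting on the "fit" rows and scoring on the "choice" rows only.
    candidates = ((cols, lam)
                  for k in range(1, max_vars + 1)
                  for cols in itertools.combinations(range(len(X[0])), k)
                  for lam in lambdas)
    cols, lam = min(candidates,
                    key=lambda cl: mse(Xc, yc, cl[0], ridge_fit(Xf, yf, cl[0], cl[1])))
    # Model assessment: refit on fit + choice rows, score on rows never seen during choice.
    model = ridge_fit(Xf + Xc, yf + yc, cols, lam)
    return cols, lam, mse(Xa, ya, cols, model)
```

The brute-force loop grows combinatorially with the number of variables, which is why the abstract also considers a genetic algorithm as a cheaper search method.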

Davies MN, Secker A, Halling-Brown M, Moss DS, Freitas AA, Timmis J, Clark E, Flower DR. GPCRTree: online hierarchical classification of GPCR function. BMC Res Notes 2008;1:67. [PMID: 18717986 PMCID: PMC2547103 DOI: 10.1186/1756-0500-1-67] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 08/08/2008] [Accepted: 08/21/2008] [Indexed: 11/25/2022]

Lapins M, Eklund M, Spjuth O, Prusis P, Wikberg JES. Proteochemometric modeling of HIV protease susceptibility. BMC Bioinformatics 2008;9:181. [PMID: 18402661 PMCID: PMC2375133 DOI: 10.1186/1471-2105-9-181] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Received: 12/21/2007] [Accepted: 04/10/2008] [Indexed: 11/10/2022]

Kontijevskis A, Wikberg JES, Komorowski J. Computational proteomics analysis of HIV-1 protease interactome. Proteins 2007;68:305-12. [PMID: 17427231 DOI: 10.1002/prot.21415] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]

Strachan RT, Ferrara G, Roth BL. Screening the receptorome: an efficient approach for drug discovery and target validation. Drug Discov Today 2006;11:708-16. [PMID: 16846798 DOI: 10.1016/j.drudis.2006.06.012] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 01/11/2006] [Revised: 06/02/2006] [Accepted: 06/16/2006] [Indexed: 11/18/2022]

Kroeze WK, Roth BL. Screening the receptorome. J Psychopharmacol 2006;20:41-6. [PMID: 16785269 DOI: 10.1177/1359786806066045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/22/2022]

Soeria-Atmadja D, Wallman M, Björklund AK, Isaksson A, Hammerling U, Gustafsson MG. External cross-validation for unbiased evaluation of protein family detectors: application to allergens. Proteins 2006;61:918-25. [PMID: 16231294 DOI: 10.1002/prot.20656] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]

Roth BL. Receptor systems: will mining the receptorome yield novel targets for pharmacotherapy? Pharmacol Ther 2006;108:59-64. [PMID: 16083965 DOI: 10.1016/j.pharmthera.2005.06.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 06/22/2005] [Accepted: 06/23/2005] [Indexed: 10/25/2022]

Gustafsson MG. Independent Component Analysis Yields Chemically Interpretable Latent Variables in Multivariate Regression. J Chem Inf Model 2005;45:1244-55. [PMID: 16180901 DOI: 10.1021/ci050146n] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Indexed: 11/28/2022]