51
Bhhatarai B, Gramatica P. Per- and Polyfluoro Toxicity (LC50 Inhalation) Study in Rat and Mouse Using QSAR Modeling. Chem Res Toxicol 2010; 23:528-39. [DOI: 10.1021/tx900252h] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 11/28/2022]
Affiliation(s)
- Barun Bhhatarai
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Structural and Functional Biology (DBSF), University of Insubria, via JH Dunant 3, Varese 21100, Italy
- Paola Gramatica
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Structural and Functional Biology (DBSF), University of Insubria, via JH Dunant 3, Varese 21100, Italy
52
Gramatica P. Chemometric Methods and Theoretical Molecular Descriptors in Predictive QSAR Modeling of the Environmental Behavior of Organic Pollutants. CHALLENGES AND ADVANCES IN COMPUTATIONAL CHEMISTRY AND PHYSICS 2010. [DOI: 10.1007/978-1-4020-9783-6_12] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 01/13/2023]
53
Gedeck P, Kramer C, Ertl P. Computational analysis of structure-activity relationships. PROGRESS IN MEDICINAL CHEMISTRY 2010; 49:113-60. [PMID: 20855040 DOI: 10.1016/s0079-6468(10)49004-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 12/13/2022]
Affiliation(s)
- Peter Gedeck
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Forum 1, Novartis Campus, CH-4056 Basel, Switzerland
54
Cerqueira NMFSA, Sousa SF, Fernandes PA, Ramos MJ. Virtual screening of compound libraries. Methods Mol Biol 2010; 572:57-70. [PMID: 20694685 DOI: 10.1007/978-1-60761-244-5_4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
During the last decade, Virtual Screening (VS) has definitively established itself as an important part of the drug discovery and development process. VS involves the selection of likely drug candidates from large libraries of chemical structures by using computational methodologies, but the generic definition of VS encompasses many different methodologies. This chapter provides an introduction to the field by reviewing a variety of important aspects, including the different types of virtual screening methods and the several steps required for a successful virtual screening campaign within a state-of-the-art approach, from target selection to postfilter application. This analysis is further complemented with a small collection of important VS success stories.
Affiliation(s)
- Nuno M F S A Cerqueira
- Theoretical and Computational Chemistry Research Group, REQUIMTE, Departamento de Química Faculdade de Ciências, Universidade do Porto, Porto, Portugal
55
Ma J, Tong C, Liaw A, Sheridan R, Szumiloski J, Svetnik V. Generating hypotheses about molecular structure-activity relationships (SARs) by solving an optimization problem. Stat Anal Data Min 2009. [DOI: 10.1002/sam.10040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/06/2023]
56
Song CM, Lim SJ, Tong JC. Recent advances in computer-aided drug design. Brief Bioinform 2009; 10:579-91. [PMID: 19433475 DOI: 10.1093/bib/bbp023] [Citation(s) in RCA: 152] [Impact Index Per Article: 10.1] [Indexed: 01/28/2023]
Abstract
Modern drug discovery is characterized by the production of vast quantities of compounds and the need to examine these huge libraries in short periods of time. The need to store, manage and analyze these rapidly increasing resources has given rise to the field known as computer-aided drug design (CADD). CADD represents computational methods and resources that are used to facilitate the design and discovery of new therapeutic solutions. Digital repositories, containing detailed information on drugs and other useful compounds, are goldmines for the study of chemical reaction capabilities. Design libraries, with the potential to generate molecular variants in their entirety, allow the selection and sampling of chemical compounds with diverse characteristics. Fold recognition, for studying sequence-structure homology between protein sequences and structures, is helpful for inferring binding sites and molecular functions. Virtual screening, the in silico analog of high-throughput screening, offers great promise for systematic evaluation of huge chemical libraries to identify potential lead candidates that can be synthesized and tested. In this article, we present an overview of the most important data sources and computational methods for the discovery of new molecular entities. The workflow of the entire virtual screening campaign is discussed, from data collection through to post-screening analysis.
Affiliation(s)
- Chun Meng Song
- Institute for Infocomm Research, Connexis South Tower, Singapore 138632
57
Kramer C, Tautermann CS, Livingstone DJ, Salt DW, Whitley DC, Beck B, Clark T. Sharpening the toolbox of computational chemistry: a new approximation of critical f-values for multiple linear regression. J Chem Inf Model 2009; 49:28-34. [PMID: 19105731 DOI: 10.1021/ci800318q] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Multiple linear regression is a major tool in computational chemistry. Although it has been used for more than 30 years, it has only recently been noted within the cheminformatics community that the standard F-values used to assess the significance of the resulting models are inappropriate in situations where the variables included in a model are chosen from a large pool of descriptors, due to an effect known in the statistical literature as selection bias. We have used Monte Carlo simulations to estimate the critical F-values for many combinations of sample size (n), model size (p), and descriptor pool size (k), using stepwise regression, one of the methods most commonly used to derive linear models from large sets of molecular descriptors. The values of n, p, and k represent cases appropriate to contemporary cheminformatics data sets. A formula for general n, p, and k values has been developed from the numerical estimates that approximates the critical stepwise F-values at 90%, 95%, and 99% significance levels. This approximation reproduces both the original simulated values and an interpolation test set (within the range of the training values) with an R2 value greater than 0.995. For an extrapolation test set of cases outside the range of the training set, the approximation produced an R2 above 0.93.
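The selection-bias effect described in this abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' procedure or their fitted formula: it assumes pure-noise responses and descriptors, uses greedy forward selection as a simplified stand-in for stepwise regression, and the function names (`overall_f`, `forward_select`, `critical_f`) are hypothetical.

```python
import numpy as np

def overall_f(y, X):
    """Overall F-statistic of an OLS fit of y on the columns of X (plus intercept)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    return ((tss - rss) / p) / (rss / (n - p - 1))

def forward_select(y, X, p):
    """Greedy forward selection: p times, add the column giving the largest F."""
    chosen = []
    for _ in range(p):
        best_j, best_f = None, -np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            f = overall_f(y, X[:, chosen + [j]])
            if f > best_f:
                best_j, best_f = j, f
        chosen.append(best_j)
    return chosen

def critical_f(n, p, k, level=0.95, trials=300, seed=0):
    """Estimate the critical F at `level` when p of k noise descriptors are selected."""
    rng = np.random.default_rng(seed)
    fs = []
    for _ in range(trials):
        X = rng.standard_normal((n, k))   # descriptor pool: pure noise (assumption)
        y = rng.standard_normal(n)        # response unrelated to any descriptor
        cols = forward_select(y, X, p)
        fs.append(overall_f(y, X[:, cols]))
    return float(np.quantile(fs, level))

if __name__ == "__main__":
    print("no selection (k = p = 2):", critical_f(20, 2, 2))
    print("selection from k = 20:   ", critical_f(20, 2, 20))
```

With n and p fixed, the estimated critical value should grow as the pool size k grows, which is the reason the standard tabulated F-values are too lenient once descriptors have been selected from a large pool.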
Affiliation(s)
- Christian Kramer
- Computer-Chemie-Centrum and Interdisciplinary Center for Molecular Materials, Friedrich-Alexander Universität Erlangen-Nürnberg, Nägelsbachstrasse 52, 91052 Erlangen, Germany
58
Duchowicz P, Ocsachoque M. Quantitative Structure-Toxicity Models for Heterogeneous Aliphatic Compounds. QSAR Comb Sci 2009. [DOI: 10.1002/qsar.200860057] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
59
Tresadern G, Bemporad D, Howe T. A comparison of ligand based virtual screening methods and application to corticotropin releasing factor 1 receptor. J Mol Graph Model 2009; 27:860-70. [PMID: 19230731 DOI: 10.1016/j.jmgm.2009.01.003] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 12/13/2008] [Revised: 01/12/2009] [Accepted: 01/14/2009] [Indexed: 11/15/2022]
Abstract
Ligand based virtual screening approaches were applied to the CRF1 receptor. We compared ECFP6 fingerprints, FTrees, Topomers, Cresset FieldScreen, ROCS OpenEye shape Tanimoto, OpenEye combo-score and OpenEye electrostatics. The 3D methods OpenEye Shape Tanimoto, combo-score and Topomers performed the best at separating actives from inactives in retrospective experiments. By virtue of their higher enrichment the same methods identified more active scaffolds. However, amongst a given number of active compounds the Cresset and OpenEye electrostatic methods contained more scaffolds and returned ranked compounds with greater diversity. A selection of the methods was employed to recommend compounds for screening in a prospective experiment. New active CRF1 antagonists were found. The new actives contained different underlying chemical architecture to the query molecules, results indicative of successful scaffold-hopping.
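Several of the methods compared in this abstract score molecules with a Tanimoto coefficient. As a minimal sketch of the 2D fingerprint case only, the snippet below computes the Tanimoto coefficient between sets of "on" bit indices and ranks a small library against a query. The bit sets and compound names are made up for illustration, and no real fingerprinting toolkit is used.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)

def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical fingerprints: each set lists the indices of bits that are set.
    query = {1, 2, 3, 8}
    library = {
        "cmpd_A": {1, 2, 3, 8},
        "cmpd_B": {2, 3, 4},
        "cmpd_C": {10, 11},
    }
    for name, score in rank_by_similarity(query, library):
        print(name, round(score, 3))
```

Ranking a library by this score and inspecting the top of the list is the basic retrospective experiment; the shape-based and electrostatic variants named in the abstract use the same ratio but on overlap volumes or fields instead of bit counts.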
Affiliation(s)
- Gary Tresadern
- Johnson & Johnson, Pharmaceutical Research & Development, Janssen-Cilag S.A., Calle Jarama, 75, Poligono Industrial, 45007 Toledo, Spain.
60
Czodrowski P, Kriegl JM, Scheuerer S, Fox T. Computational approaches to predict drug metabolism. Expert Opin Drug Metab Toxicol 2009; 5:15-27. [DOI: 10.1517/17425250802568009] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 11/05/2022]
61
Abstract
Chemogenomics is a modern approach to analysis of the biological effect of a wide array of small molecule compounds on a large set of homologous receptors or other macromolecular drug targets. However, the relative productivity of the method and the extremely high-cost procedure jointly force the scientist to use additional computational tools for rational compound library design and selection. The present chapter will focus specifically on application of a predictive mapping computational technology in the context of the fundamental principles of chemogenomic approach to foster rational drug design and derive information from the simultaneous biological evaluation of multiple compounds on a set of coherent biological targets.
62
Chen X, Liang YZ, Yuan DL, Xu QS. A modified uncorrelated linear discriminant analysis model coupled with recursive feature elimination for the prediction of bioactivity. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2009; 20:1-26. [PMID: 19343582 DOI: 10.1080/10629360902724127] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/27/2023]
Abstract
To meet the requirements of providing accurate, robust, and interpretable prediction of bioactivity, a modified uncorrelated linear discriminant analysis (M-ULDA) model was developed. In addition, a feature selection method called recursive feature elimination (RFE), originally used for support vector machine (SVM), was introduced and modified to fit the scheme of ULDA. From the evaluation of six pharmaceutical datasets, the M-ULDA coupled with RFE showed better or comparable classification accuracy with respect to other well-studied methods such as SVM and decision trees. The RFE used for ULDA has the advantage of increasing the computational speed and provides useful insights into biochemical mechanisms related to pharmaceutical activity by significantly reducing the number of variables used for the final model.
Affiliation(s)
- X Chen
- College of Chemistry and Chemical Engineering, Central South University, Changsha, People's Republic of China
63
Simmons K, Kinney J, Owens A, Kleier DA, Bloch K, Argentar D, Walsh A, Vaidyanathan G. Practical Outcomes of Applying Ensemble Machine Learning Classifiers to High-Throughput Screening (HTS) Data Analysis and Screening. J Chem Inf Model 2008; 48:2196-206. [DOI: 10.1021/ci800164u] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 01/13/2023]
Affiliation(s)
- Kirk Simmons, John Kinney, Aaron Owens, Daniel A. Kleier, Karen Bloch, Dave Argentar, Alicia Walsh, Ganesh Vaidyanathan
- Simmons Consulting, 52 Windybush Way, Titusville, New Jersey 08560, DuPont Stine Haskell Research Laboratories, 1090 Elkton Road, Newark, Delaware 19711, DuPont Engineering Research and Technology, POB 80249, Wilmington, Delaware 19880, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, Sun Edge, LLC, 147 Tuckahoe Lane, Bear, Delaware 19701, and Quantum Leap Innovations, 3 Innovation Way, Suite 100, Newark, Delaware 19711
64
Hemalatha T, Imran PKM, Gnanamani A, Nagarajan S. Synthesis, antibacterial and antifungal activities of some N-nitroso-2,6-diarylpiperidin-4-one semicarbazones and QSAR analysis. Nitric Oxide 2008; 19:303-11. [PMID: 18700167 DOI: 10.1016/j.niox.2008.07.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 12/18/2007] [Revised: 04/22/2008] [Accepted: 07/18/2008] [Indexed: 10/21/2022]
Abstract
A series of N-nitroso-2,6-diarylpiperidin-4-one semicarbazones and thiosemicarbazones were synthesized and characterized by IR, NMR and elemental analysis. All the compounds were screened for their antibacterial activity against the Gram-positive bacteria Bacillus subtilis and Staphylococcus aureus and the Gram-negative bacterium Escherichia coli, and for antifungal activity against Candida albicans. The compounds showed moderate to very good antibacterial activity. Quantitative Structure Activity Relationship (QSAR) analysis was performed for these compounds by the application of semiempirical calculations and molecular modeling. Different logP values were also evaluated to further the analysis.
Affiliation(s)
- T Hemalatha
- Department of Chemistry, Annamalai University, Annamalainagar, Chidambaram, Tamil Nadu 608002, India
65
Vogt I, Bajorath J. Design and Exploration of Target-Selective Chemical Space Representations. J Chem Inf Model 2008; 48:1389-95. [DOI: 10.1021/ci800106e] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Affiliation(s)
- Ingo Vogt
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology & Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology & Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany
66
Willett P. From chemical documentation to chemoinformatics: 50 years of chemical information science. J Inf Sci 2008. [DOI: 10.1177/0165551507084631] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
This paper summarizes the historical development of the discipline that is now called 'chemoinformatics'. It shows how this has evolved, principally as a result of technological developments in chemistry and biology during the past decade, from long-established techniques for the modelling and searching of chemical molecules. A total of 30 papers, the earliest dating back to 1957, are briefly summarized to highlight some of the key publications and to show the development of the discipline.
67
Livingstone DJ, Clark T, Ford MG, Hudson BD, Whitley DC. QSAR studies using the parashift system. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2008; 19:285-302. [PMID: 18484499 DOI: 10.1080/10629360802085041] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/16/2023]
Abstract
A novel way of describing molecules in terms of their surfaces and local properties at the surfaces is described. The use of these surfaces and properties to explain chemical reactivity and model simple molecular properties has already been demonstrated. This study reports an examination of the use of these descriptions of molecules to model a simple chemical interaction (complex formation) and a diverse set of mutagens. Both of these systems have been modelled successfully and the results are discussed.
68
Geppert H, Horváth T, Gärtner T, Wrobel S, Bajorath J. Support-Vector-Machine-Based Ranking Significantly Improves the Effectiveness of Similarity Searching Using 2D Fingerprints and Multiple Reference Compounds. J Chem Inf Model 2008; 48:742-6. [DOI: 10.1021/ci700461s] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Indexed: 11/28/2022]
Affiliation(s)
- Hanna Geppert, Tamás Horváth, Thomas Gärtner, Stefan Wrobel, Jürgen Bajorath
- Fraunhofer IAIS, Schloss Birlinghoven, D-53754 Sankt Augustin, Germany, Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany, and Institute of Computer Science III, Rheinische Friedrich-Wilhelms-Universität, Römerstr. 164, D-53117 Bonn, Germany
69
Zhou C, Nie C. Molecular Descriptors of Topology and a Study on Quantitative Structure and Property Relationships. BULLETIN OF THE CHEMICAL SOCIETY OF JAPAN 2007. [DOI: 10.1246/bcsj.80.1504] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
70
Cruz-Monteagudo M, González-Díaz H, Agüero-Chapín G, Santana L, Borges F, Domínguez ER, Podda G, Uriarte E. Computational chemistry development of a unified free energy Markov model for the distribution of 1300 chemicals to 38 different environmental or biological systems. J Comput Chem 2007; 28:1909-23. [PMID: 17405109 DOI: 10.1002/jcc.20730] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Indexed: 11/09/2022]
Abstract
Predicting tissue and environmental distribution of chemicals is of major importance for environmental and life sciences. Most of the molecular descriptors used in computational prediction of chemical partition behavior consider molecular structure but ignore the nature of the partition system. Consequently, computational models derived to date are restricted to the specific system under study. Here, a free energy-based descriptor (DeltaG(k)) is introduced, which circumvents this problem. Based on DeltaG(k), we developed for the first time a single linear classification model to predict the partition behavior of a large number of structurally diverse drugs and other chemicals (1300) for 38 different partition systems of biological and environmental significance. The model presented training/predicting set accuracies of 91.79/88.92%. Parametrical assumptions were checked. Desirability analysis was used to explore the levels of the predictors that produce the most desirable partition properties. Finally, inversion of the partition direction for each one of the 38 partition systems shows that our models correctly classified 89.08% of compounds with an uncertainty of only +/-0.17%, independently of the direction of the partition process used to seek the model. Ten other classification models (linear, neural networks, and genetic algorithms) were also tested for the same purposes. None of these models compares favorably with the linear model, indicating that our approach captures the main aspects that govern chemical partitioning in different systems.
Affiliation(s)
- Maykel Cruz-Monteagudo
- Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto 4050-047, Porto, Portugal
71
Brewer ML. Development of a spectral clustering method for the analysis of molecular data sets. J Chem Inf Model 2007; 47:1727-33. [PMID: 17636944 DOI: 10.1021/ci600565r] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
A spectral clustering method is presented and applied to two-dimensional molecular structures, where it has been found particularly useful in the analysis of screening data. The method provides a means to quantify (1) the degree of intermolecular similarity within a cluster and (2) the contribution that the features of a molecule make to a cluster. In an application of the spectral clustering method to an example data set of 125 COX-2 inhibitors, these two criteria were used to place the molecules into clusters of chemically related two-dimensional structures.
Affiliation(s)
- Mark L Brewer
- Computational Chemistry Group, Evotec (UK) Limited, 111 Milton Park, Abingdon, Oxfordshire OX14 4RZ, United Kingdom.
72
Abstract
BACKGROUND: Discerning the similarity between molecules is a challenging problem in drug discovery as well as in molecular biology. The importance of this problem is due to the fact that the biochemical characteristics of a molecule are closely related to its structure. Therefore molecular similarity is a key notion in investigations targeting exploration of molecular structural space, query-retrieval in molecular databases, and structure-activity modelling. Determining molecular similarity is related to the choice of molecular representation. Currently, representations with high descriptive power and physical relevance like 3D surface-based descriptors are available. Information from such representations is both surface-based and volumetric. However, most techniques for determining molecular similarity tend to focus on idealized 2D graph-based descriptors due to the complexity that accompanies reasoning with more elaborate representations.
RESULTS: This paper addresses the problem of determining similarity when molecules are described using complex surface-based representations. It proposes an intrinsic, spherical representation that systematically maps points on a molecular surface to points on a standard coordinate system (a sphere). Molecular surface properties such as shape, field strengths, and effects due to field super-positioning can then be captured as distributions on the surface of the sphere. Surface-based molecular similarity is subsequently determined by computing the similarity of the surface-property distributions using a novel formulation of histogram-intersection. The similarity formulation is not only sensitive to the 3D distribution of the surface properties, but is also highly efficient to compute.
CONCLUSION: The proposed method obviates the computationally expensive step of molecular pose-optimisation, can incorporate conformational variations, and facilitates highly efficient determination of similarity by directly comparing molecular surfaces and surface-based properties. Retrieval performance, applications in structure-activity modeling of complex biological properties, and comparisons with existing research and commercial methods demonstrate the validity and effectiveness of the approach.
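The abstract refers to a novel formulation of histogram intersection that is not spelled out here. For orientation only, the sketch below shows the classic, textbook histogram intersection on two normalised histograms. It is an assumption-laden illustration, not the paper's formulation: the bin counts are invented, and the mapping of surface properties onto a sphere is not modelled.

```python
import numpy as np

def histogram_intersection(p, q):
    """Classic histogram intersection of two non-negative histograms.

    Both inputs are normalised to sum to 1, so the result lies in [0, 1]:
    1 for identical distributions, 0 for distributions with no shared bins.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("histograms must have the same number of bins")
    if p.sum() <= 0 or q.sum() <= 0:
        raise ValueError("histograms must contain at least one count")
    p = p / p.sum()
    q = q / q.sum()
    return float(np.minimum(p, q).sum())

if __name__ == "__main__":
    # Hypothetical binned surface-property counts for two molecules.
    mol_a = [1, 1, 0, 0]
    mol_b = [0, 1, 1, 0]
    print(histogram_intersection(mol_a, mol_b))
```

Because the score needs only one pass over the bins, comparing two molecules costs time linear in the number of bins, which is consistent with the efficiency argument made in the abstract.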
Affiliation(s)
- Rahul Singh
- Department of Computer Science, San Francisco State University, San Francisco, CA 94132, USA.
73
Volarath P, Wang H, Fu H, Harrison R. Knowledge-based algorithms for chemical structure and property analysis. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2007; 2004:3011-4. [PMID: 17270912 DOI: 10.1109/iembs.2004.1403853] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Abstract
We have successfully developed 'rule-based' algorithms that efficiently perform sub- and exact-structure searching, as well as accurately describe the chemistry of small molecules. These algorithms use a simple and concise set of rules for information extraction from molecule files. This design is intended to reduce the computational time required for the process, while improving the accuracy in the tasks. The performances of these algorithms have been successfully validated with a wide range of small molecules. Our future goal is to combine these algorithms with our newly designed knowledge-based object database, such that their tasks can be automated with a high efficiency.
Affiliation(s)
- P Volarath
- Dept. of Chem., Georgia State Univ., Atlanta, GA, USA
74
Buttingsrud B, Alsberg BK, Astrand PO. Validation of critical points in the electron density as descriptors by building quantitative structure-property relationships for the atomic polar tensor. J Comput Chem 2007; 28:2130-9. [PMID: 17464968 DOI: 10.1002/jcc.20666] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
Abstract
A crucial component of research in the field of quantitative structure-activity/property relationships is the identification of molecular descriptors relevant to the activity or property of interest. Descriptors based on the topology of the electron density as formulated in Bader's theory of atoms in molecules are investigated in detail in this work. In a model study, the authors investigate their ability to predict the atomic polar tensor (the gradient of the molecular dipole moment), which contains information on the vibrational intensities in infrared spectroscopy and constitutes a scheme for partitioning the total charge distribution into atomic charges. The atomic polar tensor may therefore be used to investigate whether the descriptors give adequate information on the local electronic structure in the molecule. Both the trace of the atomic polar tensor and for planar molecules its out-of-plane component may be interpreted as definitions of atomic charges suitable for prediction. Hydrogen and carbon atoms in a set of 60 aromatic compounds with various substituents have been studied. Excellent results for prediction of hydrogen and carbon charges have been achieved with cross-validated squared correlation coefficients between predicted and theoretical values varying from 0.92 and 0.977 for the most complex set of substituents when the value, Laplacian, and ellipticity of the electron density in the bond critical points are used as descriptors. The carbon charges defined from the trace of the atomic polar tensor are correlated with its out-of-plane component whereas such relationship is not observed for the hydrogen charges studied in this work.
Affiliation(s)
- Bård Buttingsrud
- Department of Chemistry, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
|
75
|
Buttingsrud B, Alsberg BK, Astrand PO. Quantitative prediction of the absorption maxima of azobenzene dyes from bond lengths and critical points in the electron density. Phys Chem Chem Phys 2007; 9:2226-33. [PMID: 17487319 DOI: 10.1039/b617470a] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
The relationship between the molecular electronic structure and the position of the absorption maxima in 191 azobenzene dyes has been studied by quantitative structure-property relations. A strong linearity is observed between the nitrogen-nitrogen bond length and the absorption wavelength, with a squared correlation coefficient of 0.90. Bond lengths and properties of the critical points located on the electron density distribution are used to build partial least squares regression models for quantitative prediction of absorption wavelengths. Fifty of the azobenzene dyes were used as an external test set to evaluate the overall performance of the models. The simplest model, in which only the nitrogen-nitrogen bond length is used as a descriptor, gives a root mean square error of prediction of 12.6 nm. When the value, Laplacian and ellipticity of the electron density in all comparable bond critical points are used, the error of prediction is reduced to 5.4 nm. However, this model is less general and less robust when predicting novel molecular structures. It is demonstrated that the nitrogen-nitrogen bond in the azobenzene compounds relates to the colour of the dyes and that, in particular, the nitrogen-nitrogen bond length plays a central role.
Affiliation(s)
- Bård Buttingsrud
- Department of Chemistry, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
|
76
|
Salt DW, Ajmani S, Crichton R, Livingstone DJ. An Improved Approximation to the Estimation of the Critical F Values in Best Subset Regression. J Chem Inf Model 2006; 47:143-9. [PMID: 17238259 DOI: 10.1021/ci060113n] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
Abstract
Variable selection methods are routinely applied in regression modeling to identify a small number of descriptors which "best" explain the variation in the response variable. Most statistical packages that perform regression have some form of stepping algorithm that can be used in this identification process. Unfortunately, when a subset of p variables measured on a sample of n objects is selected from a set of k (>p) to maximize the squared sample multiple regression coefficient, the significance of the resulting regression is upwardly biased. The extent of this bias is investigated by using Monte Carlo simulation and is presented as an inflation factor which, when multiplied by the usual tabulated F ratio, gives an estimate of the true 5% critical value. The results show that selection bias can be very high even for moderate-size data sets. Selecting three variables from 50 generated at random with 20 observations will almost certainly provide a significant result if the usual tabulated F values are used. An interpolation formula is provided for the calculation of the inflation factor for different combinations of (n, p, k). Four real data sets are examined to illustrate the effect of correlated descriptor variables on the degree of inflation.
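The selection bias described in this abstract can be illustrated with a small Monte Carlo sketch. It is a deliberate simplification, not the authors' procedure: only one variable is selected (p = 1) rather than three, no inflation factor or interpolation formula is computed, and the function names and simulation count are assumptions made for illustration. The constant 4.41 is the usual tabulated 5% critical value of F with (1, 18) degrees of freedom.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def false_positive_rate(n, k, f_crit, n_sim, rng):
    """Fraction of simulations in which the best of k pure-noise
    variables looks 'significant' against the tabulated F value."""
    hits = 0
    for _ in range(n_sim):
        y = [rng.gauss(0, 1) for _ in range(n)]
        # Keep the noise variable that maximizes r^2 (selection step).
        best = max(
            r_squared([rng.gauss(0, 1) for _ in range(n)], y)
            for _ in range(k)
        )
        f_ratio = best * (n - 2) / (1 - best)  # F for one predictor
        if f_ratio > f_crit:
            hits += 1
    return hits / n_sim


F_CRIT_1_18 = 4.41  # tabulated 5% point of F(1, 18)
rng = random.Random(0)
rate_single = false_positive_rate(20, 1, F_CRIT_1_18, 400, rng)
rate_best_of_50 = false_positive_rate(20, 50, F_CRIT_1_18, 400, rng)
print(rate_single, rate_best_of_50)
```

With no selection the false-positive rate stays near the nominal 5%, whereas picking the best of 50 random variables pushes it close to 1, which is the effect the inflation factor is meant to correct.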
Affiliation(s)
- David W Salt
- Department of Mathematics, Buckingham Building, Lion Terrace, University of Portsmouth, Portsmouth, UK
|
77
|
|
78
|
Gedeck P, Rohde B, Bartels C. QSAR--how good is it in practice? Comparison of descriptor sets on an unbiased cross section of corporate data sets. J Chem Inf Model 2006; 46:1924-36. [PMID: 16995723 DOI: 10.1021/ci050413p] [Citation(s) in RCA: 107] [Impact Index Per Article: 5.9] [Indexed: 11/28/2022]
Abstract
The quality of QSAR (Quantitative Structure-Activity Relationships) predictions depends on a large number of factors including the descriptor set, the statistical method, and the data sets used. Here we study the quality of QSAR predictions mainly as a function of the data set and descriptor type using partial least squares as the statistical modeling method. The study makes use of the fact that we have access to a large number of data sets and to a variety of different QSAR descriptors. The main conclusions are that the quality of the predictions depends both on the data set and the descriptor used. The quality of the predictions correlates positively with the size of the data set and the range of biological activities. There is no clear dependence of the quality of the predictions on the complexity of the data set. All of the descriptors tested produced useful predictions for some of the data sets. None of the descriptors is best for all data sets; it is therefore necessary to test in each individual case, which descriptor produces the best model. In our tests, 2D fragment based descriptors usually performed better than simpler descriptors based on augmented atom types. Possible reasons for these observations are discussed.
Affiliation(s)
- Peter Gedeck
- Novartis Institutes for BioMedical Research, Novartis Horsham Research Centre, Wimblehurst Road, Horsham, West Sussex, RH12 5AB, UK.
|
79
|
Deswal S, Roy N. Quantitative structure activity relationship studies of aryl heterocycle-based thrombin inhibitors. Eur J Med Chem 2006; 41:1339-46. [PMID: 16884829 DOI: 10.1016/j.ejmech.2006.07.001] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Revised: 07/03/2006] [Accepted: 07/03/2006] [Indexed: 10/24/2022]
Abstract
A quantitative structure-activity relationship (QSAR) analysis has been performed on a data set of 42 aryl heterocycle-based thrombin inhibitors. Several types of descriptors, including topological, spatial, thermodynamic, information content and E-state indices, were used to derive a quantitative relationship between the antithrombin activity and structural properties. A genetic algorithm-based genetic function approximation method of variable selection was used to generate the model. The best model was obtained when the number of descriptors in the equation was set to five. A highly statistically significant model was obtained with atom type logP descriptors, logP and Shadow_YZ. The model is not only able to predict the activity of new compounds but also explains the important regions in the molecules in a quantitative manner.
Affiliation(s)
- Sumit Deswal
- Pharmacoinformatics division National Institute of Pharmaceutical Education and Research, Sector 67, Phase X, 160062 SAS Nagar, Punjab, India
|
80
|
Eckert H, Bajorath J. Design and Evaluation of a Novel Class-Directed 2D Fingerprint to Search for Structurally Diverse Active Compounds. J Chem Inf Model 2006; 46:2515-26. [PMID: 17125192 DOI: 10.1021/ci600303b] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
Recent attempts to increase similarity search performance using molecular fingerprints have mostly focused on the evaluation of alternative similarity metrics or scoring schemes, rather than the development of new types of fingerprints. Here, we introduce a novel 2D fingerprint design (property descriptor value range-derived fingerprint or PDR-FP) that involves activity-oriented selection of property descriptors and the transformation of descriptor value ranges into a binary format such that each fingerprint bit position represents a specific value interval. The design is tailored toward multiple-template similarity searching and permits training on specific activity classes. In search calculations on 15 compound classes of increasing structural diversity, the PDR fingerprint performed better than other state-of-the-art 2D fingerprints. Among the structurally diverse classes were six compound sets with peptide character, which represent a notoriously difficult chemotype for 2D similarity searching. In these cases, PDR-FP produced promising results, whereas other fingerprint methods mostly failed. PDR-FP is specifically designed for search calculations on structurally diverse compounds, and these calculations are not influenced by molecular size effects, which represent a general problem for similarity searching using bit string representations.
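The idea that each fingerprint bit position represents a specific descriptor value interval can be sketched as follows. This is only an illustration under stated assumptions: equal-width intervals, the descriptor names, the value ranges and the helper names are all hypothetical, and the activity-oriented descriptor selection used for PDR-FP is not reproduced.

```python
def encode_descriptor(value, low, high, n_bits):
    """Set exactly one bit: the equal-width interval of [low, high]
    that contains the (clipped) descriptor value."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * n_bits)
    index = min(index, n_bits - 1)  # value == high falls in the last bin
    return [1 if i == index else 0 for i in range(n_bits)]


def fingerprint(values, ranges, n_bits=4):
    """Concatenate the interval encodings of all descriptors,
    in a fixed (sorted) descriptor order."""
    bits = []
    for name in sorted(ranges):
        low, high = ranges[name]
        bits.extend(encode_descriptor(values[name], low, high, n_bits))
    return bits


# Hypothetical descriptor ranges and one hypothetical molecule.
ranges = {"logP": (-2.0, 6.0), "mol_weight": (100.0, 500.0)}
fp = fingerprint({"logP": 1.0, "mol_weight": 450.0}, ranges)
print(fp)  # [0, 1, 0, 0, 0, 0, 0, 1]
```

Because every descriptor contributes the same number of set bits regardless of molecular size, an encoding of this kind does not accumulate more bits for larger molecules.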
Affiliation(s)
- Hanna Eckert
- Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
|
81
|
Eckert H, Vogt I, Bajorath J. Mapping Algorithms for Molecular Similarity Analysis and Ligand-Based Virtual Screening: Design of DynaMAD and Comparison with MAD and DMC. J Chem Inf Model 2006; 46:1623-34. [PMID: 16859294 DOI: 10.1021/ci060083o] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
Abstract
Here, we introduce the DynaMAD algorithm that is designed to map database compounds to combinations of activity-class-dependent descriptor value ranges in order to identify novel active molecules. The method combines and extends key features of two previously developed algorithms, MAD and DMC. These methods were first described as compound-mapping algorithms for large-scale virtual screening applications. DynaMAD and DMC operate in chemical spaces of stepwise increasing dimensionality. However, in contrast to DMC, which utilizes binary transformed descriptors, DynaMAD uses unmodified descriptor value distributions. The performance of these mapping methods was compared in detail in virtual screening trials on 24 different compound activity classes against a background of about 2 million database compounds. In these calculations, all three approaches produced results of considerable predictive value, and the enrichment of active molecules in small selection sets consisting of only about 20 or fewer database compounds emerged as a common feature. Furthermore, mapping methods were capable of recognizing remote molecular similarity relationships. Overall, DynaMAD performed better than MAD and DMC, producing average hit and recovery rates of 55% and 33%, respectively, over all 24 classes. Taken together, our findings suggest that dynamic compound mapping to combinations of activity-class-selective descriptor settings has significant potential for molecular similarity analysis and ligand-based virtual screening.
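The hit and recovery rates quoted in this abstract can be computed as in the sketch below. It assumes the definitions commonly used in virtual screening (hit rate: actives found relative to selection set size; recovery rate: actives found relative to all available actives); the compound identifiers are hypothetical.

```python
def hit_and_recovery_rate(selected, actives):
    """Return (hit rate, recovery rate) for a selection set.

    hit rate      = actives in selection / size of selection
    recovery rate = actives in selection / total number of actives
    """
    selected = set(selected)
    actives = set(actives)
    found = len(selected & actives)
    hit_rate = found / len(selected) if selected else 0.0
    recovery_rate = found / len(actives) if actives else 0.0
    return hit_rate, recovery_rate


# Hypothetical selection of five database compounds; four known actives.
hit, recovery = hit_and_recovery_rate(
    ["c1", "c2", "c3", "c4", "c5"],
    {"c2", "c5", "c8", "c9"},
)
print(hit, recovery)  # 0.4 0.5
```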
Affiliation(s)
- Hanna Eckert
- Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
|
82
|
Guo W, Hu X, Chu N, Yin C. Quantitative structure–activity relationship studies on HEPTs by supervised stochastic resonance. Bioorg Med Chem Lett 2006; 16:2855-9. [PMID: 16574414 DOI: 10.1016/j.bmcl.2006.03.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 12/28/2005] [Revised: 02/20/2006] [Accepted: 03/07/2006] [Indexed: 11/26/2022]
Abstract
Quantitative structure-activity relationship (QSAR) studies on HEPTs were performed by using a new approach, supervised stochastic resonance (SSR). Errors in physicochemical properties have great effects on variable selection and the predictive capability of QSAR models, but errors-in-variables have seldom been discussed in QSAR. In this paper, based on the theory of stochastic resonance (SR), SSR is proposed and applied to this problem. In SSR, errors and abundant variables are regarded as noise and the relevant descriptors as signals. In the nonlinear systems involved in SR, the signal and the noise interact harmonically and the signal is consequently enhanced. Therefore, the correlation between the relevant variables and a specified activity of a series of molecules is improved by SSR. It is demonstrated that the QSAR models obtained for HEPT analogues by SSR are comparable to those obtained by published methods in their stability and predictivity. SSR is an efficient and promising approach to QSAR studies.
Affiliation(s)
- Weimin Guo
- School of Environmental Science and Technology, Shanghai Jiao Tong University, Shanghai 200240, PR China.
|
83
|
|
84
|
Eckert H, Bajorath J. Determination and Mapping of Activity-Specific Descriptor Value Ranges for the Identification of Active Compounds. J Med Chem 2006; 49:2284-93. [PMID: 16570925 DOI: 10.1021/jm051110p] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
MAD (Mapping to Activity class-specific Descriptor value ranges) is a novel molecular similarity method that relies on the identification of activity-specific descriptors. Applying a categorical descriptor scoring function, value ranges of molecular descriptors in screening databases are compared with those in classes of active compounds and descriptors displaying significant deviations are selected. In order to identify new actives, database molecules are mapped to class-specific value ranges and ranked using a similarity function. As a mapping algorithm, MAD is distinct from many other molecular similarity and virtual screening methods. In systematic virtual screening trials, for small selection sets of only 30 database compounds, average hit and recovery rates over six activity classes ranged from about 10% to 25% and about 25% to 75%, respectively. Moreover, when mining a database of bioactive molecules many similar compounds were selected (with hit rates between about 15% and 79%). Our findings suggest that it is possible to generate compound class-directed descriptor reference spaces for molecular similarity analysis.
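The mapping step described in this abstract can be sketched in a much simplified form. The sketch assumes plain minimum-maximum ranges taken from the active class and scores a database molecule by the fraction of descriptors that fall inside those ranges; the categorical descriptor scoring function and the similarity function of MAD are not reproduced, and all names and values are hypothetical.

```python
def class_ranges(actives):
    """Per-descriptor (min, max) value range observed in the active class."""
    names = list(actives[0].keys())
    return {
        name: (min(m[name] for m in actives), max(m[name] for m in actives))
        for name in names
    }


def mapping_score(molecule, ranges):
    """Fraction of descriptors whose value lies inside the class range."""
    inside = sum(
        1 for name, (low, high) in ranges.items()
        if low <= molecule[name] <= high
    )
    return inside / len(ranges)


def rank_database(database, ranges):
    """Database molecule names, best-mapping molecules first."""
    return sorted(
        database,
        key=lambda name: mapping_score(database[name], ranges),
        reverse=True,
    )


# Hypothetical active class and screening database.
actives = [{"logP": 2.0, "hbd": 1}, {"logP": 3.0, "hbd": 2}]
database = {
    "m1": {"logP": 2.5, "hbd": 1},
    "m2": {"logP": 5.0, "hbd": 2},
    "m3": {"logP": 0.0, "hbd": 5},
}
ranges = class_ranges(actives)
ranking = rank_database(database, ranges)
print(ranking)  # ['m1', 'm2', 'm3']
```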
Affiliation(s)
- Hanna Eckert
- Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany
|
85
|
Oloff S, Zhang S, Sukumar N, Breneman C, Tropsha A. Chemometric analysis of ligand receptor complementarity: identifying Complementary Ligands Based on Receptor Information (CoLiBRI). J Chem Inf Model 2006; 46:844-51. [PMID: 16563016 PMCID: PMC2755506 DOI: 10.1021/ci050065r] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Indexed: 11/28/2022]
Abstract
We have developed a novel structure-based chemoinformatics approach to search for Complementary Ligands Based on Receptor Information (CoLiBRI). CoLiBRI is based on the representation of both receptor binding sites and their respective ligands in a space of universal chemical descriptors. The binding site atoms involved in the interaction with ligands are identified by means of a computational geometry technique known as Delaunay tessellation, as applied to X-ray characterized ligand-receptor complexes. TAE/RECON multiple chemical descriptors are calculated independently for each ligand as well as for its active site atoms. The representation of both ligands and active sites using chemical descriptors allows the application of well-known chemometric techniques in order to correlate chemical similarities between active sites and their respective ligands. We have established a protocol to map patterns of nearest neighbor active site vectors in a multidimensional TAE/RECON space onto those of their complementary ligands and vice versa. This protocol affords the prediction of a virtual complementary ligand vector in the ligand chemical space from the position of a known active site vector. This prediction is followed by chemical similarity calculations between this virtual ligand vector and those calculated for molecules in a chemical database to identify real compounds most similar to the virtual ligand. Consequently, the knowledge of the receptor active site structure affords straightforward and efficient identification of its complementary ligands in large databases of chemical compounds using rapid chemical similarity searches. Conversely, starting from the ligand chemical structure, one may identify possible complementary receptor cavities as well. We have applied the CoLiBRI approach to a data set of 800 X-ray characterized ligand-receptor complexes in the PDBbind database. Using a k nearest neighbor (kNN) pattern recognition approach and variable selection, we have shown that knowledge of the active site structure affords identification of its complementary ligand among the top 1% of a large chemical database in over 90% of all test active sites when a binding site of the same protein family was present in the training set. In the case where test receptors are highly dissimilar and not present among the receptor families in the training set, the prediction accuracy is decreased; however, CoLiBRI was still able to quickly eliminate 75% of the chemical database as improbable ligands. CoLiBRI affords rapid prefiltering of a large chemical database to eliminate compounds that have little chance of binding to a receptor active site.
Affiliation(s)
- Scott Oloff
- Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, USA
|
86
|
Mwense M, Wang XZ, Buontempo FV, Horan N, Young A, Osborn D. QSAR approach for mixture toxicity prediction using independent latent descriptors and fuzzy membership functions. SAR QSAR Environ Res 2006; 17:53-73. [PMID: 16513552 DOI: 10.1080/10659360600562202] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 05/06/2023]
Abstract
The principle of using a single model to predict the toxicity of mixtures of chemicals based on the characterisation of the degrees of similarity and dissimilarity of the constituent chemicals using descriptors has been demonstrated in a previous work. The current study introduces a feature extraction technique, independent component analysis, to the method to remove the correlations and dependencies between descriptors and reduce the dimension prior to similarity and dissimilarity calculations. In addition, a goal attainment multi-objective optimisation technique is used for the determination of the fuzzy membership function parameters. For three mixtures, which include a new mixture and two previously studied mixtures that all inhibit reproduction (via different mechanisms of action) in the green freshwater alga Scenedesmus vacuolatus, the approach showed better or equivalent prediction performance compared with either the concentration addition or the independent action model. Unlike QSARs for pure compounds, which require large collections of data, the new approach for mixtures only requires one mixture at a particular composition to determine the necessary fuzzy membership function parameter values. These values can then be used to predict the toxicity of the mixture at any other composition. This could potentially lead to a reduction in the frequency of bioassay tests. The use of the fuzzy membership functions and parameter values obtained for one mixture to predict the toxicity of a completely different mixture is also tested, and the approach is found to give prediction results with good accuracy in this case as well.
Affiliation(s)
- M Mwense
- School of Process, Environmental and Materials Engineering, Institute of Particle Science and Engineering
|
87
|
Willighagen EL, Denissen HMGW, Wehrens R, Buydens LMC. On the Use of 1H and 13C 1D NMR Spectra as QSPR Descriptors. J Chem Inf Model 2006; 46:487-94. [PMID: 16562976 DOI: 10.1021/ci050282s] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Recently, 1D NMR and IR spectra have been proposed as descriptors containing 3D information and, as such, have been said to be suitable for making QSAR and QSPR models where 3D molecular geometries matter, for example, in binding affinities. This paper presents a study on the predictive power of 1D NMR spectra-based QSPR models using simulated proton and carbon 1D NMR spectra. It shows that the spectra-based models are outperformed by models based on theoretical molecular descriptors and that spectra-based models are not easy to interpret. We therefore conclude that the use of such NMR spectra offers no added value.
Affiliation(s)
- E L Willighagen
- Institute for Molecules and Materials, Radboud University Nijmegen, Toernooiveld 1, NL-6525 ED Nijmegen, The Netherlands
|
88
|
Refsgaard HHF, Jensen BF, Christensen IT, Hagen N, Brockhoff PB. In silico prediction of cytochrome P450 inhibitors. Drug Dev Res 2006. [DOI: 10.1002/ddr.20108] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023]
|
89
|
Helma C, Cramer T, Kramer S, De Raedt L. Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. J Chem Inf Comput Sci 2005; 44:1402-11. [PMID: 15272848 DOI: 10.1021/ci034254q] [Citation(s) in RCA: 144] [Impact Index Per Article: 7.6] [Indexed: 11/28/2022]
Abstract
This paper explores the utility of data mining and machine learning algorithms for the induction of mutagenicity structure-activity relationships (SARs) from noncongeneric data sets. We compare (i) a newly developed algorithm (MOLFEA) for the generation of descriptors (molecular fragments) for noncongeneric compounds with traditional SAR approaches (molecular properties) and (ii) different machine learning algorithms for the induction of SARs from these descriptors. In addition, we investigate the optimal parameter settings for these programs and give an exemplary interpretation of the derived models. The predictive accuracies of models using MOLFEA-derived descriptors are approximately 10-15 percentage points higher than those using molecular properties alone. Using both types of descriptors together does not improve the derived models. Of the applied machine learning techniques, the rule learner PART and support vector machines gave the best results, although the differences between the learning algorithms are only marginal. We were able to achieve predictive accuracies of up to 78% for 10-fold cross-validation. The resulting models are relatively easy to interpret and usable for predictive as well as explanatory purposes.
Affiliation(s)
- Christoph Helma
- Institute for Computer Science, Machine Learning Lab, University Freiburg, Georges Köhler Allee 79, D-79110 Freiburg/Br., Germany.
|
90
|
Mwense M, Wang XZ, Buontempo FV, Horan N, Young A, Osborn D. Prediction of noninteractive mixture toxicity of organic compounds based on a fuzzy set method. J Chem Inf Comput Sci 2005; 44:1763-73. [PMID: 15446835 DOI: 10.1021/ci0499368] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Abstract
Current methods for the prediction of mixture toxicity have been shown to be valid for mixtures that conform to some assumptions that were ideally formulated for mixtures comprising constituents exhibiting either completely similar or completely dissimilar mechanisms of action. Approaches are needed that predict the toxicity of mixtures representative of real environmental occurrences, i.e., those comprising constituents of mixed similar and dissimilar compounds, which are therefore more complex. In this paper such a methodology is proposed, which uses molecular descriptors and fuzzy set theory to characterize the degree of similarity and dissimilarity of mixture constituents, integrates the concentration addition and independent action models, and therefore is called INFCIM (INtegrated Fuzzy Concentration addition--Independent action Model). INFCIM is tested in two case studies using toxicity data of four mixtures, and its performance is compared against those of both the concentration addition and independent action models. Mixture 1 consists of 18 s-triazines acting on the green freshwater alga Scenedesmus vacuolatus. Mixture 2 comprises 16 acting constituents tested on Scenedesmus vacuolatus. Both mixtures inhibit reproduction in the biological assays. There are 10 quinolone compounds in mixture 3 and 16 phenol derivative compounds in mixture 4, all causing long-term inhibition of bioluminescence in the marine bacterium Vibrio fischeri. It was shown that INFCIM performed comparably to or better than the best performing existing model in the original studies for all the mixtures tested.
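The two reference models that INFCIM integrates can be written down compactly in their standard textbook form. The sketch below covers only those two baselines (the fuzzy integration itself is not reproduced), and the function names and example numbers are assumptions made for illustration. Concentration addition predicts the mixture effect concentration as 1 / sum(p_i / EC_i), where p_i is the fraction of constituent i in the mixture; independent action predicts the mixture effect as 1 - prod(1 - E_i), where E_i is the fractional effect of constituent i alone.

```python
def concentration_addition_ec(fractions, effect_concentrations):
    """Mixture ECx under concentration addition (similar action).

    fractions: relative proportions p_i of the constituents (sum to 1)
    effect_concentrations: ECx_i of each constituent tested alone
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("mixture fractions must sum to 1")
    return 1.0 / sum(p / ec for p, ec in zip(fractions, effect_concentrations))


def independent_action_effect(effects):
    """Mixture effect under independent action (dissimilar action).

    effects: fractional effects E_i (0..1) of each constituent at the
    concentration at which it is present in the mixture
    """
    unaffected = 1.0
    for effect in effects:
        unaffected *= 1.0 - effect
    return 1.0 - unaffected


# Hypothetical two-component mixture.
ec50_mix = concentration_addition_ec([0.5, 0.5], [1.0, 4.0])
effect_mix = independent_action_effect([0.2, 0.5])
print(ec50_mix, effect_mix)
```

For the hypothetical inputs above, concentration addition gives a mixture EC50 of 1.6 (same units as the inputs) and independent action gives a combined effect of 0.6.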
Affiliation(s)
- Mulaisho Mwense
- Department of Chemical Engineering and School of Civil Engineering, The University of Leeds, Leeds LS2 9JT, U.K
|
91
|
Xue Y, Li ZR, Yap CW, Sun LZ, Chen X, Chen YZ. Effect of molecular descriptor feature selection in support vector machine classification of pharmacokinetic and toxicological properties of chemical agents. J Chem Inf Comput Sci 2005; 44:1630-8. [PMID: 15446820 DOI: 10.1021/ci049869h] [Citation(s) in RCA: 116] [Impact Index Per Article: 6.1] [Indexed: 02/07/2023]
Abstract
Statistical-learning methods have been developed for facilitating the prediction of pharmacokinetic and toxicological properties of chemical agents. These methods employ a variety of molecular descriptors to characterize structural and physicochemical properties of molecules. Some of these descriptors are specifically designed for the study of a particular type of properties or agents, and their use for other properties or agents might generate noise and affect the prediction accuracy of a statistical learning system. This work examines to what extent the reduction of this noise can improve the prediction accuracy of a statistical learning system. A feature selection method, recursive feature elimination (RFE), is used to automatically select molecular descriptors for support vector machines (SVM) prediction of P-glycoprotein substrates (P-gp), human intestinal absorption of molecules (HIA), and agents that cause torsades de pointes (TdP), a rare but serious side effect. RFE significantly reduces the number of descriptors for each of these properties thereby increasing the computational speed for their classification. The SVM prediction accuracies of P-gp and HIA are substantially increased and that of TdP remains unchanged by RFE. These prediction accuracies are comparable to those of earlier studies derived from a selective set of descriptors. Our study suggests that molecular feature selection is useful for improving the speed and, in some cases, the accuracy of statistical learning methods for the prediction of pharmacokinetic and toxicological properties of chemical agents.
Affiliation(s)
- Y Xue
- Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
|
92
|
Fechner U, Paetz J, Schneider G. Comparison of Three Holographic Fingerprint Descriptors and their Binary Counterparts. QSAR Comb Sci 2005. [DOI: 10.1002/qsar.200530118] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
|
93
|
Bender A, Glen RC. A Discussion of Measures of Enrichment in Virtual Screening: Comparing the Information Content of Descriptors with Increasing Levels of Sophistication. J Chem Inf Model 2005; 45:1369-75. [PMID: 16180913 DOI: 10.1021/ci0500177] [Citation(s) in RCA: 125] [Impact Index Per Article: 6.6] [Indexed: 11/29/2022]
Abstract
We have performed virtual screening using some very simple features, by employing the number of atoms per element as molecular descriptors but without regard to any structural information whatsoever. Surprisingly, these atom counts are able to outperform virtual-affinity-based fingerprints and Unity fingerprints in some activity classes. Although molecular weight and other biases were known in target-based virtual screening settings (docking), we report the effect of using very simple descriptors for ligand-based virtual screening, by using clearly defined biological targets and employing a large data set (>100,000 compounds) containing multiple (11) activity classes. Structure-unaware atom count vectors as descriptors in combination with the Euclidean distance measure are able to achieve "enrichment factors" over random selection of around 4 (depending on the particular class of active compounds), putting the enrichment factors reported for more sophisticated virtual screening methods in a different light. They are also able to retrieve active compounds with novel scaffolds instead of merely the expected structural analogues. The added value of many currently used virtual screening methods (calculated as enrichment factors) drops down to a factor of between 1 and 2, instead of often reported double-digit figures. The observed effect is much less profound for simple descriptors such as molecular weight and is only present in cases of atypical (larger) ligands. The current state of virtual screening is not as sophisticated as might be expected, which is due to descriptors still not being able to capture structural properties relevant to binding. This fact can partly be explained by highly nonlinear structure-activity relationships, which represent a severe limitation of the "similar property principle" in the context of bioactivity.
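A structure-unaware atom count descriptor, the Euclidean distance between two such vectors, and an enrichment factor over random selection can be sketched as follows. The formula parser handles only simple formulas without parentheses, the tiny "database" and its activity labels are purely hypothetical, and the enrichment factor is taken in its usual sense (hit rate in the selection divided by the fraction of actives in the whole database).

```python
import math
import re


def atom_counts(formula):
    """Count atoms per element in a simple formula such as 'C9H8O4'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def euclidean(a, b):
    """Euclidean distance between two atom count vectors."""
    elements = set(a) | set(b)
    return math.sqrt(sum((a.get(e, 0) - b.get(e, 0)) ** 2 for e in elements))


def enrichment_factor(ranked, actives, n_selected):
    """Hit rate among the top n_selected divided by the base rate."""
    selected = ranked[:n_selected]
    hit_rate = sum(1 for name in selected if name in actives) / n_selected
    base_rate = len(actives) / len(ranked)
    return hit_rate / base_rate


# Hypothetical query, database and activity labels (illustration only).
query = atom_counts("C9H8O4")
database = {
    "salicylic_acid": "C7H6O3",
    "benzoic_acid": "C7H6O2",
    "hexane": "C6H14",
    "ethanol": "C2H6O",
}
ranked = sorted(database, key=lambda n: euclidean(query, atom_counts(database[n])))
ef = enrichment_factor(ranked, {"salicylic_acid", "benzoic_acid"}, 2)
print(ranked, ef)
```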
Affiliation(s)
- Andreas Bender
- Unilever Centre for Molecular Science Informatics, Chemistry Department, University of Cambridge, Cambridge CB2 1EW, United Kingdom
|
94
|
Olah M, Bologa C, Oprea TI. An automated PLS search for biologically relevant QSAR descriptors. J Comput Aided Mol Des 2005; 18:437-49. [PMID: 15729845 DOI: 10.1007/s10822-004-4060-8] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.0] [Indexed: 11/24/2022]
Abstract
An automated PLS engine, WB-PLS, was applied to 1632 QSAR series with at least 25 compounds per series extracted from WOMBAT (WOrld of Molecular BioAcTivity). WB-PLS extracts a single Y variable per series, as well as pre-computed X variables from a table. The table contained 2D descriptors, the drug-like MDL 320 keys as implemented in the Mesa A&C Fingerprint module, and in-house generated topological-pharmacophore SMARTS counts and fingerprints. Each descriptor type was treated as a block, with or without scaling. Cross-validation, variable importance on projections (VIP) above 0.8 and q2 ≥ 0.3 were applied for model significance. Among cross-validation methods, leave-one-in-seven-out (CV7) is a better measure of model significance than leave-one-out (which measures redundancy) and leave-half-out (which is too restrictive). SMARTS counts overlap with 2D descriptors (having a more quantitative nature), whereas MDL keys overlap with in-house fingerprints (both are more qualitative). SMARTS counts are the most effective descriptor system when compared to the other three. At the individual level, size-related descriptors and topological indices (in the 2D property space), and branched SMARTS, aromatic and ring atom types and halogens are found to be most relevant according to the VIP criterion.
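The cross-validated q2 criterion with seven left-out groups can be sketched as below. Two substitutions are made and should be read as assumptions: an ordinary least-squares line on a single descriptor stands in for PLS, and compounds are assigned to the seven groups by index modulo 7. The data are synthetic. q2 is computed as 1 - PRESS / SS, where PRESS is the sum of squared prediction errors over the left-out compounds and SS is the sum of squared deviations of y from its mean.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b * x for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def q_squared_cv(x, y, n_groups=7):
    """Cross-validated q2 with every n_groups-th compound left out in turn."""
    press = 0.0
    for group in range(n_groups):
        train = [i for i in range(len(x)) if i % n_groups != group]
        test = [i for i in range(len(x)) if i % n_groups == group]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (a + b * x[i])) ** 2 for i in test)
    mean_y = sum(y) / len(y)
    total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / total


# Synthetic series: a linear trend with small alternating noise.
x = [float(i) for i in range(14)]
y = [2.0 * xi + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
q2 = q_squared_cv(x, y)
print(q2, q2 >= 0.3)
```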
Affiliation(s)
- Marius Olah
- Division of Biocomputing, University of New Mexico School of Medicine, 1 University of New Mexico, MSC08 4560, Albuquerque, NM 87131, USA
|
95
|
Farkas O, Héberger K. Comparison of Ridge Regression, Partial Least-Squares, Pairwise Correlation, Forward- and Best Subset Selection Methods for Prediction of Retention Indices for Aliphatic Alcohols. J Chem Inf Model 2005; 45:339-46. [PMID: 15807497 DOI: 10.1021/ci049827t] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Abstract
A quantitative structure-retention relationship (QSRR) study based on multiple linear regression (MLR) was performed for the description and prediction of Kováts retention indices (RI) of alcohols. The alcohols were saturated, linear or branched, and carried the hydroxyl group on a primary, secondary or tertiary carbon atom. Constitutive and weighted holistic invariant molecular (WHIM) descriptors were used to represent the structure of the alcohols in the MLR models. Before model building, five variable selection methods were each applied to select the most relevant variables from a large set of descriptors, and the selected molecular properties were included in the MLR models. The efficiency of the variable selection methods was also compared. The selection methods were ridge regression (RR), the partial least-squares method (PLS), the pair-correlation method (PCM), forward selection (FS) and best subset selection (BSS). The stability and validity of the MLR models were tested by leave-n-out cross-validation. Variables selected by RR or PLS were unable to describe the Kováts retention index properly, and PCM gave reliable results for description but not for prediction. Models with good predictive ability were built using FS and BSS as selection methods. The most relevant variables in the description and prediction of RIs were the mean electrotopological state index, the molecular mass, and WHIM indices characterizing size and shape.
Affiliation(s)
- Orsolya Farkas
- Institute of Chemistry, Chemical Research Center, Hungarian Academy of Sciences, H-1525 Budapest, P.O. Box 17, Hungary.
|
96
|
Sheridan RP, Feuston BP, Maiorov VN, Kearsley SK. Similarity to Molecules in the Training Set Is a Good Discriminator for Prediction Accuracy in QSAR. J Chem Inf Comput Sci 2004; 44:1912-28. [PMID: 15554660 DOI: 10.1021/ci049782w] [Citation(s) in RCA: 179] [Impact Index Per Article: 9.0] [Indexed: 01/13/2023]
Abstract
How well can a QSAR model predict the activity of a molecule not in the training set used to create the model? A set of retrospective cross-validation experiments using 20 diverse in-house activity sets were done to find a good discriminator of prediction accuracy as measured by root-mean-square difference between observed and predicted activity. Among the measures we tested, two seem useful: the similarity of the molecule to be predicted to the nearest molecule in the training set and/or the number of neighbors in the training set, where neighbors are those more similar than a user-chosen cutoff. The molecules with the highest similarity and/or the most neighbors are the best-predicted. This trend holds true for narrow training sets and, to a lesser degree, for many diverse training sets and does not depend on which QSAR method or descriptor is used. One may define the similarity using a different descriptor than that used for the QSAR model. The similarity dependence for diverse training sets is somewhat unexpected. It appears to be greater for those data sets where the association of similar activities vs similar structures (as encoded in the Patterson plot) is stronger. We propose a way to estimate the reliability of the prediction of an arbitrary chemical structure on a given QSAR model, given the training set from which the model was derived.
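The two discriminators named in this abstract, similarity to the nearest training-set molecule and the number of training-set neighbors above a user-chosen cutoff, can be sketched as below. The abstract does not fix a similarity measure, so the Tanimoto coefficient on sets of fingerprint bits is an assumption here, as are the function names, the default cutoff of 0.7, and the toy fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as collections of set bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def reliability_features(query_bits, training_fingerprints, cutoff=0.7):
    """Return (similarity to nearest training molecule, number of neighbors).

    A neighbor is a training molecule more similar to the query than `cutoff`.
    """
    sims = [tanimoto(query_bits, fp) for fp in training_fingerprints]
    nearest = max(sims, default=0.0)
    n_neighbors = sum(1 for s in sims if s > cutoff)
    return nearest, n_neighbors

# Hypothetical fingerprints: each set lists the indices of bits that are on.
training = [{1, 2, 3, 4}, {1, 2, 3}, {7, 8}]
nearest_sim, neighbor_count = reliability_features({1, 2, 3, 4}, training)
```

Under the trend reported in the abstract, a query with high `nearest_sim` and a large `neighbor_count` would be expected to have a smaller prediction error than one with no close training-set analogues. Because the similarity may be defined with a different descriptor than the QSAR model uses, this check can be computed independently of the model.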
Affiliation(s)
- Robert P Sheridan
- Molecular Systems Department, RY50S-100 Merck Research Laboratories, Rahway, New Jersey 07065, USA.
|
97
|
Lewis DFV. Quantitative structure-activity relationships (QSARs) within the cytochrome P450 system: QSARs describing substrate binding, inhibition and induction of P450s. Inflammopharmacology 2004; 11:43-73. [PMID: 15035734 DOI: 10.1163/156856003321547112] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Indexed: 11/19/2022]
Abstract
Quantitative structure-activity relationships (QSARs) within substrates, inducers and inhibitors of cytochromes P450 involved in xenobiotic metabolism are reported, together with QSARs associated with induction, inhibition and metabolic rate. The importance of frontier orbitals and shape descriptors, such as planarity (estimated by the area/depth² parameter) and rectangularity (estimated by the length/width parameter), is discussed, particularly in the context of the COMPACT system, which discriminates between several P450 families associated with the activation and detoxication of xenobiotics. The use of parameters, particularly those derived from homology modelling of mammalian (especially human) P450s that are involved in exogenous metabolism, in generating QSARs for P450 substrates is discussed in the context of explaining differences in the binding affinities of human P450 substrates which are pharmacologically active.
Affiliation(s)
- David F V Lewis
- School of Biomedical and Life Sciences, University of Surrey, Guildford, Surrey, GU2 7XH, UK.
|
98
|
Mälkiä A, Murtomäki L, Urtti A, Kontturi K. Drug permeation in biomembranes. Eur J Pharm Sci 2004; 23:13-47. [PMID: 15324921 DOI: 10.1016/j.ejps.2004.05.009] [Citation(s) in RCA: 134] [Impact Index Per Article: 6.7] [Received: 11/25/2003] [Revised: 05/13/2004] [Accepted: 05/24/2004] [Indexed: 11/21/2022]
Abstract
In the past decades, it has become increasingly apparent that, in addition to a therapeutic effect, drugs need to exhibit favourable absorption, distribution, metabolism and excretion (ADME) characteristics to produce a desirable response in vivo. As recent progress in drug discovery technology enables rapid synthesis of vast numbers of potential drug candidates, robust methods are required for the effective screening of compounds synthesized within such programs, so that compounds with poor pharmacokinetic properties can be rejected at an early stage of drug development. Furthermore, a viable in silico method would save resources by enabling virtual screening of drug candidates prior to synthesis. This review gives a general overview of the approaches aimed at predicting biological permeation, one of the cornerstones of the ADME behaviour of drugs. The most important experimental and computational models are reviewed. Physicochemical factors underlying the permeation process are discussed.
Affiliation(s)
- Annika Mälkiä
- Laboratory of Physical Chemistry and Electrochemistry, Helsinki University of Technology, P.O. Box 6100, FIN-02015 HUT, Finland
|
99
|
Lewis DFV, Jacobs MN, Dickins M. Compound lipophilicity for substrate binding to human P450s in drug metabolism. Drug Discov Today 2004; 9:530-7. [PMID: 15183161 DOI: 10.1016/s1359-6446(04)03115-0] [Citation(s) in RCA: 116] [Impact Index Per Article: 5.8] [Indexed: 11/24/2022]
Abstract
Compound lipophilicity is of key importance to P450 binding affinity and enzyme selectivity. Here, lipophilicity is discussed with reference to the human drug-metabolizing P450 enzymes of families CYP1, CYP2 and CYP3. From an extensive compilation of log P values for P450 substrates, and by analysis of relationships between partitioning energy and substrate-binding free energy, the relevance of lipophilicity and other factors pertaining to P450 binding affinity is explained, leading to the formulation of lipophilicity relationships within substrates of each human P450 enzyme involved in drug metabolism. Furthermore, log P values for P450 substrates appear to represent markers for enzyme selectivity. Together with the important roles of hydrogen bonding and pi-pi stacking interaction energies, the desolvation of the P450 active site makes a major contribution to the overall substrate-binding energy and, consequently, a good agreement with experimental information is reported based on this analysis.
Affiliation(s)
- David F V Lewis
- School of Biomedical and Molecular Sciences, University of Surrey, Guildford, Surrey, UK.
|
100
|
Oberg T. Boiling Points of Halogenated Aliphatic Compounds: A Quantitative Structure−Property Relationship for Prediction and Validation. J Chem Inf Comput Sci 2004; 44:187-92. [PMID: 14741027 DOI: 10.1021/ci034183v] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
Halogenated aliphatic compounds have many technical uses, but substances within this group are also ubiquitous environmental pollutants that can affect the ozone layer and contribute to global warming. The establishment of quantitative structure-property relationships is of interest not only to fill in gaps in the available database but also to validate experimental data already acquired. The three-dimensional structures of 240 compounds were modeled with molecular mechanics prior to the generation of empirical descriptors. Two bilinear projection methods, principal component analysis (PCA) and partial-least-squares regression (PLSR), were used to identify outliers. PLSR was subsequently used to build a multivariate calibration model by extracting the latent variables that describe most of the covariation between the molecular structure and the boiling point. Boiling points were also estimated with an extension of the group contribution method of Stein and Brown.
Affiliation(s)
- Tomas Oberg
- Department of Biology and Environmental Science, University of Kalmar, SE-391 82 Kalmar, Sweden.
|