51
Yamashita H, Higuchi T, Yoshida R. Atom environment kernels on molecules. J Chem Inf Model 2014; 54:1289-300. [PMID: 24802375 DOI: 10.1021/ci400403w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Abstract
The measurement of molecular similarity is an essential part of various machine learning tasks in chemical informatics. Graph kernels provide good similarity measures between molecules. Conventional graph kernels are based on counting common subgraphs of specific types in the molecular graphs. This approach has two primary limitations: (i) only exact subgraph matching is considered in the counting operation, and (ii) most of the subgraphs will be less relevant to a given task. In order to address the above-mentioned limitations, we propose a new graph kernel as an extension of the subtree kernel initially proposed by Ramon and Gärtner (2003). The proposed kernel tolerates an inexact match between subgraphs by allowing matching between atoms with similar local environments. In addition, the proposed kernel provides a method to assign an importance weight to each subgraph according to the relevance to the task, which is predetermined by a statistical test. These extensions are evaluated for classification and regression tasks of predicting a wide range of pharmaceutical properties from molecular structures, with promising results.
Affiliation(s)
- Hiroshi Yamashita
- The Graduate University for Advanced Studies, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
52
Raevsky O, Solodova S, Lagunin A, Poroikov V. Computer modeling of blood brain barrier permeability of physiologically active compounds. ACTA ACUST UNITED AC 2014; 60:161-81. [DOI: 10.18097/pbmc20146002161] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
Abstract
The present work discusses the current state of computer modeling of the relationship between the structure of organic compounds and drugs and their ability to penetrate the blood-brain barrier (BBB). All descriptors that influence this permeability in classification and regression QSAR models are summarized and analyzed. Hydrogen bonding is observed to play a crucial role in both passive and active transport across the BBB. It is concluded that further research should focus on interpreting the spatial structure of the full-size P-glycoprotein molecule at high resolution and on creating QSAR models that describe the quantitative relationship between structure and active transport of substances across the BBB.
Affiliation(s)
- O.A. Raevsky
- Institute of Physiologically Active Compounds, Russian Academy of Science
- S.L. Solodova
- Institute of Physiologically Active Compounds, Russian Academy of Science
- A.A. Lagunin
- Orekhovich Institute of Biomedical Chemistry of Russian Academy of Medical Sciences
- V.V. Poroikov
- Orekhovich Institute of Biomedical Chemistry of Russian Academy of Medical Sciences
53
Raevsky OA, Solodova SL, Lagunin AA, Poroikov VV. Computer modeling of blood brain barrier permeability for physiologically active compounds. Biochemistry (Moscow) Supplement Series B: Biomedical Chemistry 2013. [DOI: 10.1134/s199075081302008x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
54
Li BK, Cong Y, Yang XG, Xue Y, Chen YZ. In silico prediction of spleen tyrosine kinase inhibitors using machine learning approaches and an optimized molecular descriptor subset generated by recursive feature elimination method. Comput Biol Med 2013; 43:395-404. [DOI: 10.1016/j.compbiomed.2013.01.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 06/24/2012] [Revised: 12/31/2012] [Accepted: 01/21/2013] [Indexed: 11/16/2022]
55
56
Shahid M, Shahzad Cheema M, Klenner A, Younesi E, Hofmann-Apitius M. SVM Based Descriptor Selection and Classification of Neurodegenerative Disease Drugs for Pharmacological Modeling. Mol Inform 2013; 32:241-9. [PMID: 27481519 DOI: 10.1002/minf.201200116] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/08/2012] [Accepted: 01/07/2013] [Indexed: 11/10/2022]
Abstract
Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context, among which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied a variety of such SVM-based approaches, namely SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in silico computational model for the binary classification of NDD drugs versus other drugs. Application of an SVM-RFE model to a set of drugs successfully separated NDD drugs from non-NDD drugs and resulted in an overall accuracy of ~80% with 10-fold cross-validation, using the 40 top-ranked molecular descriptors selected out of a total of 314 descriptors. Moreover, the SVM-RFE method outperformed linear discriminant analysis (LDA)-based feature selection and classification. The model reduced the multidimensional descriptor space of drugs dramatically and predicted NDD drugs with high accuracy while avoiding overfitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed, and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors.
Affiliation(s)
- Mohammad Shahid
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, Dahlmannstr. 2, 53113 Bonn, Germany
- Muhammad Shahzad Cheema
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, Dahlmannstr. 2, 53113 Bonn, Germany
- Alexander Klenner
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53754 Sankt Augustin, Germany
- Erfan Younesi
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53754 Sankt Augustin, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53754 Sankt Augustin, Germany.
57
Wildenhain J, Fitzgerald N, Tyers M. MolClass: a web portal to interrogate diverse small molecule screen datasets with different computational models. Bioinformatics 2012; 28:2200-1. [PMID: 22711790 PMCID: PMC3413392 DOI: 10.1093/bioinformatics/bts349] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/02/2023]
Abstract
The MolClass toolkit and data portal generate computational models from user-defined small molecule datasets based on structural features identified in hit and non-hit molecules in different screens. Each new model is applied to all datasets in the database to classify compound specificity. MolClass thus defines a likelihood value for each compound entry and creates an activity fingerprint across diverse sets of screens. MolClass uses a variety of machine-learning methods to find molecular patterns and can therefore also assign a priori predictions of bioactivities for previously untested molecules. The power of the MolClass resource will grow as a function of the number of screens deposited in the database.
Availability and implementation: The MolClass webportal, software package and source code are freely available for non-commercial use at http://tyerslab.bio.ed.ac.uk/molclass. A MolClass tutorial and a guide on how to build models from datasets can also be found on the web site. MolClass uses the Chemistry Development Kit (CDK), WEKA and MySQL for its core functionality. A REST service is available at http://tyerslab.bio.ed.ac.uk/molclass/api based on the OpenTox API 1.2.
Affiliation(s)
- Jan Wildenhain
- Wellcome Trust Centre for Cell Biology and School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3JR, UK.
58
Martins IF, Teixeira AL, Pinheiro L, Falcao AO. A Bayesian Approach to in Silico Blood-Brain Barrier Penetration Modeling. J Chem Inf Model 2012; 52:1686-97. [DOI: 10.1021/ci300124c] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.7] [Indexed: 01/14/2023]
Affiliation(s)
- Ana L. Teixeira
- CQB - Centro de Quimica e Bioquimica, Faculty of Sciences, University of Lisbon, Lisbon, Portugal
59
Abstract
In silico tools specifically developed for the prediction of pharmacokinetic parameters are of particular interest to the pharmaceutical industry because of their high potential for discarding inappropriate molecules at an early stage of drug development, with consequent savings of vital resources and valuable time. The ultimate goal of in silico models of absorption, distribution, metabolism, and excretion (ADME) properties is the accurate prediction of the in vivo pharmacokinetics of a potential drug molecule in man, whilst it exists only as a virtual structure. Various types of in silico models developed for successful prediction of ADME parameters such as oral absorption, bioavailability, plasma protein binding, tissue distribution, clearance, and half-life are briefly described in this chapter.
Affiliation(s)
- A K Madan
- Pt. BD Sharma University of Health Sciences, Rohtak, India.
60
Zhang S. Application of Machine Learning in Drug Discovery and Development. Mach Learn 2012. [DOI: 10.4018/978-1-60960-818-7.ch517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Machine learning techniques have been widely used in drug discovery and development, particularly in the areas of cheminformatics, bioinformatics and other types of pharmaceutical research. It has been demonstrated that they are suitable for large, high-dimensional data, and that the models built with these methods can be used for robust external predictions. However, various problems and challenges still exist, and new approaches are greatly needed. In this chapter, the authors review the current development of machine learning techniques, focusing especially on several techniques they developed and on their application to model building, lead discovery via virtual screening, integration with molecular docking, and prediction of off-target properties. The authors also suggest some potential avenues for unifying different disciplines, such as cheminformatics, bioinformatics and systems biology, for the purpose of developing integrated in silico drug discovery and development approaches.
Affiliation(s)
- Shuxing Zhang
- The University of Texas at M.D. Anderson Cancer Center, USA
61
Muehlbacher M, Spitzer GM, Liedl KR, Kornhuber J. Qualitative prediction of blood-brain barrier permeability on a large and refined dataset. J Comput Aided Mol Des 2011; 25:1095-106. [PMID: 22109848 PMCID: PMC3241963 DOI: 10.1007/s10822-011-9478-1] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Received: 06/19/2011] [Accepted: 10/10/2011] [Indexed: 12/14/2022]
Abstract
The prediction of blood-brain barrier permeation is vitally important for the optimization of drugs targeting the central nervous system as well as for avoiding side effects of peripheral drugs. Following a previously proposed model on blood-brain barrier penetration, we calculated the cross-sectional area perpendicular to the amphiphilic axis. We obtained a high correlation between calculated and experimental cross-sectional area (r = 0.898, n = 32). Based on these results, we examined a correlation of the calculated cross-sectional area with blood-brain barrier penetration given by logBB values. We combined various literature data sets to form a large-scale logBB dataset with 362 experimental logBB values. Quantitative models were calculated using bootstrap validated multiple linear regression. Qualitative models were built by a bootstrapped random forest algorithm. Both methods found similar descriptors such as polar surface area, pKa, logP, charges and number of positive ionisable groups to be predictive for logBB. In contrast to our initial assumption, we were not able to obtain models with the cross-sectional area chosen as relevant parameter for both approaches. Comparing those two different techniques, qualitative random forest models are better suited for blood-brain barrier permeability prediction, especially when reducing the number of descriptors and using a large dataset. A random forest prediction system (n(trees) = 5) based on only four descriptors yields a validated accuracy of 88%.
Affiliation(s)
- Markus Muehlbacher
- Department of Psychiatry and Psychotherapy, Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen-Nuremberg, Germany
- Gudrun M. Spitzer
- Theoretical Chemistry, Center for Molecular Biosciences, University of Innsbruck, Innsbruck, Austria
- Klaus R. Liedl
- Theoretical Chemistry, Center for Molecular Biosciences, University of Innsbruck, Innsbruck, Austria
- Johannes Kornhuber
- Department of Psychiatry and Psychotherapy, Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen-Nuremberg, Germany
62
Demir-Kavuk O, Bentzien J, Muegge I, Knapp EW. DemQSAR: predicting human volume of distribution and clearance of drugs. J Comput Aided Mol Des 2011; 25:1121-33. [DOI: 10.1007/s10822-011-9496-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 07/07/2011] [Accepted: 11/13/2011] [Indexed: 12/11/2022]
63
Quantifying structure and performance diversity for sets of small molecules comprising small-molecule screening collections. Proc Natl Acad Sci U S A 2011; 108:6817-22. [PMID: 21482810 DOI: 10.1073/pnas.1015024108] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Indexed: 02/06/2023]
Abstract
Using a diverse collection of small molecules we recently found that compound sets from different sources (commercial; academic; natural) have different protein-binding behaviors, and these behaviors correlate with trends in stereochemical complexity for these compound sets. These results lend insight into structural features that synthetic chemists might target when synthesizing screening collections for biological discovery. We report extensive characterization of structural properties and diversity of biological performance for these compounds and expand comparative analyses to include physicochemical properties and three-dimensional shapes of predicted conformers. The results highlight additional similarities and differences between the sets, but also the dependence of such comparisons on the choice of molecular descriptors. Using a protein-binding dataset, we introduce an information-theoretic measure to assess diversity of performance with a constraint on specificity. Rather than relying on finding individual active compounds, this measure allows rational judgment of compound subsets as groups. We also apply this measure to publicly available data from ChemBank for the same compound sets across a diverse group of functional assays. We find that performance diversity of compound sets is relatively stable across a range of property values as judged by this measure, both in protein-binding studies and functional assays. Because building screening collections with improved performance depends on efficient use of synthetic organic chemistry resources, these studies illustrate an important quantitative framework to help prioritize choices made in building such collections.
64
Hecht D. Applications of machine learning and computational intelligence to drug discovery and development. Drug Dev Res 2010. [DOI: 10.1002/ddr.20402] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 01/04/2023]
Affiliation(s)
- David Hecht
- Southwestern College, Chula Vista, California
65
Sá MMD, Pasqualoto KFM, Rangel-Yagui CDO. A 2D-QSPR approach to predict blood-brain barrier penetration of drugs acting on the central nervous system. Braz J Pharm Sci 2010. [DOI: 10.1590/s1984-82502010000400016] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
Drugs acting on the central nervous system (CNS) have to cross the blood-brain barrier (BBB) in order to perform their pharmacological actions. Passive BBB diffusion can be partially expressed by the blood/brain partition coefficient (logBB). As the experimental evaluation of logBB is time-consuming and costly, theoretical methods such as quantitative structure-property relationships (QSPR) can be useful for predicting logBB values. In this study, a 2D-QSPR approach was applied to a set of 28 drugs acting on the CNS, using the logBB property as biological data. The best QSPR model [n = 21, r = 0.94 (r² = 0.88), s = 0.28, and Q² = 0.82] included three molecular descriptors: calculated n-octanol/water partition coefficient (ClogP), polar surface area (PSA), and polarizability (α). Six of the seven compounds in the test set were well predicted, which corresponds to good external predictability (85.7%). These findings can help guide future approaches regarding which molecular descriptors must be considered for estimating the logBB property, and for predicting the BBB crossing ability of molecules structurally related to the investigated set.
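The figures quoted in this abstract are internally consistent, which a few lines of Python can confirm. This is only an arithmetic cross-check of the reported numbers, not a reproduction of the QSPR model; the variable names are illustrative, and the 21/7 training/test split is inferred from the abstract's own counts.

```python
# Cross-check of the figures quoted in the abstract (values copied from the text).
n_total = 28   # drugs in the full data set
n_train = 21   # compounds behind the best QSPR model (the reported n)
r = 0.94       # reported correlation coefficient

n_test = n_total - n_train             # remaining compounds form the test set
r_squared = round(r ** 2, 2)           # coefficient of determination from r
hit_rate = round(100 * 6 / n_test, 1)  # six test compounds were well predicted

print(n_test, r_squared, hit_rate)
```

The three printed values can be compared directly with the test-set size, r², and external predictability stated in the abstract.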
66
Abstract
The ability of a compound to elicit a toxic effect within an organism depends on three factors: (i) the external exposure of the organism to the toxicant in the environment or via the food chain; (ii) the internal uptake of the compound into the organism and its transport to the site of action in sufficient concentration; and (iii) the inherent toxicity of the compound. The in silico prediction of toxicity and the role of external exposure have been dealt with in other chapters of this book. This chapter focuses on the importance of 'internal exposure', i.e. the absorption, distribution, metabolism and elimination (ADME) properties of compounds, which determine their toxicokinetic profile. An introduction to key concepts in toxicokinetics is provided, along with examples of modelling approaches and software available to predict these properties. A brief introduction is also given to the theory of physiologically-based toxicokinetic modelling.
Affiliation(s)
- J. C. Madden
- School of Pharmacy and Chemistry, Liverpool John Moores University Byrom Street Liverpool L3 3AF UK
67
Ma EYT, Cameron CJF, Kremer SC. Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization. BMC Bioinformatics 2010; 11 Suppl 8:S4. [PMID: 21034429 PMCID: PMC2966291 DOI: 10.1186/1471-2105-11-s8-s4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis of datasets previously studied, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new datasets and the previously studied datasets. This methodology is rigorous and statistically grounded, and culminates in a Wilcoxon significance test that proves the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets using a mathematical abstraction that allows researchers in different domains to apply the method to their own work.
Background: Our work can be viewed as an alternative to existing methods for solving the quantitative structure-activity relationship (QSAR) problem. To this end, we review a number of approaches from both a methodological and a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology.
Results: Our results show that our system performs better than, or comparably to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems.
Conclusions: We conclude that our success is due to the ability of our system to: 1) encode molecules losslessly before presentation to the learning system, and 2) leverage the design of molecular description languages to facilitate the identification of relevant structural attributes of the molecules over different problem domains.
68
Shen J, Cheng F, Xu Y, Li W, Tang Y. Estimation of ADME properties with substructure pattern recognition. J Chem Inf Model 2010; 50:1034-41. [PMID: 20578727 DOI: 10.1021/ci100104j] [Citation(s) in RCA: 205] [Impact Index Per Article: 14.6] [Indexed: 12/14/2022]
Abstract
Over the past decade, absorption, distribution, metabolism, and excretion (ADME) property evaluation has become one of the most important issues in the process of drug discovery and development. Since in vivo and in vitro evaluations are costly and laborious, in silico techniques have been widely used to estimate the ADME properties of chemical compounds. Traditional prediction methods usually try to build a functional relationship between a set of molecular descriptors and a given ADME property. Although traditional methods have been used successfully in many cases, the accuracy and efficiency of molecular descriptors remain a concern. Herein, we report a new classification method based on substructure pattern recognition, in which each molecule is represented as a substructure pattern fingerprint based on a predefined substructure dictionary, and a support vector machine (SVM) algorithm is then applied to build the prediction model. A direct connection between substructures and molecular properties is thereby established. The most important substructure patterns can be identified via information gain analysis, which could help to interpret the models from a medicinal chemistry perspective. The method was verified with two data sets, one for blood-brain barrier (BBB) penetration and the other for human intestinal absorption (HIA). The results demonstrated that the overall predictive accuracies of the best HIA model for the training and test sets were 98.5 and 98.8%, and the overall predictive accuracies of the best BBB model for the training and test sets were 98.8 and 98.4%, which confirmed the reliability of our method. In the additional validations, the predictive accuracies were 94 and 69.5% for the HIA and the BBB models, respectively. Moreover, some of the representative key substructure patterns that correlated significantly with the HIA and BBB penetration properties are also presented.
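The information gain analysis mentioned in this abstract ranks a substructure by how much knowing its presence or absence reduces uncertainty about the class label. The following is a minimal sketch of the textbook definition, assuming binary fingerprint bits and binary labels; the function names and toy data are illustrative and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Labels of the molecules that share this feature value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


# Toy data: 1 = substructure present / class positive, 0 = absent / negative.
labels = [1, 1, 0, 0]
informative_bit = [1, 1, 0, 0]    # perfectly tracks the label
uninformative_bit = [1, 0, 1, 0]  # independent of the label

print(information_gain(informative_bit, labels))
print(information_gain(uninformative_bit, labels))
```

A bit that perfectly separates the classes recovers the full label entropy, while a bit that is independent of the label gains nothing, so sorting fingerprint bits by this score gives one simple way to rank substructures.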
Affiliation(s)
- Jie Shen
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
69
Haque IS, Pande VS. SCISSORS: a linear-algebraical technique to rapidly approximate chemical similarities. J Chem Inf Model 2010; 50:1075-88. [PMID: 20509629 DOI: 10.1021/ci1000136] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/31/2022]
Abstract
Algorithms for several emerging large-scale problems in cheminformatics have as their rate-limiting step the evaluation of relatively slow chemical similarity measures, such as structural similarity or three-dimensional (3-D) shape comparison. In this article we present SCISSORS, a linear-algebraical technique (related to multidimensional scaling and kernel principal components analysis) to rapidly estimate chemical similarities for several popular measures. We demonstrate that SCISSORS faithfully reflects its source similarity measures for both Tanimoto calculation and rank ordering. After an efficient precalculation step on a database, SCISSORS affords several orders of magnitude of speedup in database screening. SCISSORS furthermore provides an asymptotic speedup for large similarity matrix construction problems, reducing the number of conventional slow similarity evaluations required from quadratic to linear scaling.
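For readers unfamiliar with the Tanimoto calculation referred to in this abstract, the sketch below shows the standard inner-product form of the Tanimoto coefficient for real-valued vectors. It illustrates the similarity measure only and is not the SCISSORS implementation; the function name and the toy vectors are assumptions made for the example.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length, non-zero real vectors."""
    dot_ab = sum(x * y for x, y in zip(a, b))
    dot_aa = sum(x * x for x in a)
    dot_bb = sum(y * y for y in b)
    # T(a, b) = <a, b> / (<a, a> + <b, b> - <a, b>)
    return dot_ab / (dot_aa + dot_bb - dot_ab)


# Toy vectors standing in for per-molecule coordinates.
mol_a = [1.0, 1.0, 0.0]
mol_b = [0.0, 1.0, 1.0]

print(tanimoto(mol_a, mol_a))  # identical vectors
print(tanimoto(mol_a, mol_b))  # partially overlapping vectors
```

Because the expression depends only on three inner products, any method that assigns each molecule a fixed vector once can evaluate it with cheap vector arithmetic afterwards, which is the kind of precalculation the abstract describes.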
Affiliation(s)
- Imran S Haque
- Department of Computer Science and Department of Chemistry, Stanford University, Stanford, CA, USA
70
Fan Y, Unwalla R, Denny RA, Di L, Kerns EH, Diller DJ, Humblet C. Insights for Predicting Blood-Brain Barrier Penetration of CNS Targeted Molecules Using QSPR Approaches. J Chem Inf Model 2010; 50:1123-33. [DOI: 10.1021/ci900384c] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 02/08/2023]
Affiliation(s)
- Yi Fan, Rayomand Unwalla, Rajiah A. Denny, Li Di, Edward H. Kerns, David J. Diller, Christine Humblet
- Chemical and Screening Sciences, Wyeth Research, Princeton, CN8000, New Jersey 08543-8000; Chemical and Screening Sciences, Wyeth Research, 500 Arcola Road, Collegeville, Pennsylvania 19426; and Chemical and Screening Sciences, Wyeth Research, 35 Cambridge Park Drive, Cambridge, Massachusetts 02140
71
Sakiyama Y. The use of machine learning and nonlinear statistical tools for ADME prediction. Expert Opin Drug Metab Toxicol 2010; 5:149-69. [PMID: 19239395 DOI: 10.1517/17425250902753261] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 01/31/2023]
Abstract
Absorption, distribution, metabolism and excretion (ADME)-related failure of drug candidates is a major issue for the pharmaceutical industry today. Prediction of ADME by in silico tools has now become an inevitable paradigm to reduce cost and enhance efficiency in pharmaceutical research. Recently, machine learning and nonlinear statistical tools have been widely applied to predict routine ADME end points. To achieve accurate and reliable predictions, it is a prerequisite to understand the concepts, mechanisms and limitations of these tools. Here, we have devised a small synthetic nonlinear data set to help understand the mechanism of machine learning by 2D visualisation. We applied six new machine learning methods to four different data sets. The methods include the Naive Bayes classifier, classification and regression tree, random forest, Gaussian process, support vector machine and k nearest neighbour. The results demonstrated that ensemble learning and kernel machines displayed greater accuracy of prediction than classical methods irrespective of the data set size. The importance of interaction with the engineering field is also addressed. The results described here provide insights into the mechanism of machine learning, which will enable appropriate usage in the future.
Affiliation(s)
- Yojiro Sakiyama
- Pharmacokinetics Dynamics Metabolism, Pfizer Global Research and Development, Sandwich Laboratories, Kent, UK.
72
Obrezanova O, Segall MD. Gaussian Processes for Classification: QSAR Modeling of ADMET and Target Activity. J Chem Inf Model 2010; 50:1053-61. [DOI: 10.1021/ci900406x] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Indexed: 12/30/2022]
Affiliation(s)
- Olga Obrezanova
- Optibrium Ltd., 7226 IQ Cambridge, Beach Drive, Cambridge, CB25 9TL, United Kingdom
- Matthew D. Segall
- Optibrium Ltd., 7226 IQ Cambridge, Beach Drive, Cambridge, CB25 9TL, United Kingdom
73
|
Madden JC. In Silico Approaches for Predicting ADME Properties. Challenges and Advances in Computational Chemistry and Physics 2010. [DOI: 10.1007/978-1-4020-9783-6_10] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 01/05/2023]
|
74
|
Mensch J, Oyarzabal J, Mackie C, Augustijns P. In vivo, in vitro and in silico methods for small molecule transfer across the BBB. J Pharm Sci 2009; 98:4429-68. [DOI: 10.1002/jps.21745] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Indexed: 01/08/2023]
|
75
|
Wang Z, Yan A, Yuan Q. Classification of Blood-Brain Barrier Permeation by Kohonen's Self-Organizing Neural Network (KohNN) and Support Vector Machine (SVM). QSAR Comb Sci 2009. [DOI: 10.1002/qsar.200960008] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 02/03/2023]
|
76
|
An integrated scheme for feature selection and parameter setting in the support vector machine modeling and its application to the prediction of pharmacokinetic properties of drugs. Artif Intell Med 2009; 46:155-63. [DOI: 10.1016/j.artmed.2008.07.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Received: 12/11/2007] [Revised: 07/02/2008] [Accepted: 07/04/2008] [Indexed: 02/01/2023]
|
77
|
Yang XG, Chen D, Wang M, Xue Y, Chen YZ. Prediction of antibacterial compounds by machine learning approaches. J Comput Chem 2009; 30:1202-11. [DOI: 10.1002/jcc.21148] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 02/02/2023]
|
78
|
Three-class classification models of logS and logP derived by using GA–CG–SVM approach. Mol Divers 2009; 13:261-8. [DOI: 10.1007/s11030-009-9108-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/28/2008] [Accepted: 01/09/2009] [Indexed: 10/21/2022]
|
79
|
Ekins S, Tropsha A. A turning point for blood-brain barrier modeling. Pharm Res 2009; 26:1283-4. [PMID: 19165578 DOI: 10.1007/s11095-009-9832-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 01/09/2009] [Accepted: 01/12/2009] [Indexed: 10/21/2022]
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 601 Runnymede Avenue, Jenkintown, Pennsylvania 19046, USA.
|
80
|
Chen X, Liang YZ, Yuan DL, Xu QS. A modified uncorrelated linear discriminant analysis model coupled with recursive feature elimination for the prediction of bioactivity. SAR QSAR Environ Res 2009; 20:1-26. [PMID: 19343582 DOI: 10.1080/10629360902724127] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/27/2023]
Abstract
To meet the requirements of providing accurate, robust, and interpretable prediction of bioactivity, a modified uncorrelated linear discriminant analysis (M-ULDA) model was developed. In addition, a feature selection method called recursive feature elimination (RFE), originally used for support vector machines (SVM), was introduced and modified to fit the scheme of ULDA. In an evaluation on six pharmaceutical datasets, M-ULDA coupled with RFE showed better or comparable classification accuracy relative to other well-studied methods such as SVM and decision trees. The RFE used for ULDA has the advantage of increasing the computational speed, and it provides useful insights into biochemical mechanisms related to pharmaceutical activity by significantly reducing the number of variables used for the final model.
Affiliation(s)
- X Chen
- College of Chemistry and Chemical Engineering, Central South University, Changsha, People's Republic of China
|
81
|
Weis DC, Visco DP, Faulon JL. Data mining PubChem using a support vector machine with the Signature molecular descriptor: classification of factor XIa inhibitors. J Mol Graph Model 2008; 27:466-75. [PMID: 18829357 DOI: 10.1016/j.jmgm.2008.08.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 07/09/2008] [Revised: 08/19/2008] [Accepted: 08/20/2008] [Indexed: 01/04/2023]
Abstract
The amount of high-throughput screening (HTS) data readily available has significantly increased because of the PubChem project (http://pubchem.ncbi.nlm.nih.gov/). There is considerable opportunity for data mining of small molecules for a variety of biological systems using cheminformatic tools and the resources available through PubChem. In this work, we trained a support vector machine (SVM) classifier using the Signature molecular descriptor on factor XIa inhibitor HTS data. The optimal number of Signatures was selected by implementing a feature selection algorithm of highly correlated clusters. Our method included an improvement that allowed clusters to work together for accuracy improvement, where previous methods have scored clusters on an individual basis. The resulting model had a 10-fold cross-validation accuracy of 89%, and additional validation was provided by two independent test sets. We applied the SVM to rapidly predict activity for approximately 12 million compounds also deposited in PubChem. Confidence in these predictions was assessed by considering the number of Signatures within the training set range for a given compound, defined as the overlap metric. To further evaluate compounds identified as active by the SVM, docking studies were performed using AutoDock. A focused database of compounds predicted to be active was obtained with several of the compounds appreciably dissimilar to those used in training the SVM. This focused database is suitable for further study. The data mining technique presented here is not specific to factor XIa inhibitors, and could be applied to other bioassays in PubChem where one is looking to expand the search for small molecules as chemical probes.
Affiliation(s)
- Derick C Weis
- Department of Chemical Engineering, Tennessee Technological University, Box 5013, Cookeville, TN 38505, USA.
|
82
|
Zhang L, Zhu H, Oprea TI, Golbraikh A, Tropsha A. QSAR Modeling of the Blood–Brain Barrier Permeability for Diverse Organic Compounds. Pharm Res 2008; 25:1902-14. [DOI: 10.1007/s11095-008-9609-0] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.9] [Received: 11/29/2007] [Accepted: 04/23/2008] [Indexed: 01/16/2023]
|
83
|
Ma XH, Wang R, Yang SY, Li ZR, Xue Y, Wei YC, Low BC, Chen YZ. Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds. J Chem Inf Model 2008; 48:1227-37. [PMID: 18533644 DOI: 10.1021/ci800022e] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 01/04/2023]
Abstract
Virtual screening performance of support vector machines (SVM) depends on the diversity of training active and inactive compounds. While diverse inactive compounds can be routinely generated, the number and diversity of known actives are typically low. We evaluated the performance of SVM trained by sparsely distributed actives in six MDDR biological target classes composed of a high number of known actives (983-1645) of high, intermediate, and low structural diversity (muscarinic M1 receptor agonists, NMDA receptor antagonists, thrombin inhibitors, HIV protease inhibitors, cephalosporins, and renin inhibitors). SVM trained by regularly sparse data sets of 100 actives show improved yields at substantially reduced false-hit rates compared to those of published studies and those of Tanimoto-based similarity searching method based on the same data sets and molecular descriptors. SVM trained by very sparse data sets of 40 actives (2.4%-4.1% of the known actives) predicted 17.5-39.5%, 23.0-48.1%, and 70.2-92.4% of the remaining 943-1605 actives in the high, intermediate, and low diversity classes, respectively, 13.8-68.7% of which are outside the training compound families. SVM predicted 99.97% and 97.1% of the 9.997 M PUBCHEM and 167K remaining MDDR compounds as inactive and 2.6%-8.3% of the 19,495-38,483 MDDR compounds similar to the known actives as active. These suggest that SVM has substantial capability in identifying novel active compounds from sparse active data sets at low false-hit rates.
Affiliation(s)
- X H Ma
- Centre for Computational Science and Engineering, National University of Singapore, Singapore
|
84
|
Shen J, Du Y, Zhao Y, Liu G, Tang Y. In Silico Prediction of Blood–Brain Partitioning Using a Chemometric Method Called Genetic Algorithm Based Variable Selection. QSAR Comb Sci 2008. [DOI: 10.1002/qsar.200710129] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
|
85
|
Abstract
Background: Activities of drug molecules can be predicted by QSAR (quantitative structure activity relationship) models, which overcome the high cost and long cycle of the traditional experimental method. Because far fewer drug molecules have positive activity than negative activity, it is important to account for this imbalance when predicting molecular activities. Results: Here, asymmetric bagging and feature selection are introduced into the problem, and asymmetric bagging of support vector machines (asBagging) is proposed for predicting drug activities under this imbalance. At the same time, the features extracted from the structures of drug molecules affect the prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. Numerical experimental results on a data set of molecular activities show that asBagging improves the AUC and sensitivity values of molecular activities, and that PRIFEAB with feature selection further improves the prediction ability. Conclusion: Asymmetric bagging can help to improve the prediction accuracy of activities of drug molecules, which can be further improved by performing feature selection to select relevant features from the drug molecule data sets.
|
86
|
New predictive models for blood-brain barrier permeability of drug-like molecules. Pharm Res 2008; 25:1836-45. [PMID: 18415049 DOI: 10.1007/s11095-008-9584-5] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Received: 02/28/2008] [Accepted: 03/27/2008] [Indexed: 01/16/2023]
Abstract
PURPOSE: The goals of the present study were to apply a generalized regression model and support vector machine (SVM) models with Shape Signatures descriptors to the domain of blood-brain barrier (BBB) modeling. MATERIALS AND METHODS: The Shape Signatures method is a novel computational tool that was used to generate molecular descriptors, which were then used with the SVM classification technique on various BBB datasets. For comparison purposes we created a generalized linear regression model with eight MOE descriptors, and these same descriptors were also used to create SVM models. RESULTS: The generalized regression model was tested on 100 molecules not in the model and resulted in a correlation r² = 0.65. SVM models with MOE descriptors were superior to regression models, while Shape Signatures SVM models were comparable to or better than those with MOE descriptors. The best 2D shape signature models had 10-fold cross-validation prediction accuracy between 80% and 83% and leave-20%-out testing prediction accuracy between 80% and 82%, and they correctly predicted 84% of BBB+ compounds (n = 95) in an external database of drugs. CONCLUSIONS: Our data indicate that Shape Signatures descriptors can be used with SVM and that these models may have utility for predicting blood-brain barrier permeation in drug discovery.
|
87
|
|
88
|
Han LY, Ma XH, Lin HH, Jia J, Zhu F, Xue Y, Li ZR, Cao ZW, Ji ZL, Chen YZ. A support vector machines approach for virtual screening of active compounds of single and multiple mechanisms from large libraries at an improved hit-rate and enrichment factor. J Mol Graph Model 2007; 26:1276-86. [PMID: 18218332 DOI: 10.1016/j.jmgm.2007.12.002] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 04/29/2007] [Revised: 12/05/2007] [Accepted: 12/05/2007] [Indexed: 01/04/2023]
Abstract
Support vector machines (SVM) and other machine-learning (ML) methods have been explored as ligand-based virtual screening (VS) tools for facilitating lead discovery. While exhibiting good hit selection performance, in screening large compound libraries these methods tend to produce lower hit-rates than those of the best performing VS tools, partly because their training-sets contain a limited spectrum of inactive compounds. We tested whether the performance of SVM can be improved by using training-sets of diverse inactive compounds. In retrospective database screening of active compounds of single mechanism (HIV protease inhibitors, DHFR inhibitors, dopamine antagonists) and multiple mechanisms (CNS active agents) from large libraries of 2.986 million compounds, the yields, hit-rates, and enrichment factors of our SVM models are 52.4-78.0%, 4.7-73.8%, and 214-10,543, respectively, compared to those of 62-95%, 0.65-35%, and 20-1200 by structure-based VS and 55-81%, 0.2-0.7%, and 110-795 by other ligand-based VS tools in screening libraries of ≥1 million compounds. The hit-rates are comparable and the enrichment factors are substantially better than the best results of other VS tools. 24.3-87.6% of the predicted hits are outside the known hit families. SVM appears to be potentially useful for facilitating lead discovery in VS of large compound libraries.
Affiliation(s)
- L Y Han
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543, Singapore
|
89
|
Guerra A, Páez J, Campillo N. Artificial Neural Networks in ADMET Modeling: Prediction of Blood-Brain Barrier Permeation. QSAR Comb Sci 2007. [DOI: 10.1002/qsar.200710019] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Indexed: 01/29/2023]
|
90
|
|
91
|
Li H, Yap CW, Ung CY, Xue Y, Li ZR, Han LY, Lin HH, Chen YZ. Machine learning approaches for predicting compounds that interact with therapeutic and ADMET related proteins. J Pharm Sci 2007; 96:2838-60. [PMID: 17786989 DOI: 10.1002/jps.20985] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 02/06/2023]
Abstract
Computational methods for predicting compounds with specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are useful for facilitating drug discovery and evaluation. Recently, machine learning methods such as neural networks and support vector machines have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic and ADMET properties. These methods are particularly useful for compounds of diverse structures to complement QSAR methods, and for cases of unavailable receptor 3D structure to complement structure-based methods. A number of studies have demonstrated the potential of these methods for predicting such compounds as substrates of P-glycoprotein and cytochrome P450 CYP isoenzymes, inhibitors of protein kinases and CYP isoenzymes, and agonists of serotonin receptor and estrogen receptor. This article reviews the strategies, current progress and underlying difficulties in using machine learning methods for predicting these protein binders and as potential virtual screening tools. Algorithms for proper representation of the structural and physicochemical properties of compounds are also evaluated.
Affiliation(s)
- H Li
- Bioinformatics and Drug Design Group, Department of Pharmacy and Department of Computational Science, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543, Singapore
|
92
|
Rückert U, Kramer S. Optimizing Feature Sets for Structured Data. Machine Learning: ECML 2007. 2007. [DOI: 10.1007/978-3-540-74958-5_72] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 01/09/2023]
|
93
|
Ung CY, Li H, Cao ZW, Li YX, Chen YZ. Are herb-pairs of traditional Chinese medicine distinguishable from others? Pattern analysis and artificial intelligence classification study of traditionally defined herbal properties. J Ethnopharmacol 2007; 111:371-7. [PMID: 17267151 DOI: 10.1016/j.jep.2006.11.037] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 09/24/2006] [Revised: 11/24/2006] [Accepted: 11/28/2006] [Indexed: 05/13/2023]
Abstract
Multi-herb prescriptions of traditional Chinese medicine (TCM) often include special herb-pairs for mutual enhancement, assistance, and restraint. These TCM herb-pairs have been assembled and interpreted based on traditionally defined herbal properties (TCM-HPs) without knowledge of the mechanism of their assumed synergy. While these mechanisms are yet to be determined, properties of TCM herb-pairs can be investigated to determine if they exhibit features consistent with their claimed unique synergistic combinations. We analyzed distribution patterns of TCM-HPs of TCM herb-pairs to detect signs indicative of possible synergy and used artificial intelligence (AI) methods to examine whether combinations of their TCM-HPs are distinguishable from those of non-TCM herb-pairs assembled by random combinations and by modification of known TCM herb-pairs. Patterns of the majority of 394 known TCM herb-pairs were found to exhibit signs of herb-pair correlation. Three AI systems, trained and tested by using 394 TCM herb-pairs and 2470 non-TCM herb-pairs, correctly classified 72.1-87.9% of TCM herb-pairs and 91.6-97.6% of the non-TCM herb-pairs. The best AI system predicted 96.3% of the 27 known non-TCM herb-pairs and 99.7% of the other 1,065,100 possible herb-pairs as non-TCM herb-pairs. Our studies suggest that TCM-HPs of known TCM herb-pairs contain features distinguishable from those of non-TCM herb-pairs, consistent with their claimed synergistic or modulating combinations.
Affiliation(s)
- Choong Yong Ung
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543, Singapore
|
94
|
Ung CY, Li H, Kong CY, Wang JF, Chen YZ. Usefulness of traditionally defined herbal properties for distinguishing prescriptions of traditional Chinese medicine from non-prescription recipes. J Ethnopharmacol 2007; 109:21-8. [PMID: 16884871 DOI: 10.1016/j.jep.2006.06.007] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 03/10/2006] [Revised: 05/31/2006] [Accepted: 06/14/2006] [Indexed: 05/11/2023]
Abstract
Traditional Chinese medicine (TCM) has been widely practiced and is considered an attractive alternative to conventional medicine. Multi-herb recipes have been routinely used in TCM. These have been formulated by using TCM-defined herbal properties (TCM-HPs), the scientific basis of which is unclear. The usefulness of TCM-HPs was evaluated by analyzing the distribution pattern of TCM-HPs of the constituent herbs in 1161 classical TCM prescriptions, which shows patterns of multi-herb correlation. Two artificial intelligence (AI) methods were used to examine whether TCM-HPs are capable of distinguishing TCM prescriptions from non-TCM recipes. Two AI systems were trained and tested by using 1161 TCM prescriptions, 11,202 non-TCM recipes, and two separate evaluation methods. These systems correctly classified 83.1-97.3% of the TCM prescriptions and 90.8-92.3% of the non-TCM recipes. These results suggest that TCM-HPs are capable of separating TCM prescriptions from non-TCM recipes, which are useful for formulating TCM prescriptions and consistent with the expected correlation between TCM-HPs and the physicochemical properties of herbal ingredients responsible for producing the collective pharmacological and other effects of specific TCM prescriptions.
Affiliation(s)
- C Y Ung
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore
|
95
|
Li ZR, Han LY, Xue Y, Yap CW, Li H, Jiang L, Chen YZ. MODEL—molecular descriptor lab: A web-based server for computing structural and physicochemical features of compounds. Biotechnol Bioeng 2007; 97:389-96. [PMID: 17013940 DOI: 10.1002/bit.21214] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022]
Abstract
Molecular descriptors represent structural and physicochemical features of compounds. They have been extensively used for developing statistical models, such as quantitative structure activity relationship (QSAR) and artificial neural networks (NN), for computer prediction of the pharmacodynamic, pharmacokinetic, or toxicological properties of compounds from their structure. While computer programs have been developed for computing molecular descriptors, there is a lack of a freely accessible one. We have developed a web-based server, MODEL (Molecular Descriptor Lab), for computing a comprehensive set of 3,778 molecular descriptors, which is significantly more than the approximately 1,600 molecular descriptors computed by other software. Our computational algorithms have been extensively tested and the computed molecular descriptors have been used in a number of published works of statistical models for predicting a variety of pharmacodynamic, pharmacokinetic, and toxicological properties of compounds. Several testing studies on the computed molecular descriptors are discussed. MODEL is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/model/model.cgi free of charge for academic use.
Affiliation(s)
- Z R Li
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore
|
96
|
Li H, Ung CY, Yap CW, Xue Y, Li ZR, Chen YZ. Prediction of estrogen receptor agonists and characterization of associated molecular descriptors by statistical learning methods. J Mol Graph Model 2006; 25:313-23. [PMID: 16497524 DOI: 10.1016/j.jmgm.2006.01.007] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Received: 10/19/2005] [Revised: 12/21/2005] [Accepted: 01/19/2006] [Indexed: 01/04/2023]
Abstract
Specific estrogen receptor (ER) agonists have been used for hormone replacement therapy, contraception, osteoporosis prevention, and prostate cancer treatment. Some ER agonists and partial-agonists induce cancer and endocrine function disruption. Methods for predicting ER agonists are useful for facilitating drug discovery and chemical safety evaluation. Structure-activity relationships and rule-based decision forest models have been derived for predicting ER binders at impressive accuracies of 87.1-97.6% for ER binders and 80.2-96.0% for ER non-binders. However, these are not designed for identifying ER agonists and they were developed from a subset of known ER binders. This work explored several statistical learning methods (support vector machines, k-nearest neighbor, probabilistic neural network and C4.5 decision tree) for predicting ER agonists from a comprehensive set of known ER agonists and other compounds. The corresponding prediction systems were developed and tested by using 243 ER agonists and 463 ER non-agonists, respectively, which are significantly larger in number and structural diversity than those in previous studies. A feature selection method was used for selecting molecular descriptors responsible for distinguishing ER agonists from non-agonists, some of which are consistent with those used in other studies and the findings from X-ray crystallography data. The prediction accuracies of these methods are comparable to those of earlier studies despite the use of a significantly more diverse range of compounds. SVM gives the best accuracy of 88.9% for ER agonists and 98.1% for non-agonists. Our study suggests that statistical learning methods such as SVM are potentially useful for facilitating the prediction of ER agonists and for characterizing the molecular descriptors associated with ER agonists.
Affiliation(s)
- H Li
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore
|
97
|
Ung CY, Li H, Yap CW, Chen YZ. In Silico Prediction of Pregnane X Receptor Activators by Machine Learning Approaches. Mol Pharmacol 2006; 71:158-68. [PMID: 17003167 DOI: 10.1124/mol.106.027623] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.0] [Indexed: 01/04/2023]
Abstract
Pregnane X receptor (PXR) regulates drug metabolism and is involved in drug-drug interactions. Prediction of PXR activators is important for evaluating drug metabolism and toxicity. Computational pharmacophore and quantitative structure-activity relationship models have been developed for predicting PXR activators. Because of the structural diversity of PXR activators, more effort is needed to explore methods applicable to a broader spectrum of compounds. We explored three machine learning methods (MLMs) for predicting PXR activators, which were trained and tested by using a significantly higher number of compounds, 128 PXR activators (98 human) and 77 PXR non-activators, than those of previous studies. The recursive feature-selection method was used to select molecular descriptors relevant to PXR activator prediction, which are consistent with conclusions from other computational and structural studies. In a 10-fold cross-validation test, our MLM systems correctly predicted 81.2 to 84.0% of PXR activators, 80.8 to 85.0% of hPXR activators, 61.2 to 70.3% of PXR nonactivators, and 67.7 to 73.6% of hPXR nonactivators. Our systems also correctly predicted 73.3 to 86.7% of 15 newly published hPXR activators. MLMs seem to be useful for predicting PXR activators and for providing clues to physicochemical features of PXR activation.
Affiliation(s)
- C Y Ung
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543
|
98
|
|
99
|
Li H, Yap CW, Xue Y, Li ZR, Ung CY, Han LY, Chen YZ. Statistical learning approach for predicting specific pharmacodynamic, pharmacokinetic, or toxicological properties of pharmaceutical agents. Drug Dev Res 2005. [DOI: 10.1002/ddr.20044] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
|