76. Zhu F, Han LY, Chen X, Lin HH, Ong S, Xie B, Zhang HL, Chen YZ. Homology-free prediction of functional class of proteins and peptides by support vector machines. Curr Protein Pept Sci 2008; 9:70-95. [PMID: 18336324 DOI: 10.2174/138920308783565697]
Abstract
Protein and peptide sequences contain clues for functional prediction. A challenge is to predict the function of sequences that show low or no homology to proteins or peptides of known function. A machine learning method, support vector machines (SVM), has recently been explored for predicting the functional class of proteins and peptides from sequence-derived properties irrespective of sequence similarity. It has shown impressive performance for a wide range of protein and peptide classes, including certain low- and non-homologous sequences. This method is a valuable complement to the extensively used alignment-based, clustering-based, and structure-based functional prediction methods. This article evaluates the strategies, current progress, reported prediction performance, available software tools, and underlying difficulties in using SVM for predicting the functional class of proteins and peptides.
77. Yap CW, Li H, Ji ZL, Chen YZ. Regression methods for developing QSAR and QSPR models to predict compounds of specific pharmacodynamic, pharmacokinetic and toxicological properties. Mini Rev Med Chem 2008; 7:1097-107. [PMID: 18045213 DOI: 10.2174/138955707782331696]
Abstract
Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models have been extensively used for predicting compounds with specific pharmacodynamic, pharmacokinetic, or toxicological properties from structure-derived physicochemical and structural features. These models can be developed by using various regression methods, including conventional approaches (multiple linear regression and partial least squares) and more recently explored genetic (genetic function approximation) and machine learning (k-nearest neighbour, neural networks, and support vector regression) approaches. This article describes the algorithms of these methods, evaluates their advantages and disadvantages, and discusses the application potential of the recently explored methods. Freely available online and commercial software for these regression methods and their areas of application are also presented.
78. Han LY, Ma XH, Lin HH, Jia J, Zhu F, Xue Y, Li ZR, Cao ZW, Ji ZL, Chen YZ. A support vector machines approach for virtual screening of active compounds of single and multiple mechanisms from large libraries at an improved hit-rate and enrichment factor. J Mol Graph Model 2007; 26:1276-86. [PMID: 18218332 DOI: 10.1016/j.jmgm.2007.12.002] [Received: 04/29/2007] [Revised: 12/05/2007] [Accepted: 12/05/2007]
Abstract
Support vector machines (SVM) and other machine-learning (ML) methods have been explored as ligand-based virtual screening (VS) tools for facilitating lead discovery. Although they exhibit good hit-selection performance, these methods tend to produce lower hit-rates than the best-performing VS tools when screening large compound libraries, partly because their training sets contain a limited spectrum of inactive compounds. We tested whether the performance of SVM can be improved by using training sets of diverse inactive compounds. In retrospective database screening for active compounds of a single mechanism (HIV protease inhibitors, DHFR inhibitors, dopamine antagonists) and of multiple mechanisms (CNS active agents) in large libraries of 2.986 million compounds, the yields, hit-rates, and enrichment factors of our SVM models are 52.4-78.0%, 4.7-73.8%, and 214-10,543, respectively, compared with 62-95%, 0.65-35%, and 20-1200 for structure-based VS and 55-81%, 0.2-0.7%, and 110-795 for other ligand-based VS tools in screening libraries of ≥1 million compounds. The hit-rates are comparable to, and the enrichment factors substantially better than, the best results of other VS tools. Of the predicted hits, 24.3-87.6% lie outside the known hit families. SVM thus appears potentially useful for facilitating lead discovery in VS of large compound libraries.
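The three screening metrics quoted above can be illustrated with a minimal Python sketch. It assumes the definitions commonly used in the virtual-screening literature (yield = fraction of known actives recovered; hit-rate = fraction of selected compounds that are active; enrichment factor = hit-rate divided by the active fraction of the whole library). The abstract does not spell these definitions out, and the class name `ScreenResult` and the counts below are hypothetical, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ScreenResult:
    """Counts from a retrospective virtual-screening run (hypothetical)."""
    library_size: int   # total compounds screened, N
    n_actives: int      # known actives hidden in the library, A
    n_selected: int     # compounds the model flagged as hits
    n_true_hits: int    # flagged compounds that are known actives


def yield_pct(r: ScreenResult) -> float:
    """Share of the known actives that the screen recovered (%)."""
    return 100.0 * r.n_true_hits / r.n_actives


def hit_rate_pct(r: ScreenResult) -> float:
    """Share of the flagged compounds that are truly active (%)."""
    return 100.0 * r.n_true_hits / r.n_selected


def enrichment_factor(r: ScreenResult) -> float:
    """Hit-rate divided by the hit-rate expected from random selection."""
    random_rate = r.n_actives / r.library_size
    return (r.n_true_hits / r.n_selected) / random_rate


if __name__ == "__main__":
    # Illustrative counts only, not taken from the study.
    r = ScreenResult(library_size=1_000_000, n_actives=500,
                     n_selected=400, n_true_hits=300)
    print(f"yield={yield_pct(r):.1f}%  hit-rate={hit_rate_pct(r):.1f}%  "
          f"EF={enrichment_factor(r):.0f}")
    # prints: yield=60.0%  hit-rate=75.0%  EF=1500
```

Because the random baseline (actives divided by library size) is tiny for a multi-million-compound library, enrichment factors in the thousands can arise even at moderate hit-rates.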
79. Li H, Yap CW, Ung CY, Xue Y, Li ZR, Han LY, Lin HH, Chen YZ. Machine learning approaches for predicting compounds that interact with therapeutic and ADMET related proteins. J Pharm Sci 2007; 96:2838-60. [PMID: 17786989 DOI: 10.1002/jps.20985]
Abstract
Computational methods for predicting compounds with specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are useful for facilitating drug discovery and evaluation. Recently, machine learning methods such as neural networks and support vector machines have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic and ADMET properties. These methods are particularly useful for compounds of diverse structures, complementing QSAR methods, and for cases where the receptor 3D structure is unavailable, complementing structure-based methods. A number of studies have demonstrated the potential of these methods for predicting compounds such as substrates of P-glycoprotein and cytochrome P450 (CYP) isoenzymes, inhibitors of protein kinases and CYP isoenzymes, and agonists of serotonin and estrogen receptors. This article reviews the strategies, current progress and underlying difficulties in using machine learning methods to predict these protein binders and as potential virtual screening tools. Algorithms for properly representing the structural and physicochemical properties of compounds are also evaluated.
80. Chen X, Zheng CJ, Han LY, Xie B, Chen YZ. Trends in the exploration of therapeutic targets for the treatment of endocrine, metabolic and immune disorders. Endocr Metab Immune Disord Drug Targets 2007; 7:225-31. [PMID: 17897049 DOI: 10.2174/187153007781662576]
Abstract
A number of therapeutic targets have been explored for developing drugs for the treatment of endocrine, metabolic and immune disorders, and continuous effort and increasing interest have been directed at the search for new targets. Data from the therapeutic target database at http://bidd.nus.edu.sg/group/cjttd/ttd.asp show that there are 26, 24, and 22 targets of marketed drugs for the treatment of these three classes of diseases, respectively. The number of targets of investigational agents has reached 98, 124, and 72, respectively. An analysis of these targets, particularly those of recently approved drugs and patented investigational agents, provides useful hints about the general trends of target exploration, the current focus of drug discovery, and the difficulties encountered in developing drugs against these targets. Multiple profiles of these targets have been analyzed to probe the sequence, structural, physicochemical and systems-related features contributing to the successful exploration of a target against these diseases.
81. Lin HH, Han LY, Yap CW, Xue Y, Liu XH, Zhu F, Chen YZ. Prediction of factor Xa inhibitors by machine learning methods. J Mol Graph Model 2007; 26:505-18. [PMID: 17418603 DOI: 10.1016/j.jmgm.2007.03.003] [Received: 08/22/2006] [Revised: 02/04/2007] [Accepted: 03/07/2007]
Abstract
Factor Xa (FXa) inhibitors have been explored as anticoagulants for the treatment and prevention of thrombotic diseases. Molecular docking, pharmacophore, quantitative structure-activity relationship, and support vector machine (SVM) methods have been used for computer prediction of FXa inhibitors. These methods achieve promising prediction accuracies of 69-80% for FXa inhibitors and 85-99% for non-inhibitors. Prediction performance, particularly for inhibitors, may be further improved by exploring methods applicable to a more diverse range of compounds and by using a more appropriate set of molecular descriptors. We tested the capability of several machine learning methods (C4.5 decision tree, k-nearest neighbor, probabilistic neural network, and support vector machine) by using a set of 1098 compounds (360 inhibitors and 738 non-inhibitors) that is much more diverse than those in other studies. A feature selection method was used to select molecular descriptors appropriate for distinguishing FXa inhibitors from non-inhibitors. The prediction accuracies of these methods are 89.1-97.5% for FXa inhibitors and 92.3-98.1% for non-inhibitors. In particular, compared with other studies, the support vector machine gives a substantially improved accuracy of 94.6% for FXa inhibitors and maintains a comparable accuracy of 98.1% for non-inhibitors, based on a more rigorous test with a more diverse range of compounds. Our study suggests that machine learning methods such as SVM are useful for facilitating the prediction of FXa inhibitors.
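Several abstracts in this list report accuracy separately for the positive class (e.g. inhibitors) and the negative class (non-inhibitors). Assuming these correspond to sensitivity and specificity, as is common in this literature, the following minimal Python sketch shows how they are computed from binary predictions. The Matthews correlation coefficient (MCC) is added as a commonly reported single-number summary; the function name `class_accuracies` and the toy labels are hypothetical.

```python
import math
from typing import Dict, Sequence


def class_accuracies(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Per-class accuracies for a binary classifier.

    Label 1 = positive class (e.g. inhibitor), 0 = negative class
    (e.g. non-inhibitor). Assumes both classes occur in y_true.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),        # accuracy on the positive class
        "specificity": tn / (tn + fp),        # accuracy on the negative class
        "overall": (tp + tn) / len(pairs),    # fraction of all predictions correct
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy labels: 4 positives, 6 negatives; one error in each class.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(class_accuracies(y_true, y_pred))
```

Reporting the two class-wise accuracies separately matters when classes are imbalanced (here 360 inhibitors vs 738 non-inhibitors), because overall accuracy alone can hide poor performance on the smaller class.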
82. Tao YM, Chen YZ, Liang YL, Xu MY, Xu XM. Cadmium-induced membrane lipid peroxidation and changes in antioxidant enzyme activities and peroxidase isoforms in Jerusalem artichoke seedlings. Zhi Wu Sheng Li Yu Fen Zi Sheng Wu Xue Xue Bao (Journal of Plant Physiology and Molecular Biology) 2007; 33:301-8. [PMID: 17675753]
Abstract
Jerusalem artichoke (Helianthus tuberosus L.) seedlings cultured in sandy media were treated with Hoagland nutrient solution containing Cd(NO3)2 at concentrations from 0 to 400 micromol/L. After 50 days of treatment, Cd accumulation and the activities of peroxidase (POD, EC 1.11.1.7), superoxide dismutase (SOD, EC 1.15.1.1) and catalase (CAT, EC 1.11.1.6) were measured, and electrophoretograms of POD isoenzymes were analyzed. Cd accumulation in the seedlings increased with Cd concentration up to 50-100 micromol/L, after which further increases in Cd concentration resulted in only small increases in accumulation. Malondialdehyde (MDA) content was markedly higher than control values, indicating enhanced membrane lipid peroxidation in roots and leaves. POD activities in leaf and root extracts increased as Cd concentration rose from 0 to 50 and 100 micromol/L and then decreased with further increases to 200 and 400 micromol/L. At moderate Cd levels of 50-200 micromol/L, SOD activities in leaf and root extracts increased, whereas at the higher Cd level of 400 micromol/L marked inhibition of enzyme activity was observed. CAT activities in leaves and roots rose markedly with increasing Cd concentration. Electrophoresis showed that the POD isoenzyme pattern was noticeably altered by Cd, and an additional POD isoenzyme, LP10, appeared. These results suggest that POD isoenzymes of Jerusalem artichoke seedlings could be used as a bioindicator of soil contamination by Cd.
83. Kang L, Yap CW, Lim PFC, Chen YZ, Ho PC, Chan YW, Wong GP, Chan SY. Formulation development of transdermal dosage forms: Quantitative structure-activity relationship model for predicting activities of terpenes that enhance drug penetration through human skin. J Control Release 2007; 120:211-9. [PMID: 17582639 DOI: 10.1016/j.jconrel.2007.05.006] [Received: 12/12/2006] [Revised: 04/19/2007] [Accepted: 05/05/2007]
Abstract
Terpenes and terpenoids have been used as enhancers in transdermal formulations to facilitate penetration of drugs into human skin. Knowledge of the correlation between the human skin penetration effect (HSPE) and the physicochemical properties of these enhancers is important for facilitating the discovery and development of more enhancers. In this work, the HSPE of 49 terpenes and terpenoids was compared using the in vitro permeability coefficients of haloperidol (HP) through excised human skin. A first-order multiple linear regression (MLR) model was constructed to link the permeability coefficient of the drug to the lipophilicity, molecular weight, boiling point, terpene type and functional group of each enhancer. The quantitative structure-activity relationship (QSAR) model was derived from our data generated with standardized experimental protocols, which include: a donor solution of HP (3 mg/ml) in propylene glycol (PG) containing 5% (w/v) of the respective terpene, the same composition and volume of receptor solution, similar human skin samples, and the same set of automated flow-through diffusion cells. The model provides a simple method to predict the enhancing effects of terpenes for drugs with physicochemical properties similar to HP. Our study suggests that an ideal terpene enhancer should possess one or more of the following properties: be hydrophobic, be liquid at room temperature, carry an ester or aldehyde (but not an acid) functional group, and be neither a triterpene nor a tetraterpene. Possible mechanisms revealed by the QSAR model are discussed.
84. Di HB, Yu SM, Weng XC, Laureys S, Yu D, Li JQ, Qin PM, Zhu YH, Zhang SZ, Chen YZ. Cerebral response to patient's own name in the vegetative and minimally conscious states. Neurology 2007; 68:895-9. [PMID: 17372124 DOI: 10.1212/01.wnl.0000258544.79024.d0]
Abstract
BACKGROUND A challenge in the management of severely brain-damaged patients with altered states of consciousness is the differential diagnosis between the vegetative state (VS) and the minimally conscious state (MCS), especially in the gray zone separating these clinical entities. OBJECTIVE To evaluate differences in brain activation in response to the patient's own name spoken by a familiar voice (SON-FV) in patients with VS and MCS. METHODS Using fMRI, we prospectively studied residual cerebral activation to SON-FV in seven patients with VS and four with MCS. Behavioral evaluation was performed by means of standardized testing up to 3 months post-fMRI. RESULTS Two patients with VS failed to show any significant cerebral activation. Three patients with VS showed SON-FV-induced activation within the primary auditory cortex. Finally, two patients with VS and all four patients with MCS showed activation not only in the primary auditory cortex but also in hierarchically higher-order associative temporal areas. The two patients with VS who showed the most widespread activation subsequently improved clinically to MCS, as observed 3 months after their fMRI scan. CONCLUSION Cerebral responses to the patient's own name spoken by a familiar voice, as measured by fMRI, might be a useful tool for preclinically distinguishing minimally conscious state-like cognitive processing in some patients behaviorally classified as vegetative.
85. Yap CW, Xue Y, Li ZR, Chen YZ. Application of support vector machines to in silico prediction of cytochrome P450 enzyme substrates and inhibitors. Curr Top Med Chem 2007; 6:1593-607. [PMID: 16918471 DOI: 10.2174/156802606778108942]
Abstract
Cytochrome P450 enzymes are responsible for phase I metabolism of the majority of drugs and xenobiotics. Identification of the substrates and inhibitors of these enzymes is important for the analysis of drug metabolism, prediction of drug-drug interactions and drug toxicity, and the design of drugs that modulate cytochrome P450-mediated metabolism. The substrates and inhibitors of these enzymes are structurally diverse, so it is desirable to explore methods capable of predicting compounds of diverse structures without over-fitting. Support vector machines are an attractive method with these qualities and have been employed for predicting the substrates and inhibitors of several cytochrome P450 isoenzymes, as well as compounds of various other pharmacodynamic, pharmacokinetic, and toxicological properties. This article introduces the methodology, evaluates the performance, and discusses the underlying difficulties and future prospects of applying support vector machines to in silico prediction of cytochrome P450 substrates and inhibitors.
86. Ung CY, Li H, Kong CY, Wang JF, Chen YZ. Usefulness of traditionally defined herbal properties for distinguishing prescriptions of traditional Chinese medicine from non-prescription recipes. J Ethnopharmacol 2007; 109:21-8. [PMID: 16884871 DOI: 10.1016/j.jep.2006.06.007] [Received: 03/10/2006] [Revised: 05/31/2006] [Accepted: 06/14/2006]
Abstract
Traditional Chinese medicine (TCM) has been widely practiced and is considered an attractive alternative to conventional medicine. Multi-herb recipes have been routinely used in TCM. These have been formulated by using TCM-defined herbal properties (TCM-HPs), the scientific basis of which is unclear. The usefulness of TCM-HPs was evaluated by analyzing the distribution pattern of TCM-HPs of the constituent herbs in 1161 classical TCM prescriptions, which show patterns of multi-herb correlation. Two artificial intelligence (AI) methods were used to examine whether TCM-HPs are capable of distinguishing TCM prescriptions from non-TCM recipes. The two AI systems were trained and tested by using 1161 TCM prescriptions, 11,202 non-TCM recipes, and two separate evaluation methods. These systems correctly classified 83.1-97.3% of the TCM prescriptions and 90.8-92.3% of the non-TCM recipes. These results suggest that TCM-HPs are capable of separating TCM prescriptions from non-TCM recipes, which is useful for formulating TCM prescriptions and consistent with the expected correlation between TCM-HPs and the physicochemical properties of the herbal ingredients responsible for producing the collective pharmacological and other effects of specific TCM prescriptions.
87. Chen X, Li H, Yap CW, Ung CY, Jiang L, Cao ZW, Li YX, Chen YZ. Computer prediction of cardiovascular and hematological agents by statistical learning methods. Cardiovasc Hematol Agents Med Chem 2007; 5:11-9. [PMID: 17266544 DOI: 10.2174/187152507779315787]
Abstract
Computational methods have been explored for predicting agents that produce therapeutic or adverse effects in the cardiovascular and hematological systems. The quantitative structure-activity relationship (QSAR) method was the first statistical learning method successfully used for predicting various classes of cardiovascular and hematological agents. In recent years, more sophisticated statistical learning methods have been explored for predicting cardiovascular and hematological agents, particularly those of diverse structures that might not be straightforwardly modelled by single QSAR models. These methods include partial least squares, multiple linear regression, linear discriminant analysis, k-nearest neighbour, artificial neural networks and support vector machines. Their application potential has been demonstrated in the prediction of various classes of cardiovascular and hematological agents, including 1,4-dihydropyridine calcium channel antagonists, angiotensin-converting enzyme inhibitors, thrombin inhibitors, AChE inhibitors, hERG potassium channel inhibitors and blockers, potassium channel openers, platelet aggregation inhibitors, protein kinase inhibitors, dopamine antagonists and torsade de pointes-causing agents. This article reviews the strategies, current progress and problems in using statistical learning methods for predicting cardiovascular and hematological agents. It also evaluates algorithms for properly representing and extracting the structural and physicochemical properties of compounds relevant to the prediction of cardiovascular and hematological agents.
88. Xie B, Zheng CJ, Han LY, Ong S, Cui J, Zhang HL, Jiang L, Chen X, Chen YZ. PharmGED: Pharmacogenetic Effect Database. Clin Pharmacol Ther 2007; 81:29. [PMID: 17185995 DOI: 10.1038/sj.clpt.6100008]
89. Li ZR, Han LY, Xue Y, Yap CW, Li H, Jiang L, Chen YZ. MODEL—molecular descriptor lab: A web-based server for computing structural and physicochemical features of compounds. Biotechnol Bioeng 2007; 97:389-96. [PMID: 17013940 DOI: 10.1002/bit.21214]
Abstract
Molecular descriptors represent structural and physicochemical features of compounds. They have been extensively used for developing statistical models, such as quantitative structure-activity relationship (QSAR) models and artificial neural networks (NN), for computer prediction of the pharmacodynamic, pharmacokinetic, or toxicological properties of compounds from their structure. While computer programs have been developed for computing molecular descriptors, freely accessible ones are lacking. We have developed a web-based server, MODEL (Molecular Descriptor Lab), for computing a comprehensive set of 3,778 molecular descriptors, significantly more than the approximately 1,600 molecular descriptors computed by other software. Our computational algorithms have been extensively tested, and the computed molecular descriptors have been used in a number of published statistical models for predicting a variety of pharmacodynamic, pharmacokinetic, and toxicological properties of compounds. Several testing studies on the computed molecular descriptors are discussed. MODEL is accessible free of charge for academic use at http://jing.cz3.nus.edu.sg/cgi-bin/model/model.cgi.
90. Zheng CJ, Han LY, Yap CW, Ji ZL, Cao ZW, Chen YZ. Therapeutic targets: progress of their exploration and investigation of their characteristics. Pharmacol Rev 2006; 58:259-79. [PMID: 16714488 DOI: 10.1124/pr.58.2.4]
Abstract
Modern drug discovery is primarily based on the search for, and subsequent testing of, drug candidates acting on a preselected therapeutic target. Progress in genomics, protein structure, proteomics, and disease mechanisms has led to growing interest and effort in finding new targets and exploring existing targets more effectively. The number of reported targets of marketed and investigational drugs has significantly increased in the past 8 years: there are 1535 targets collected in the therapeutic target database, compared with approximately 500 targets reported in a 1996 review. Knowledge of these targets is helpful for molecular dissection of the mechanism of action of drugs and for predicting features that guide new drug design and the search for new targets. This article summarizes the progress of target exploration and investigates the characteristics of the currently explored targets, analyzing their sequence, structure, family representation, pathway association, tissue distribution, and genome location features to find clues useful in searching for new targets. Possible "rules" to guide the search for druggable proteins and the feasibility of using a statistical learning method for predicting druggable proteins directly from their sequences are discussed.
91. Lin HH, Han LY, Zhang HL, Zheng CJ, Xie B, Cao ZW, Chen YZ. Prediction of the functional class of metal-binding proteins from sequence derived physicochemical properties by support vector machine approach. BMC Bioinformatics 2006; 7 Suppl 5:S13. [PMID: 17254297 PMCID: PMC1764469 DOI: 10.1186/1471-2105-7-s5-s13]
Abstract
Metal-binding proteins play important roles in structural stability, signaling, regulation, transport, immune response, metabolism control, and metal homeostasis. Because of their functional and sequence diversity, it is desirable to explore additional methods for predicting metal-binding proteins irrespective of sequence similarity. This work explores support vector machines (SVM) as such a method. SVM prediction systems were developed by using 53,333 metal-binding and 147,347 non-metal-binding proteins, and evaluated on an independent set of 31,448 metal-binding and 79,051 non-metal-binding proteins. The computed prediction accuracies are 86.3%, 81.6%, 83.5%, 94.0%, 81.2%, 85.4%, 77.6%, 90.4%, 90.9%, 74.9% and 78.1% for calcium-binding, cobalt-binding, copper-binding, iron-binding, magnesium-binding, manganese-binding, nickel-binding, potassium-binding, sodium-binding, zinc-binding, and all metal-binding proteins, respectively. The accuracies for the non-member proteins of each class are 88.2%, 99.9%, 98.1%, 91.4%, 87.9%, 94.5%, 99.2%, 99.9%, 99.9%, 98.0%, and 88.0%, respectively. Comparable accuracies were obtained by using a different SVM kernel function. Our method predicts as metal-binding 67% of the 87 metal-binding proteins non-homologous to any protein in the Swissprot database and 85.3% of the 333 proteins of known metal-binding domains. These results suggest the usefulness of SVM for facilitating the prediction of metal-binding proteins. Our software can be accessed at the SVMProt server http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.
92. Yao LX, Wu ZC, Ji ZL, Chen YZ, Chen X. Internet resources related to drug action and human response: a review. Appl Bioinformatics 2006; 5:131-9. [PMID: 16922594 DOI: 10.2165/00822942-200605030-00001]
Abstract
It has been demonstrated that numerous proteins interact with drugs or their metabolites. Knowledge of these proteins is necessary for understanding the mechanisms of drug action and human response. Progress in modern genetics, molecular biology, biochemistry and pharmacology is generating a comprehensive mechanistic understanding of drug-target interactions at the molecular level. This is valuable for researchers and pharmaceutical companies in their efforts to improve the efficacy of existing drugs and to discover new ones. Most recently, the integration of systems biology approaches into drug discovery processes has called for more holistic knowledge of, and easily accessible resources on, the proteins that are important in drug action and human response. We have reviewed many publicly accessible internet resources on these proteins, organized according to their roles in drug action and human response, such as therapeutic effect, adverse reaction, absorption, distribution, metabolism and excretion.
93. Bassuk AG, Chen YZ, Batish SD, Nagan N, Opal P, Chance PF, Bennett CL. In cis autosomal dominant mutation of Senataxin associated with tremor/ataxia syndrome. Neurogenetics 2006; 8:45-9. [PMID: 17096168 DOI: 10.1007/s10048-006-0067-8] [Received: 07/25/2006] [Accepted: 09/12/2006]
Abstract
Senataxin mutations are the molecular basis of two distinct syndromes: (1) ataxia with oculomotor apraxia type 2 (AOA2) and (2) juvenile amyotrophic lateral sclerosis 4 (ALS4). The authors describe clinical and molecular genetic studies of a mother and daughter who display symptoms of cerebellar ataxia/atrophy, oculomotor defects, and tremor. Both patients share the Senataxin mutations N603D and Q653K in cis (N603D-Q653K), adjacent to an N-terminal domain thought to function in protein-protein interaction. The N-terminal and helicase domains appear to harbor clusters of missense mutations associated with AOA2 and ALS4. Acting synergistically, the N603D-Q653K mutations may confer a partial dominant-negative effect on the senataxin N-terminus, further expanding the phenotypic spectrum associated with Senataxin mutations.
94. Chen X, Zhou H, Liu YB, Wang JF, Li H, Ung CY, Han LY, Cao ZW, Chen YZ. Database of traditional Chinese medicine and its application to studies of mechanism and to prescription validation. Br J Pharmacol 2006; 149:1092-103. [PMID: 17088869 PMCID: PMC2014641 DOI: 10.1038/sj.bjp.0706945]
Abstract
BACKGROUND AND PURPOSE Traditional Chinese Medicine (TCM) is widely practised and is viewed as an attractive alternative to conventional medicine. Quantitative information about TCM prescriptions, constituent herbs and herbal ingredients is necessary for studying and exploring TCM. EXPERIMENTAL APPROACH We manually collected information on TCM in books and other printed sources in Medline. The Traditional Chinese Medicine Information Database TCM-ID, at http://tcm.cz3.nus.edu.sg/group/tcm-id/tcmid.asp, was introduced to provide comprehensive information about all aspects of TCM, including prescriptions, constituent herbs, herbal ingredients, molecular structure and functional properties of active ingredients, therapeutic and side effects, clinical indication and application, and related matters. RESULTS TCM-ID currently contains information for 1,588 prescriptions, 1,313 herbs, 5,669 herbal ingredients, and the 3D structure of 3,725 herbal ingredients. The value of the data in TCM-ID was illustrated by using some of the data for an in-silico study of the molecular mechanism of the therapeutic effects of herbal ingredients and for developing a computer program to validate TCM multi-herb preparations. CONCLUSIONS AND IMPLICATIONS The development of systems biology has led to a new design principle for therapeutic intervention strategy, the concept of 'magic shrapnel' (rather than the 'magic bullet'), involving many drugs against multiple targets administered in a single treatment. TCM offers an extensive source of examples of this concept, in which several active ingredients in one prescription are aimed at numerous targets and work together to provide therapeutic benefit. The database and its mining applications described here represent early efforts toward exploring TCM for new theories in drug discovery.
95
Xue Y, Li H, Ung CY, Yap CW, Chen YZ. Classification of a diverse set of Tetrahymena pyriformis toxicity chemical compounds from molecular descriptors by statistical learning methods. Chem Res Toxicol 2006; 19:1030-9. [PMID: 16918241 DOI: 10.1021/tx0600550] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Indexed: 01/28/2023]
Abstract
Toxicity of various compounds has been measured in many studies by their toxic effects against Tetrahymena pyriformis. Efforts have also been made to use computational quantitative structure-activity relationship (QSAR) and statistical learning methods (SLMs) to predict Tetrahymena pyriformis toxicity (TPT) with impressive accuracy. Because of the diversity of compounds and toxicity mechanisms, it is desirable to explore additional methods and to examine whether these methods are applicable to more diverse sets of compounds. We tested several SLMs (logistic regression, C4.5 decision tree, k-nearest neighbor, probabilistic neural network, support vector machines) for their capability to predict TPT by using 1129 compounds (841 TPT and 288 non-TPT agents), which are more diverse than those in other studies. A feature selection method was used to improve prediction performance and to select the molecular descriptors responsible for distinguishing TPT from non-TPT agents. Based on 5-fold cross-validation, the prediction accuracies are 86.9-94.2% for TPT agents and 71.2-87.5% for non-TPT agents, which are comparable to those of some earlier studies despite the use of a more diverse set of compounds. The selected molecular descriptors are consistent with those used in other studies and with experimental findings. These results suggest that SLMs are useful for predicting the TPT potential of diverse sets of compounds and for characterizing the molecular descriptors associated with TPT.
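The abstract reports accuracy separately for TPT and non-TPT agents under 5-fold cross-validation. The following is a minimal, dependency-free sketch of how such per-class accuracies are obtained, assuming a k-nearest neighbor classifier (one of the listed SLMs) and toy two-dimensional vectors standing in for molecular descriptors. The function names and data are hypothetical; the sketch omits the feature selection step and is not the authors' pipeline.

```python
import math
import random


def knn_predict(train, x, k=3):
    """Predict a 0/1 label for x by majority vote of its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    positive_votes = sum(label for _, label in nearest)
    return 1 if positive_votes * 2 > k else 0


def cross_validate(data, n_folds=5, k=3, seed=0):
    """Return (accuracy on positives, accuracy on negatives) from n-fold cross-validation.

    data is a list of (feature_vector, label) pairs, label 1 = toxic, 0 = non-toxic.
    """
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for i, test_fold in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        for x, label in test_fold:
            total[label] += 1
            correct[label] += knn_predict(train, x, k) == label
    return correct[1] / total[1], correct[0] / total[0]


# Hypothetical descriptor vectors: two well-separated clusters.
toy_data = ([((0.1 * i, 0.1 * j), 1) for i in range(5) for j in range(4)]
            + [((10 + 0.1 * i, 10 + 0.1 * j), 0) for i in range(5) for j in range(2)])
pos_acc, neg_acc = cross_validate(toy_data)
print(f"positives: {pos_acc:.1%}, negatives: {neg_acc:.1%}")
```

Reporting the two accuracies separately matters when classes are imbalanced (841 versus 288 compounds here), since a single overall accuracy can hide poor performance on the minority class.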
96
Li H, Ung CY, Yap CW, Xue Y, Li ZR, Chen YZ. Prediction of estrogen receptor agonists and characterization of associated molecular descriptors by statistical learning methods. J Mol Graph Model 2006; 25:313-23. [PMID: 16497524 DOI: 10.1016/j.jmgm.2006.01.007] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Received: 10/19/2005] [Revised: 12/21/2005] [Accepted: 01/19/2006] [Indexed: 01/04/2023]
Abstract
Specific estrogen receptor (ER) agonists have been used for hormone replacement therapy, contraception, osteoporosis prevention, and prostate cancer treatment. Some ER agonists and partial agonists induce cancer and disrupt endocrine function. Methods for predicting ER agonists are useful for facilitating drug discovery and chemical safety evaluation. Structure-activity relationship and rule-based decision forest models have been derived for predicting ER binders at impressive accuracies of 87.1-97.6% for ER binders and 80.2-96.0% for ER non-binders. However, these models are not designed to identify ER agonists, and they were developed from a subset of known ER binders. This work explored several statistical learning methods (support vector machines (SVM), k-nearest neighbor, probabilistic neural network and C4.5 decision tree) for predicting ER agonists from a comprehensive set of known ER agonists and other compounds. The corresponding prediction systems were developed and tested by using 243 ER agonists and 463 ER non-agonists, which are significantly larger in number and structural diversity than those in previous studies. A feature selection method was used to select the molecular descriptors responsible for distinguishing ER agonists from non-agonists, some of which are consistent with those used in other studies and with findings from X-ray crystallography data. The prediction accuracies of these methods are comparable to those of earlier studies despite the use of a significantly more diverse range of compounds. SVM gives the best accuracy: 88.9% for ER agonists and 98.1% for non-agonists. Our study suggests that statistical learning methods such as SVM are potentially useful for predicting ER agonists and for characterizing the molecular descriptors associated with them.
97
Ung CY, Li H, Yap CW, Chen YZ. In silico prediction of pregnane X receptor activators by machine learning approaches. Mol Pharmacol 2006; 71:158-68. [PMID: 17003167 DOI: 10.1124/mol.106.027623] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.0] [Indexed: 01/04/2023]
Abstract
Pregnane X receptor (PXR) regulates drug metabolism and is involved in drug-drug interactions. Prediction of PXR activators is important for evaluating drug metabolism and toxicity. Computational pharmacophore and quantitative structure-activity relationship models have been developed for predicting PXR activators. Because of the structural diversity of PXR activators, more effort is needed to explore methods applicable to a broader spectrum of compounds. We explored three machine learning methods (MLMs) for predicting PXR activators, which were trained and tested by using a significantly larger number of compounds than in previous studies: 128 PXR activators (98 human) and 77 PXR non-activators. A recursive feature-selection method was used to select molecular descriptors relevant to PXR activator prediction; these descriptors are consistent with conclusions from other computational and structural studies. In a 10-fold cross-validation test, our MLM systems correctly predicted 81.2-84.0% of PXR activators, 80.8-85.0% of human PXR (hPXR) activators, 61.2-70.3% of PXR non-activators, and 67.7-73.6% of hPXR non-activators. Our systems also correctly predicted 73.3-86.7% of 15 newly published hPXR activators. MLMs appear to be useful for predicting PXR activators and for providing clues to the physicochemical features of PXR activation.
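The abstract does not spell out its recursive feature-selection procedure. A widely used variant (often called SVM-RFE) trains a linear model, drops the descriptor with the smallest absolute weight, and repeats. The sketch below illustrates that idea under stated assumptions: a plain perceptron is swapped in for a linear SVM so the code needs no libraries, and the descriptor matrix is hypothetical. It is not presented as the authors' method.

```python
def train_perceptron(X, y, epochs=50, lr=0.1):
    """Fit a linear classifier (labels in {-1, +1}); returns (weights, bias).

    A perceptron stands in for a linear SVM so the sketch needs no libraries.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified or on the boundary: nudge toward yi
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def recursive_feature_elimination(X, y, n_keep):
    """Retrain, drop the feature with the smallest |weight|, repeat until n_keep remain.

    Returns the surviving feature indices in the original column numbering.
    """
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        reduced = [[row[j] for j in remaining] for row in X]
        w, _ = train_perceptron(reduced, y)
        weakest = min(range(len(remaining)), key=lambda i: abs(w[i]))
        del remaining[weakest]
    return remaining


# Hypothetical descriptors: column 0 separates the classes, columns 1 and 2 carry no signal.
X = [[2.0, 0.0, 0.0], [1.5, 0.0, 0.0], [-1.0, 0.0, 0.0], [-2.5, 0.0, 0.0]]
y = [1, 1, -1, -1]
selected = recursive_feature_elimination(X, y, n_keep=1)
print(selected)
```

Retraining after every removal, rather than ranking all descriptors once, lets the remaining weights adjust when correlated descriptors are dropped, which is the point of the "recursive" step.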
98
Cui J, Han LY, Lin HH, Zhang HL, Tang ZQ, Zheng CJ, Cao ZW, Chen YZ. Prediction of MHC-binding peptides of flexible lengths from sequence-derived structural and physicochemical properties. Mol Immunol 2006; 44:866-77. [PMID: 16806474 DOI: 10.1016/j.molimm.2006.04.001] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.8] [Received: 03/12/2006] [Revised: 04/05/2006] [Accepted: 04/06/2006] [Indexed: 11/22/2022]
Abstract
Peptide binding to MHC is critical for antigen recognition by T-cells. To facilitate vaccine design, computational methods have been developed for predicting MHC-binding peptides, which achieve impressive prediction accuracies of 70-90% for binders and 40-80% for non-binders. These methods have been developed for peptides of fixed lengths and for a limited number of alleles, are trained on small numbers of non-binders, and in some cases are based directly on sequence. These limitations restrict prediction coverage and accuracy, particularly for non-binders. It is desirable to explore methods that predict binders of flexible lengths from sequence-derived physicochemical properties and that are trained on diverse sets of non-binders. This work explores support vector machines (SVM) as such a method, developing prediction systems for 18 MHC class I and 12 class II alleles by using 4208-3252 binders and 234,333-168,793 non-binders, and evaluating them on an independent set of 545-476 binders and 110,564-84,430 non-binders. Binder accuracies are 86-99% for 25 alleles and 70-80% for 5 alleles; non-binder accuracies are 96-99% for all 30 alleles. Binder accuracies are comparable to, and non-binder accuracies substantially improved over, other reported results. Our method correctly predicts 73.3% of the 15 newly published epitopes from the last 4 months of 2005. Of the 251 recently published HLA-A*0201 non-epitopes predicted as binders by other methods, 63 are predicted as binders by our method. Screening of the HIV-1 genome shows that, compared with other methods, our method correctly predicts a comparable percentage (75-100%) of known epitopes, while predicting a lower percentage of constituent peptides as binders (0.01-5% for 24 alleles and 5-8% for 6 alleles). Our software can be accessed at .
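An SVM needs input vectors of a fixed size, so predicting binders "of flexible lengths" requires descriptors that do not depend on peptide length. One common approach is to average properties over the sequence. The sketch below assumes a deliberately small descriptor set (20 amino-acid composition fractions plus mean Kyte-Doolittle hydropathy); this set and the function name `encode_peptide` are illustrative assumptions, not the structural and physicochemical properties used in the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_peptide(seq):
    """Map a peptide of any length to a 21-dimensional vector:
    20 amino-acid composition fractions followed by mean hydropathy."""
    seq = seq.upper()
    if not seq or any(aa not in KYTE_DOOLITTLE for aa in seq):
        raise ValueError(f"unsupported peptide: {seq!r}")
    n = len(seq)
    composition = [seq.count(aa) / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    return composition + [mean_hydropathy]


# Class I-length (8-9 residues) and class II-length (13 residues) peptides
# all map to vectors of the same size (21 entries).
for peptide in ("SIINFEKL", "GILGFVFTL", "PKYVKQNTLKLAT"):
    print(peptide, len(encode_peptide(peptide)))
```

Averaging discards residue order, which is why practical systems usually add position- or neighbor-aware descriptors on top of plain composition.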
99
Yap CW, Xue Y, Li H, Li ZR, Ung CY, Han LY, Zheng CJ, Cao ZW, Chen YZ. Prediction of compounds with specific pharmacodynamic, pharmacokinetic or toxicological property by statistical learning methods. Mini Rev Med Chem 2006; 6:449-59. [PMID: 16613581 DOI: 10.2174/138955706776361501] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
Computational methods for predicting compounds with a specific pharmacodynamic, pharmacokinetic, or toxicological property are useful for facilitating drug discovery and drug safety evaluation. The quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) methods are the most successfully used statistical learning methods for predicting compounds with a specific property. More recently, other statistical learning methods such as neural networks and support vector machines have been explored for predicting compounds of higher structural diversity than those covered by QSAR and QSPR. These methods have shown promising potential in a number of studies. This article reviews the strategies, current progress and underlying difficulties in using statistical learning methods to predict compounds with a specific property. It also evaluates algorithms commonly used for representing the structural and physicochemical properties of compounds.
100
Yap CW, Chen YZ. Prediction of cytochrome P450 3A4, 2D6, and 2C9 inhibitors and substrates by using support vector machines. J Chem Inf Model 2006; 45:982-92. [PMID: 16045292 DOI: 10.1021/ci0500536] [Citation(s) in RCA: 117] [Impact Index Per Article: 6.5] [Indexed: 01/13/2023]
Abstract
Statistical learning methods have been used to develop filters for predicting inhibitors of two P450 isoenzymes, CYP3A4 and CYP2D6. This work explores the use of different statistical learning methods for predicting inhibitors of these enzymes and of an additional P450 enzyme, CYP2C9, as well as substrates of the three P450 isoenzymes. Two consensus support vector machine (CSVM) methods, "positive majority" (PM-CSVM) and "positive probability" (PP-CSVM), were used in this work. These methods were first tested for the prediction of inhibitors of CYP3A4 and CYP2D6 by using a significantly larger number of inhibitors and noninhibitors than those used in earlier studies. They were then applied to the prediction of inhibitors of CYP2C9 and substrates of the three enzymes. Both methods predict inhibitors of CYP3A4 and CYP2D6 at a level of accuracy similar to that of earlier studies. For classification of CYP2C9 inhibitors, the best CSVM method gives an accuracy of 88.9% for inhibitors and 96.3% for noninhibitors. The accuracies for classification of substrates and nonsubstrates are 98.2% and 90.9% for CYP3A4, 96.6% and 94.4% for CYP2D6, and 85.7% and 98.8% for CYP2C9. Both CSVM methods are potentially useful as filters for predicting inhibitors and substrates of P450 isoenzymes. These methods generally give better accuracies than single SVM classification systems, and the PP-CSVM method performs slightly better than the PM-CSVM method.
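The abstract names the two consensus rules but does not define them. The sketch below shows one plausible reading, which is an assumption and not the paper's definition: "positive majority" calls a compound positive when more than half of the member SVMs do, and "positive probability" calls it positive when the members' mean positive-class probability exceeds a threshold. The member models are hypothetical stand-ins (fixed-probability lambdas) for trained SVMs with probability outputs.

```python
from typing import Callable, Sequence

# A member model maps a descriptor vector to an estimated probability of the positive class.
Member = Callable[[Sequence[float]], float]


def positive_majority(members: Sequence[Member], x: Sequence[float],
                      threshold: float = 0.5) -> bool:
    """Assumed PM rule: positive if more than half of the members call x positive."""
    positive_votes = sum(1 for m in members if m(x) > threshold)
    return positive_votes * 2 > len(members)


def positive_probability(members: Sequence[Member], x: Sequence[float],
                         threshold: float = 0.5) -> bool:
    """Assumed PP rule: positive if the mean member probability exceeds threshold."""
    mean_p = sum(m(x) for m in members) / len(members)
    return mean_p > threshold


# Hypothetical members: one confident positive call, two weak negative calls.
members = [lambda x: 0.9, lambda x: 0.45, lambda x: 0.4]
compound = [0.0]  # placeholder descriptor vector
print("PM-CSVM:", positive_majority(members, compound))
print("PP-CSVM:", positive_probability(members, compound))
```

Under this reading the two rules can disagree: here one confident member outweighs two marginal ones when probabilities are averaged, but loses a simple vote.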