1. Yang B, Bao W, Chen B. Disease-Ligand Identification Based on Flexible Neural Tree. Front Microbiol 2022; 13:912145. [PMID: 35733966 PMCID: PMC9207514 DOI: 10.3389/fmicb.2022.912145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Accepted: 05/06/2022] [Indexed: 12/04/2022] [Open access]
Abstract
To accurately screen the disease-related compounds of a traditional Chinese medicine prescription in network pharmacology research, a new virtual screening method based on a flexible neural tree (FNT) model, a hybrid evolutionary method, and a negative sample selection algorithm is proposed. A novel hybrid evolutionary algorithm based on grammar-guided genetic programming and the salp swarm algorithm is proposed to infer the optimal FNT. For hypertension, diabetes, and coronavirus disease 2019 (COVID-19), disease-related compounds are collected from the recent literature. The unrelated compounds are chosen by the negative sample selection algorithm. ECFP6, MACCS, Macrocycle, and RDKit are each used to numerically characterize the chemical structure of every compound collected. The experimental results show that the proposed method performs better than classical classifiers [Support Vector Machine (SVM), random forest (RF), AdaBoost, decision tree (DT), Gradient Boosting Decision Tree (GBDT), KNN, logistic regression (LR), and Naive Bayes (NB)], an up-to-date classifier (gcForest), and a deep learning method (forgeNet) in terms of AUC, ROC, TPR, FPR, Precision, Specificity, and F1. The MACCS descriptor suits the largest number of classifiers. All methods perform poorly with the ECFP6 molecular descriptor.
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Wenzheng Bao
- School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, China
- *Correspondence: Wenzheng Bao,
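The abstract of entry 1 compares classifiers by TPR, FPR, Precision, Specificity, and F1. As a rough illustration only (not the authors' evaluation code), these metrics can be computed from a confusion matrix as sketched below. The function name `binary_metrics` and the 0/1 label encoding (1 = disease-related) are assumptions made for this example.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for 0/1 labels (assumed: 1 = disease-related)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard every ratio against an empty denominator.
    tpr = tp / (tp + fn) if tp + fn else 0.0          # recall / sensitivity
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * tpr / (precision + tpr)) if precision + tpr else 0.0
    return {"TPR": tpr, "FPR": fpr, "Precision": precision,
            "Specificity": specificity, "F1": f1}


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(truth, predicted))
```

AUC and ROC require ranked scores rather than hard labels, so they are left out of this sketch.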
2. Kell DB. The transporter-mediated cellular uptake of pharmaceutical drugs is based on their metabolite-likeness and not on their bulk biophysical properties: Towards a systems pharmacology. ACTA ACUST UNITED AC 2015. [DOI: 10.1016/j.pisc.2015.06.004] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 01/11/2023]
3. Kell DB. Finding novel pharmaceuticals in the systems biology era using multiple effective drug targets, phenotypic screening and knowledge of transporters: where drug discovery went wrong and how to fix it. FEBS J 2013; 280:5957-80. [PMID: 23552054 DOI: 10.1111/febs.12268] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 02/03/2013] [Revised: 03/20/2013] [Accepted: 03/26/2013] [Indexed: 12/16/2022]
Abstract
Despite the sequencing of the human genome, the rate of innovative and successful drug discovery in the pharmaceutical industry has continued to decrease. Leaving aside regulatory matters, the fundamental and interlinked intellectual issues proposed to be largely responsible for this are: (a) the move from 'function-first' to 'target-first' methods of screening and drug discovery; (b) the belief that successful drugs should and do interact solely with single, individual targets, despite natural evolution's selection for biochemical networks that are robust to individual parameter changes; (c) an over-reliance on the rule-of-5 to constrain biophysical and chemical properties of drug libraries; (d) the general abandoning of natural products that do not obey the rule-of-5; (e) an incorrect belief that drugs diffuse passively into (and presumably out of) cells across the bilayer portions of membranes, according to their lipophilicity; (f) a widespread failure to recognize the overwhelmingly important role of proteinaceous transporters, as well as their expression profiles, in determining drug distribution in and between different tissues and individual patients; and (g) the general failure to use engineering principles to model biology in parallel with performing 'wet' experiments, such that 'what if?' experiments can be performed in silico to assess the likely success of any strategy. These facts/ideas are illustrated with a reasonably extensive literature review.
Success in turning round drug discovery consequently requires: (a) decent systems biology models of human biochemical networks; (b) the use of these (iteratively with experiments) to model how drugs need to interact with multiple targets to have substantive effects on the phenotype; (c) the adoption of polypharmacology and/or cocktails of drugs as a desirable goal in itself; (d) the incorporation of drug transporters into systems biology models, en route to full and multiscale systems biology models that incorporate drug absorption, distribution, metabolism and excretion; (e) a return to 'function-first' or phenotypic screening; and (f) novel methods for inferring modes of action by measuring the properties on system variables at all levels of the 'omes. Such a strategy offers the opportunity of achieving a state where we can hope to predict biological processes and the effect of pharmaceutical agents upon them. Consequently, this should both lower attrition rates and raise the rates of discovery of effective drugs substantially.
Affiliation(s)
- Douglas B Kell
- School of Chemistry, The University of Manchester, UK; Manchester Institute of Biotechnology, The University of Manchester, UK
4. Pu M, Hayashi T, Cottam H, Mulvaney J, Arkin M, Corr M, Carson D, Messer K. Analysis of high-throughput screening assays using cluster enrichment. Stat Med 2012; 31:4175-89. [PMID: 22763983 DOI: 10.1002/sim.5455] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 10/31/2011] [Accepted: 05/07/2012] [Indexed: 11/09/2022]
Abstract
In this paper, we describe the implementation and evaluation of a cluster-based enrichment strategy to call hits from a high-throughput screen using a typical cell-based assay of 160,000 chemical compounds. Our focus is on statistical properties of the prospective design choices throughout the analysis, including how to choose the number of clusters for optimal power, the choice of test statistic, the significance thresholds for clusters and the activity threshold for candidate hits, how to rank selected hits for carry-forward to the confirmation screen, and how to identify confirmed hits in a data-driven manner. Whereas previously the literature has focused on choice of test statistic or chemical descriptors, our studies suggest that cluster size is the more important design choice. We recommend clusters to be ranked by enrichment odds ratio, not by p-value. Our conceptually simple test statistic is seen to identify the same set of hits as more complex scoring methods proposed in the literature do. We prospectively confirm that such a cluster-based approach can outperform the naive top X approach and estimate that we improved confirmation rates by about 31.5% from 813 using the top X approach to 1187 using our cluster-based method.
Affiliation(s)
- Minya Pu
- Biostatistics/Bioinformatics Shared Resources, Moores Cancer Center, University of California San Diego, La Jolla, CA 92093-0901, USA
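The abstract of entry 4 recommends ranking clusters by enrichment odds ratio rather than by p-value. The sketch below shows one plausible way to compute such a ranking from a 2x2 table (cluster vs. rest of library, active vs. inactive). It is not the paper's implementation: the function names, the input layout, and the 0.5 pseudo-count used to avoid division by zero are all assumptions made for this example.

```python
def enrichment_odds_ratio(active_in, size_in, active_total, size_total, pseudo=0.5):
    """Odds ratio of activity inside a cluster versus the rest of the library.

    `pseudo` is an assumed continuity correction added to each cell of the
    2x2 table so that empty cells do not cause a division by zero.
    """
    a = active_in                          # active, in cluster
    b = size_in - active_in                # inactive, in cluster
    c = active_total - active_in           # active, outside cluster
    d = (size_total - size_in) - c         # inactive, outside cluster
    return ((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo))


def rank_clusters(clusters, active_total, size_total):
    """Return cluster names sorted from most to least enriched.

    `clusters` maps a cluster name to (number of actives, cluster size).
    """
    scores = {
        name: enrichment_odds_ratio(n_active, size, active_total, size_total)
        for name, (n_active, size) in clusters.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    toy = {"A": (5, 10), "B": (1, 10), "C": (9, 10)}
    print(rank_clusters(toy, active_total=15, size_total=110))
```

An odds ratio measures effect size, so a small but strongly enriched cluster can outrank a large cluster whose p-value is smaller mainly because of its size.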
5. Kwon YJ, Lee W, Genovesio A, Emans N. A high-content subtractive screen for selecting small molecules affecting internalization of GPCRs. ACTA ACUST UNITED AC 2011; 17:379-85. [PMID: 22086721 DOI: 10.1177/1087057111427347] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]
Abstract
G-protein-coupled receptors (GPCRs) are pivotal in cellular responses to the environment and are common drug targets. Identification of selective small molecules acting on single GPCRs is complicated by the shared machinery coupling signal transduction to physiology. Here, we demonstrate a high-content screen using a panel of GPCR assays to identify receptor selective molecules acting within the kinase/phosphatase inhibitor family. A collection of 88 kinase and phosphatase inhibitors was screened against seven agonist-induced GPCR internalization cell models as well as transferrin uptake in human embryonic kidney cells. Molecules acting on a single receptor were identified through excluding pan-specific compounds affecting housekeeping endocytosis or disrupting internalization of multiple receptors. We identified compounds acting on a sole GPCR from activities in a broad range of chemical structures that could not be easily sorted by conventional means. Selective analysis can therefore rapidly select compounds selectively affecting GPCR activity with specificity to one receptor class through high-content screening.
6. Posner BA, Xi H, Mills JEJ. Enhanced HTS Hit Selection via a Local Hit Rate Analysis. J Chem Inf Model 2009; 49:2202-10. [DOI: 10.1021/ci900113d] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Affiliation(s)
- Bruce A. Posner
- PGRD Groton Laboratories, Pfizer Inc., Groton, Connecticut 06340, PGRD Computational Sciences CoE, Pfizer Inc., Cambridge, Massachusetts 02139, and PGRD Sandwich Laboratories, Pfizer Inc., 32 Ramsgate Road, Sandwich, Kent CT13 9NJ, Great Britain
- Hualin Xi
- PGRD Groton Laboratories, Pfizer Inc., Groton, Connecticut 06340, PGRD Computational Sciences CoE, Pfizer Inc., Cambridge, Massachusetts 02139, and PGRD Sandwich Laboratories, Pfizer Inc., 32 Ramsgate Road, Sandwich, Kent CT13 9NJ, Great Britain
- James E. J. Mills
- PGRD Groton Laboratories, Pfizer Inc., Groton, Connecticut 06340, PGRD Computational Sciences CoE, Pfizer Inc., Cambridge, Massachusetts 02139, and PGRD Sandwich Laboratories, Pfizer Inc., 32 Ramsgate Road, Sandwich, Kent CT13 9NJ, Great Britain
7. Klekota J, Roth FP. Chemical substructures that enrich for biological activity. Bioinformatics 2008; 24:2518-25. [PMID: 18784118 PMCID: PMC2732283 DOI: 10.1093/bioinformatics/btn479] [Citation(s) in RCA: 216] [Impact Index Per Article: 13.5] [Received: 05/06/2008] [Revised: 08/13/2008] [Accepted: 09/07/2008] [Indexed: 12/31/2022] [Open access]
Abstract
MOTIVATION: Certain chemical substructures are present in many drugs. This has led to the claim of 'privileged' substructures which are predisposed to bioactivity. Because bias in screening library construction could explain this phenomenon, the existence of privilege has been controversial. RESULTS: Using diverse phenotypic assays, we defined bioactivity for multiple compound libraries. Many substructures were associated with bioactivity even after accounting for substructure prevalence in the library, thus validating the privileged substructure concept. Determinations of privilege were confirmed in independent assays and libraries. Our analysis also revealed 'underprivileged' substructures and 'conditional privilege': rules relating combinations of substructure to bioactivity. Most previously reported substructures have been flat aromatic ring systems. Although we validated such substructures, we also identified three-dimensional privileged substructures. Most privileged substructures display a wide variety of substituents suggesting an entropic mechanism of privilege. Compounds containing privileged substructures had a doubled rate of bioactivity, suggesting practical consequences for pharmaceutical discovery.
Affiliation(s)
- Justin Klekota
- Harvard University Graduate Biophysics Program, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115, USA
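The abstract of entry 7 tests whether a substructure is associated with bioactivity after accounting for its prevalence in the library. The abstract does not name the statistical test, so the sketch below swaps in a standard one-sided hypergeometric test as an assumed stand-in, together with the simple activity-rate ratio behind the "doubled rate" statement. All function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: compounds in the library, K: active compounds in the library,
    n: compounds containing the substructure, k: actives among those n.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def activity_rate_ratio(k, n, K, N):
    """Activity rate with the substructure divided by the rate without it."""
    rate_with = k / n
    rate_without = (K - k) / (N - n)
    return rate_with / rate_without


if __name__ == "__main__":
    # Toy library: 1000 compounds, 110 actives; 100 carry the substructure,
    # 20 of which are active.
    print(activity_rate_ratio(20, 100, 110, 1000))
    print(hypergeom_enrichment_p(20, 100, 110, 1000))
```

Conditioning on the library totals (N and K) is what corrects for substructure prevalence: a common substructure has to show proportionally more actives to reach the same p-value.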
8. Zhang H, Ando HY, Chen L, Lee PH. On-the-Fly Selection of a Training Set for Aqueous Solubility Prediction. Mol Pharm 2007; 4:489-97. [PMID: 17628076 DOI: 10.1021/mp0700155] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
Training sets are usually chosen so that they represent the database as a whole; random selection helps to maintain this integrity. In this study, the prediction of aqueous solubility was used as a specific example of using the individual molecule for which solubility is desired, the target molecule, as the basis for choosing a training set. Similarity of the training set to the target molecule rather than a random allocation was used as the selection criterion. The Tanimoto coefficients derived from Daylight's binary fingerprints were used as the molecular similarity selection tool. Prediction models derived from this type of customization will be designated as "on-the-fly local" models because a new model is generated for each target molecule which is necessarily local. Such models will be compared with "global" models which are derived from a one-time "preprocessed" partitioning of training and test sets which use fixed fitted parameters for each target molecule prediction. Although both fragment and molecular descriptors were examined, a minimum set of MOE (molecular operating environment) molecular descriptors was found to be more efficient and was used for both on-the-fly local and preprocessed global models. It was found that on-the-fly local predictions were more accurate (r2=0.87) than the preprocessed global predictions (r2=0.74) for the same test set. In addition, their precision was shown to increase as the degree of similarity increases. Correlation and distribution plots were used to visualize similarity cutoff groupings and their chemical structures. In summary, rapid "on-the-fly" similarity selection can enable the customization of a training set to each target molecule for which solubility is desired.
In addition, the similarity information and the model's fitting statistics give the user criteria to judge the validity of the prediction since it is always possible that good prediction cannot be obtained because the database and the target molecule are too dissimilar. Although the rapid processing speed of binary fingerprints enables the "on-the-fly" real time prediction, slower but more feature rich similarity measures may improve follow-up predictions.
Affiliation(s)
- Hongzhou Zhang
- Research Formulations and Computer-Assisted Drug Discovery, Pfizer Global Research & Development, Michigan Laboratories, 2800 Plymouth Road, Ann Arbor, Michigan 48105, USA
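The abstract of entry 8 selects a training set per target molecule by Tanimoto similarity on binary fingerprints. A minimal sketch of that selection step follows, assuming fingerprints are represented as sets of on-bit indices. It does not use Daylight fingerprints or MOE descriptors, and the function names, the `k` cutoff, and the `min_similarity` threshold are assumptions introduced for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_training_set(target_fp, database, k=3, min_similarity=0.0):
    """Pick the k database molecules most similar to the target.

    `database` is a list of (name, fingerprint) pairs. Returns a list of
    (similarity, name) tuples, most similar first. Molecules below
    `min_similarity` are dropped, so fewer than k may be returned.
    """
    scored = [(tanimoto(target_fp, fp), name) for name, fp in database]
    scored = [item for item in scored if item[0] >= min_similarity]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    db = [("m1", {1, 2, 3}), ("m2", {1, 2}), ("m3", {7, 8})]
    print(select_training_set({1, 2, 3}, db, k=2))
```

If `select_training_set` returns few or no neighbours above the threshold, that is a signal in the sense the abstract describes: the database may be too dissimilar from the target for a reliable prediction.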
9. Bender A, Jenkins JL, Li Q, Adams SE, Cannon EO, Glen RC. Chapter 9 Molecular Similarity: Advances in Methods, Applications and Validations in Virtual Screening and QSAR. ACTA ACUST UNITED AC 2006; 2:141-168. [PMID: 32362803 PMCID: PMC7185533 DOI: 10.1016/s1574-1400(06)02009-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/26/2022]
Abstract
This chapter discusses recent developments in some of the areas that exploit the molecular similarity principle and novel approaches to capturing molecular properties with novel descriptors, focuses on a crucial aspect of computational models, namely their validity, and discusses additional ways to examine available data, such as those from high-throughput screening (HTS) campaigns, and to gain more knowledge from these data. The chapter also presents some recent applications of the methods discussed, focusing on the successes of virtual screening applications, database clustering and comparisons (such as drug- and in-house-likeness), and the recent large-scale validations of docking and scoring programs. While a great number of descriptors and modeling methods have been proposed to date, the recent trend toward proper model validation is very much appreciated. Although some of their limitations are surely due to underlying principles and limitations of fundamental concepts, others will certainly be eliminated in the future.
Affiliation(s)
- Andreas Bender
- Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK; Lead Discovery Center, Novartis Institutes for BioMedical Research Inc., 250 Massachusetts Ave., Cambridge, MA 02139, USA
- Jeremy L Jenkins
- Lead Discovery Center, Novartis Institutes for BioMedical Research Inc., 250 Massachusetts Ave., Cambridge, MA 02139, USA
- Qingliang Li
- College of Chemistry and Molecular Engineering, Center for Theoretical Biology, Peking University, Beijing 100871, China
- Sam E Adams
- Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Edward O Cannon
- Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Robert C Glen
- Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
10. Harper G, Pickett SD. Methods for mining HTS data. Drug Discov Today 2006; 11:694-9. [PMID: 16846796 DOI: 10.1016/j.drudis.2006.06.006] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 02/13/2006] [Revised: 05/08/2006] [Accepted: 06/09/2006] [Indexed: 01/23/2023]
Abstract
Data mining is a fast-growing field that is finding application across a wide range of industries. HTS is a crucial part of the drug discovery process at most large pharmaceutical companies. Accurate analysis of HTS data is, therefore, vital to drug discovery. Given the large quantity of data generated during an HTS, and the importance of analyzing those data effectively, it is unsurprising that data-mining techniques are now increasingly applied to HTS data analysis. Taking a broad view of both the HTS process and the data-mining process, we review recent literature that describes the application of data-mining techniques to HTS data.
Affiliation(s)
- Gavin Harper
- GSK, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, United Kingdom.
11. Strachan RT, Ferrara G, Roth BL. Screening the receptorome: an efficient approach for drug discovery and target validation. Drug Discov Today 2006; 11:708-16. [PMID: 16846798 DOI: 10.1016/j.drudis.2006.06.012] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 01/11/2006] [Revised: 06/02/2006] [Accepted: 06/16/2006] [Indexed: 11/18/2022]
Abstract
The receptorome, comprising at least 5% of the human genome, encodes receptors that mediate the physiological, pathological and therapeutic responses to a vast number of exogenous and endogenous ligands. Not surprisingly, the majority of approved medications target members of the receptorome. Several in silico and physical screening approaches have been devised to mine the receptorome efficiently for the discovery and validation of molecular targets for therapeutic drug discovery. Receptorome screening has also been used to discover, and thereby avoid, the molecular targets responsible for serious and unforeseen drug side effects.
Affiliation(s)
- Ryan T Strachan
- Department of Biochemistry, Comprehensive Cancer Center and NIMH Psychoactive Drug Screening Program, Case Western Reserve University Medical School, Cleveland, OH 44106, USA
12. Guha R, Dutta D, Jurs PC, Chen T. Local Lazy Regression: Making Use of the Neighborhood to Improve QSAR Predictions. J Chem Inf Model 2006; 46:1836-47. [PMID: 16859315 DOI: 10.1021/ci060064e] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Indexed: 11/29/2022]
Abstract
Traditional quantitative structure-activity relationship (QSAR) models aim to capture global structure-activity trends present in a data set. In many situations, there may be groups of molecules which exhibit a specific set of features which relate to their activity or inactivity. Such a group of features can be said to represent a local structure-activity relationship. Traditional QSAR models may not recognize such local relationships. In this work, we investigate the use of local lazy regression (LLR), which obtains a prediction for a query molecule using its local neighborhood, rather than considering the whole data set. This modeling approach is especially useful for very large data sets because no a priori model need be built. We applied the technique to three biological data sets. In the first case, the root-mean-square error (RMSE) for an external prediction set was 0.94 log units versus 0.92 log units for the global model. However, LLR was able to characterize a specific group of anomalous molecules with much better accuracy (0.64 log units versus 0.70 log units for the global model). For the second data set, the LLR technique resulted in a decrease in RMSE from 0.36 log units to 0.31 log units for the external prediction set. In the third case, we obtained an RMSE of 2.01 log units versus 2.16 log units for the global model. In all cases, LLR led to a few observations being poorly predicted compared to the global model. We present an analysis of why this was observed and possible improvements to the local regression approach.
Affiliation(s)
- Rajarshi Guha
- Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania 16802, USA.
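The abstract of entry 12 describes local lazy regression: no global model is built, and each query molecule is predicted from a regression fitted to its local neighborhood. The sketch below illustrates the idea in the simplest possible setting, assuming a single numeric descriptor, absolute difference as the distance, and ordinary least squares on the k nearest neighbours. The paper's descriptors, distance measure, and neighbourhood size are not given in the abstract, so every name and default here is an assumption.

```python
def local_lazy_predict(x_query, xs, ys, k=3):
    """Predict y at x_query from a line fitted to the k nearest training points.

    Assumes one descriptor per molecule (xs) and one activity value (ys).
    """
    # Lazy step: the neighbourhood is chosen only when a query arrives.
    neighbours = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_query))[:k]
    nx = [x for x, _ in neighbours]
    ny = [y for _, y in neighbours]

    mean_x = sum(nx) / len(nx)
    mean_y = sum(ny) / len(ny)
    sxx = sum((x - mean_x) ** 2 for x in nx)
    if sxx == 0:
        # All neighbours share one descriptor value: fall back to their mean.
        return mean_y

    slope = sum((x - mean_x) * (y - mean_y) for x, y in neighbours) / sxx
    return mean_y + slope * (x_query - mean_x)


if __name__ == "__main__":
    # Two local trends that a single global line would blur together.
    xs = [0, 1, 2, 10, 11, 12]
    ys = [0, 2, 4, 5, 4, 3]
    print(local_lazy_predict(1.5, xs, ys))
    print(local_lazy_predict(11.5, xs, ys))
```

In the toy data the left-hand points follow a rising trend and the right-hand points a falling one, so each query is fitted with the slope of its own neighbourhood, which is the kind of local structure-activity relationship the abstract refers to.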
13. Klekota J, Brauner E, Roth FP, Schreiber SL. Using High-Throughput Screening Data To Discriminate Compounds with Single-Target Effects from Those with Side Effects. J Chem Inf Model 2006; 46:1549-62. [PMID: 16859287 DOI: 10.1021/ci050495h] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Abstract
The most desirable compound leads from high-throughput assays are those with novel biological activities resulting from their action on a single biological target. Valuable resources can be wasted on compound leads with significant 'side effects' on additional biological targets; therefore, technical refinements to identify compounds that primarily have effects resulting from a single target are needed. This study explores the use of multiple assays of a chemical library and a statistic based on entropy to identify lead compound classes that have patterns of assay activity resulting primarily from small molecule action on a single target. This statistic, called the coincidence score, discriminates with 88% accuracy compound classes known to act primarily on a single target from compound classes with significant side effects on nonhomologous targets. Furthermore, a significant number of the compound classes predicted to have primarily single-target effects contain known bioactive compounds. We also show that a compound's known biological target or mechanism of action can often be suggested by its pattern of activities in multiple assays.
Affiliation(s)
- Justin Klekota
- Howard Hughes Medical Institute, 12 Oxford Street, Cambridge, Massachusetts 02138, Harvard Institute of Chemistry and Cell Biology, 250 Longwood Avenue, SGMB-604, Boston, Massachusetts 02115, USA.