1
Fogel GB, Lamers SL, Liu ES, Salemi M, McGrath MS. Identification of dual-tropic HIV-1 using evolved neural networks. Biosystems 2015; 137:12-9. [PMID: 26419858 PMCID: PMC4921197 DOI: 10.1016/j.biosystems.2015.09.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/24/2015] [Revised: 09/24/2015] [Accepted: 09/26/2015] [Indexed: 02/07/2023]
Abstract
Blocking the binding of the envelope HIV-1 protein to immune cells is a popular concept for development of anti-HIV therapeutics. R5 HIV-1 binds CCR5, X4 HIV-1 binds CXCR4, and dual-tropic HIV-1 can bind either coreceptor for cellular entry. R5 viruses are associated with early infection and over time can evolve to X4 viruses that are associated with immune failure. Dual-tropic HIV-1 is less studied; however, it represents functional antigenic intermediates during the transition of R5 to X4 viruses. Viral tropism is linked partly to the HIV-1 envelope V3 domain, where the amino acid sequence helps dictate the receptor a particular virus will target; however, using V3 sequence information to identify dual-tropic HIV-1 isolates has remained difficult. Our goal in this study was to elucidate features of dual-tropic HIV-1 isolates that assist in the biological understanding of dual-tropism and develop an approach for their detection. Over 1559 HIV-1 subtype B sequences with known tropisms were analyzed. Each sequence was represented by 73 structural, biochemical and regional features. These features were provided to an evolved neural network classifier and evaluated using balanced and unbalanced data sets. The study resolved R5X4 viruses from R5 with an accuracy of 81.8% and from X4 with an accuracy of 78.8%. The approach also identified a set of V3 features (hydrophobicity, structural and polarity) that are associated with tropism transitions. The ability to distinguish R5X4 isolates will improve computational tropism decisions for R5 vs. X4 and assist in HIV-1 research and drug development efforts.
Affiliation(s)
- Gary B Fogel
- Natural Selection, Inc., San Diego, CA 92121, United States
- Enoch S Liu
- Natural Selection, Inc., San Diego, CA 92121, United States
- Marco Salemi
- University of Florida, Department of Pathology and Laboratory Medicine, Gainesville, FL 32610, United States
- Michael S McGrath
- University of California at San Francisco, Department of Laboratory Medicine and The AIDS and Cancer Specimen Resource, San Francisco, CA 94143, United States
2
Benchmarking ligand-based virtual High-Throughput Screening with the PubChem database. Molecules 2013; 18:735-56. [PMID: 23299552 PMCID: PMC3759399 DOI: 10.3390/molecules18010735] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 09/26/2012] [Revised: 10/11/2012] [Accepted: 12/17/2012] [Indexed: 01/04/2023] Open
Abstract
With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is public domain through PubChem and carefully collated through confirmation screens validating active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 for a TPR cutoff of 25% are observed.
3
Mueller R, Dawson ES, Niswender CM, Butkiewicz M, Hopkins CR, Weaver CD, Lindsley CW, Conn PJ, Meiler J. Iterative experimental and virtual high-throughput screening identifies metabotropic glutamate receptor subtype 4 positive allosteric modulators. J Mol Model 2012; 18:4437-46. [PMID: 22592386 DOI: 10.1007/s00894-012-1441-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 02/09/2012] [Accepted: 04/18/2012] [Indexed: 11/26/2022]
Abstract
Activation of metabotropic glutamate receptor subtype 4 has been shown to be efficacious in rodent models of Parkinson's disease. Artificial neural networks were trained based on a recently reported high throughput screen which identified 434 positive allosteric modulators of metabotropic glutamate receptor subtype 4 out of a set of approximately 155,000 compounds. A jury system containing three artificial neural networks achieved a theoretical enrichment of 15.4 when selecting the top 2% of compounds of an independent test dataset. The model was used to screen an external commercial database of approximately 450,000 drug-like compounds. 1,100 predicted active small molecules were tested experimentally using two distinct assays of mGlu4 activity. This experiment yielded 67 positive allosteric modulators of metabotropic glutamate receptor subtype 4 that were confirmed in both experimental systems. Compared to the 0.3% active compounds in the primary screen, this constituted an enrichment of 22-fold.
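The enrichment quoted at the end of this abstract can be reproduced from the counts it gives. The following is a minimal sketch, assuming enrichment means the hit rate among the experimentally tested predictions divided by the hit rate of the primary screen; the function name `enrichment` and its parameters are illustrative and do not come from the paper.

```python
def enrichment(hits: int, tested: int, base_hits: int, base_total: int) -> float:
    """Hit rate in the selected subset divided by the hit rate of the full screen."""
    return (hits / tested) / (base_hits / base_total)


# Counts quoted in the abstract; the 155,000 library size is only approximate there.
ef = enrichment(hits=67, tested=1100, base_hits=434, base_total=155_000)

print(f"virtual-screen hit rate: {67 / 1100:.2%}")     # about 6.09%
print(f"primary-screen hit rate: {434 / 155_000:.2%}")  # 0.28%, reported as 0.3%
print(f"enrichment: {ef:.1f}")                          # about 21.8, i.e. roughly 22-fold
```

Note that dividing by the rounded 0.3% baseline instead of 434/155,000 would give about 20, so the reported 22-fold figure appears to rest on the unrounded hit rate.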
Affiliation(s)
- Ralf Mueller
- Department of Chemistry, Vanderbilt University, Nashville, TN, 37232-6600, USA
4
Andonie R, Fabry-Asztalos L, Abdul-Wahid CB, Abdul-Wahid S, Barker GI, Magill LC. Fuzzy ARTMAP prediction of biological activities for potential HIV-1 protease inhibitors using a small molecular data set. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:80-93. [PMID: 21071799 DOI: 10.1109/tcbb.2009.50] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
Obtaining satisfactory results with neural networks depends on the availability of large data samples. The use of small training sets generally reduces performance. Most classical Quantitative Structure-Activity Relationship (QSAR) studies for a specific enzyme system have been performed on small data sets. We focus on the neuro-fuzzy prediction of biological activities of HIV-1 protease inhibitory compounds when inferring from small training sets. We propose two computational intelligence prediction techniques which are suitable for small training sets, at the expense of some computational overhead. Both techniques are based on the FAMR model. The FAMR is a Fuzzy ARTMAP (FAM) incremental learning system used for classification and probability estimation. During the learning phase, each sample pair is assigned a relevance factor proportional to the importance of that pair. The two proposed algorithms in this paper are: 1) The GA-FAMR algorithm, which is new, consists of two stages: a) During the first stage, we use a genetic algorithm (GA) to optimize the relevances assigned to the training data. This improves the generalization capability of the FAMR. b) In the second stage, we use the optimized relevances to train the FAMR. 2) The Ordered FAMR is derived from a known algorithm. Instead of optimizing relevances, it optimizes the order of data presentation using the algorithm of Dagher et al. In our experiments, we compare these two algorithms with an algorithm not based on the FAM, the FS-GA-FNN introduced in [4], [5]. We conclude that when inferring from small training sets, both techniques are efficient, in terms of generalization capability and execution time. The computational overhead introduced is compensated by better accuracy. Finally, the proposed techniques are used to predict the biological activities of newly designed potential HIV-1 protease inhibitors.
Affiliation(s)
- Răzvan Andonie
- Computer Science Department, Central Washington University, 400 E. University Way, Ellensburg, WA 98926, USA.
5
Hecht D. Applications of machine learning and computational intelligence to drug discovery and development. Drug Dev Res 2010. [DOI: 10.1002/ddr.20402] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 01/04/2023]
Affiliation(s)
- David Hecht
- Southwestern College, Chula Vista, California
6
Reddy AS, Kumar S, Garg R. Hybrid-genetic algorithm based descriptor optimization and QSAR models for predicting the biological activity of Tipranavir analogs for HIV protease inhibition. J Mol Graph Model 2010; 28:852-62. [PMID: 20399695 PMCID: PMC2872997 DOI: 10.1016/j.jmgm.2010.03.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 12/02/2009] [Revised: 03/04/2010] [Accepted: 03/14/2010] [Indexed: 11/16/2022]
Abstract
The prediction of biological activity of a chemical compound from its structural features plays an important role in drug design. In this paper, we discuss the quantitative structure activity relationship (QSAR) prediction models developed on a dataset of 170 HIV protease enzyme inhibitors. Various chemical descriptors that encode hydrophobic, topological, geometrical and electronic properties are calculated to represent the structures of the molecules in the dataset. We use the hybrid-GA (genetic algorithm) optimization technique for descriptor space reduction. The linear multiple regression analysis (MLR), correlation-based feature selection (CFS), non-linear decision tree (DT), and artificial neural network (ANN) approaches are used as fitness functions. The selected descriptors represent the overall descriptor space and account well for the binding nature of the considered dataset. These selected features are also human interpretable and can be used to explain the interactions between a drug molecule and its receptor protein (HIV protease). The selected descriptors are then used for developing the QSAR prediction models by using the MLR, DT and ANN approaches. These models are discussed, analyzed and compared to validate and test their performance for this dataset. All three approaches yield the QSAR models with good prediction performance. The models developed by DT and ANN are comparable and have better prediction than the MLR model. For ANN model, weight analysis is carried out to analyze the role of various descriptors in activity prediction. All the prediction models point towards the involvement of hydrophobic interactions. These models can be useful for predicting the biological activity of new untested HIV protease inhibitors and virtual screening for identifying new lead compounds.
Affiliation(s)
- A Srinivas Reddy
- Department of Biomedical Engineering, University of California-Davis, Davis, CA 95616-5294, USA.
7
Mueller R, Rodriguez AL, Dawson ES, Butkiewicz M, Nguyen TT, Oleszkiewicz S, Bleckmann A, Weaver CD, Lindsley CW, Conn PJ, Meiler J. Identification of Metabotropic Glutamate Receptor Subtype 5 Potentiators Using Virtual High-Throughput Screening. ACS Chem Neurosci 2010; 1:288-305. [PMID: 20414370 PMCID: PMC2857954 DOI: 10.1021/cn9000389] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Received: 08/14/2009] [Accepted: 01/04/2010] [Indexed: 11/30/2022] Open
Abstract
Selective potentiators of glutamate response at metabotropic glutamate receptor subtype 5 (mGluR5) have exciting potential for the development of novel treatment strategies for schizophrenia. A total of 1,382 compounds with positive allosteric modulation (PAM) of the mGluR5 glutamate response were identified through high-throughput screening (HTS) of a diverse library of 144,475 substances utilizing a functional assay measuring receptor-induced intracellular release of calcium. Primary hits were tested for concentration-dependent activity, and potency data (EC50 values) were used for training artificial neural network (ANN) quantitative structure−activity relationship (QSAR) models that predict biological potency from the chemical structure. While all models were trained to predict EC50, the quality of the models was assessed by using both continuous measures and binary classification. Numerical descriptors of chemical structure were used as input for the machine learning procedure and optimized in an iterative protocol. The ANN models achieved theoretical enrichment ratios of up to 38 for an independent data set not used in training the model. A database of ∼450,000 commercially available drug-like compounds was targeted in a virtual screen. A set of 824 compounds was obtained for testing based on the highest predicted potency values. Biological testing found 28.2% (232/824) of these compounds with various activities at mGluR5 including 177 pure potentiators and 55 partial agonists. These results represent an enrichment factor of 23 for pure potentiation of the mGluR5 glutamate response and 30 for overall mGluR5 modulation activity when compared with those of the original mGluR5 experimental screening data (0.94% hit rate). The active compounds identified contained 72% close derivatives of previously identified PAMs as well as 28% nontrivial derivatives of known active compounds.
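The hit rate and the two enrichment factors reported in this abstract follow from its own numbers. Below is a small sketch, assuming each enrichment factor is the fraction of tested compounds in a category divided by the reported 0.94% hit rate of the original screen; the names `BASELINE_HIT_RATE` and `enrichment_vs_baseline` are illustrative and not taken from the paper.

```python
BASELINE_HIT_RATE = 0.0094  # 0.94% hit rate of the original HTS, as reported in the abstract


def enrichment_vs_baseline(actives: int, tested: int,
                           baseline: float = BASELINE_HIT_RATE) -> float:
    """Fraction of tested compounds that were active, relative to a baseline hit rate."""
    return (actives / tested) / baseline


tested = 824
potentiators, partial_agonists = 177, 55
all_actives = potentiators + partial_agonists  # 232 compounds with any mGluR5 activity

hit_rate_pct = 100 * all_actives / tested
ef_potentiators = enrichment_vs_baseline(potentiators, tested)
ef_all = enrichment_vs_baseline(all_actives, tested)

print(f"hit rate: {hit_rate_pct:.1f}%")                # 28.2%
print(f"pure potentiators: {ef_potentiators:.1f}x")    # about 22.9, reported as 23
print(f"all mGluR5 modulators: {ef_all:.1f}x")         # about 30.0, reported as 30
```

The baseline is taken from the abstract as stated rather than recomputed from 1,382 hits in 144,475 compounds, which would give a slightly different percentage.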
Affiliation(s)
- Eric S. Dawson
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37232-6600
- Craig W. Lindsley
- Department of Chemistry
- Department of Pharmacology
- Institute for Chemical Biology
- Jens Meiler
- Department of Chemistry
- Department of Pharmacology
- Institute for Chemical Biology
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37232-6600
8
Hecht D, Fogel GB. A Novel In Silico Approach to Drug Discovery via Computational Intelligence. J Chem Inf Model 2009; 49:1105-21. [DOI: 10.1021/ci9000647] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Affiliation(s)
- David Hecht
- Southwestern College, 900 Otay Lakes Road, Chula Vista, California 91910, and Natural Selection, Inc., 9330 Scranton Road, Suite 150, San Diego, California 92121
- Gary B. Fogel
- Southwestern College, 900 Otay Lakes Road, Chula Vista, California 91910, and Natural Selection, Inc., 9330 Scranton Road, Suite 150, San Diego, California 92121