1
Colliandre L, Muller C. Bayesian Optimization in Drug Discovery. Methods Mol Biol 2024; 2716:101-136. [PMID: 37702937 DOI: 10.1007/978-1-0716-3449-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
Drug discovery deals with the search for initial hits and their optimization toward a targeted clinical profile. Throughout the discovery pipeline, the candidate profile will evolve, but the optimization will mainly stay a trial-and-error approach. Tons of in silico methods have been developed to improve and fasten this pipeline. Bayesian optimization (BO) is a well-known method for the determination of the global optimum of a function. In the last decade, BO has gained popularity in the early drug design phase. This chapter starts with the concept of black box optimization applied to drug design and presents some approaches to tackle it. Then it focuses on BO and explains its principle and all the algorithmic building blocks needed to implement it. This explanation aims to be accessible to people involved in drug discovery projects. A strong emphasis is made on the solutions to deal with the specific constraints of drug discovery. Finally, a large set of practical applications of BO is highlighted.
2
Guo W, Liu J, Dong F, Song M, Li Z, Khan MKH, Patterson TA, Hong H. Review of machine learning and deep learning models for toxicity prediction. Exp Biol Med (Maywood) 2023; 248:1952-1973. [PMID: 38057999 PMCID: PMC10798180 DOI: 10.1177/15353702231209421] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/08/2023]
Abstract
The ever-increasing number of chemicals has raised public concerns due to their adverse effects on human health and the environment. To protect public health and the environment, it is critical to assess the toxicity of these chemicals. Traditional in vitro and in vivo toxicity assays are complicated, costly, and time-consuming and may face ethical issues. These constraints raise the need for alternative methods for assessing the toxicity of chemicals. Recently, due to the advancement of machine learning algorithms and the increase in computational power, many toxicity prediction models have been developed using various machine learning and deep learning algorithms such as support vector machine, random forest, k-nearest neighbors, ensemble learning, and deep neural network. This review summarizes the machine learning- and deep learning-based toxicity prediction models developed in recent years. Support vector machine and random forest are the most popular machine learning algorithms, and hepatotoxicity, cardiotoxicity, and carcinogenicity are the frequently modeled toxicity endpoints in predictive toxicology. It is known that datasets impact model performance. The quality of datasets used in the development of toxicity prediction models using machine learning and deep learning is vital to the performance of the developed models. The different toxicity assignments for the same chemicals among different datasets of the same type of toxicity have been observed, indicating benchmarking datasets is needed for developing reliable toxicity prediction models using machine learning and deep learning algorithms. This review provides insights into current machine learning models in predictive toxicology, which are expected to promote the development and application of toxicity prediction models in the future.
Affiliation(s)
- Wenjing Guo: National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Jie Liu: National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Fan Dong: National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Meng Song: National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Zoe Li: National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Md Kamrul Hasan Khan: National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Tucker A Patterson: National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Huixiao Hong: National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
3
Sharma B, Chenthamarakshan V, Dhurandhar A, Pereira S, Hendler JA, Dordick JS, Das P. Accurate clinical toxicity prediction using multi-task deep neural nets and contrastive molecular explanations. Sci Rep 2023; 13:4908. [PMID: 36966203 PMCID: PMC10039880 DOI: 10.1038/s41598-023-31169-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/28/2022] [Accepted: 03/07/2023] [Indexed: 03/27/2023]
Abstract
Explainable machine learning for molecular toxicity prediction is a promising approach for efficient drug development and chemical safety. A predictive ML model of toxicity can reduce experimental cost and time while mitigating ethical concerns by significantly reducing animal and clinical testing. Herein, we use a deep learning framework for simultaneously modeling in vitro, in vivo, and clinical toxicity data. Two different molecular input representations are used; Morgan fingerprints and pre-trained SMILES embeddings. A multi-task deep learning model accurately predicts toxicity for all endpoints, including clinical, as indicated by the area under the Receiver Operator Characteristic curve and balanced accuracy. In particular, pre-trained molecular SMILES embeddings as input to the multi-task model improved clinical toxicity predictions compared to existing models in MoleculeNet benchmark. Additionally, our multitask approach is comprehensive in the sense that it is comparable to state-of-the-art approaches for specific endpoints in in vitro, in vivo and clinical platforms. Through both the multi-task model and transfer learning, we were able to indicate the minimal need of in vivo data for clinical toxicity predictions. To provide confidence and explain the model's predictions, we adapt a post-hoc contrastive explanation method that returns pertinent positive and negative features, which correspond well to known mutagenic and reactive toxicophores, such as unsubstituted bonded heteroatoms, aromatic amines, and Michael receptors. Furthermore, toxicophore recovery by pertinent feature analysis captures more of the in vitro (53%) and in vivo (56%), rather than of the clinical (8%), endpoints, and indeed uncovers a preference in known toxicophore data towards in vitro and in vivo experimental data. To our knowledge, this is the first contrastive explanation, using both present and absent substructures, for predictions of clinical and in vivo molecular toxicity.
Affiliation(s)
- Shiranee Pereira: ICARE, International Center for Alternatives in Research and Education, Chennai, India
- Payel Das: IBM Research, Yorktown Heights, NY, USA
4
Choi IH, Oh IS. Weighted edit distance optimized using genetic algorithm for SMILES-based compound similarity. Pattern Anal Appl 2023. [DOI: 10.1007/s10044-023-01141-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2023]
5
Guo M, Shou W, Makatura L, Erps T, Foshey M, Matusik W. Polygrammar: Grammar for Digital Polymer Representation and Generation. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2022; 9:e2101864. [PMID: 35678650 PMCID: PMC9376847 DOI: 10.1002/advs.202101864] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/05/2021] [Revised: 12/04/2021] [Indexed: 05/22/2023]
Abstract
Polymers are widely studied materials with diverse properties and applications determined by molecular structures. It is essential to represent these structures clearly and explore the full space of achievable chemical designs. However, existing approaches cannot offer comprehensive design models for polymers because of their inherent scale and structural complexity. Here, a parametric, context-sensitive grammar designed specifically for polymers (PolyGrammar) is proposed. Using the symbolic hypergraph representation and 14 simple production rules, PolyGrammar can represent and generate all valid polyurethane structures. An algorithm is presented to translate any polyurethane structure from the popular Simplified Molecular-Input Line-entry System (SMILES) string format into the PolyGrammar representation. The representative power of PolyGrammar is tested by translating a dataset of over 600 polyurethane samples collected from the literature. Furthermore, it is shown that PolyGrammar can be easily extended to other copolymers and homopolymers. By offering a complete, explicit representation scheme and an explainable generative model with validity guarantees, PolyGrammar takes an essential step toward a more comprehensive and practical system for polymer discovery and exploration. As the first bridge between formal languages and chemistry, PolyGrammar also serves as a critical blueprint to inform the design of similar grammars for other chemistries, including organic and inorganic molecules.
Affiliation(s)
- Minghao Guo: Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; CUHK Multimedia Lab, The Chinese University of Hong Kong, Sha Tin, Hong Kong
- Wan Shou: Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Liane Makatura: Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Timothy Erps: Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Michael Foshey: Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Wojciech Matusik: Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
6
Tao Xue H, Stanley-Baker M, Wai Kin Kong A, Leung Li H, Wen Bin Goh W. Data considerations for predictive modeling applied to the discovery of bioactive natural products. Drug Discov Today 2022; 27:2235-2243. [DOI: 10.1016/j.drudis.2022.05.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 03/21/2022] [Accepted: 05/10/2022] [Indexed: 11/29/2022]
7
Wang Z, Liu M, Luo Y, Xu Z, Xie Y, Wang L, Cai L, Qi Q, Yuan Z, Yang T, Ji S. Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery. Bioinformatics 2022; 38:2579-2586. [PMID: 35179547 DOI: 10.1093/bioinformatics/btac112] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/22/2021] [Revised: 02/13/2022] [Accepted: 02/16/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Properties of molecules are indicative of their functions and thus are useful in many applications. With the advances of deep learning methods, computational approaches for predicting molecular properties are gaining increasing momentum. However, there lacks customized and advanced methods and comprehensive tools for this task currently. RESULTS: Here we develop a suite of comprehensive machine learning methods and tools spanning different computational models, molecular representations, and loss functions for molecular property prediction and drug discovery. Specifically, we represent molecules as both graphs and sequences. Built on these representations, we develop novel deep models for learning from molecular graphs and sequences. In order to learn effectively from highly imbalanced datasets, we develop advanced loss functions that optimize areas under precision-recall curves and receiver operating characteristic curves. Altogether, our work not only serves as a comprehensive tool, but also contributes towards developing novel and advanced graph and sequence learning methodologies. Results on both online and offline antibiotics discovery and molecular property prediction tasks show that our methods achieve consistent improvements over prior methods. In particular, our methods achieve #1 ranking in terms of both ROC-AUC and PRC-AUC on the AI Cures Open Challenge for drug discovery related to COVID-19. AVAILABILITY AND IMPLEMENTATION: Our source code is released as part of the MoleculeX library (https://github.com/divelab/MoleculeX) under AdvProp.
Affiliation(s)
- Zhengyang Wang: Texas A&M University, Department of Computer Science and Engineering, College Station, TX 77843, USA
- Meng Liu: Texas A&M University, Department of Computer Science and Engineering, College Station, TX 77843, USA
- Youzhi Luo: Texas A&M University, Department of Computer Science and Engineering, College Station, TX 77843, USA
- Zhao Xu: Texas A&M University, Department of Computer Science and Engineering, College Station, TX 77843, USA
- Yaochen Xie: Texas A&M University, Department of Computer Science and Engineering, College Station, TX 77843, USA
- Limei Wang: Texas A&M University, Department of Computer Science and Engineering, College Station, TX 77843, USA
- Lei Cai: Texas A&M University, Department of Computer Science and Engineering, College Station, TX 77843, USA
- Qi Qi: University of Iowa, Department of Computer Science, Iowa City, IA 52242, USA
- Zhuoning Yuan: University of Iowa, Department of Computer Science, Iowa City, IA 52242, USA
- Tianbao Yang: University of Iowa, Department of Computer Science, Iowa City, IA 52242, USA
- Shuiwang Ji: University of Iowa, Department of Computer Science, Iowa City, IA 52242, USA
8
Liang S, Yu H. Revealing new therapeutic opportunities through drug target prediction: a class imbalance-tolerant machine learning approach. Bioinformatics 2021; 36:4490-4497. [PMID: 32399556 DOI: 10.1093/bioinformatics/btaa495] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/10/2019] [Revised: 02/18/2020] [Accepted: 05/06/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION: In silico drug target prediction provides valuable information for drug repurposing, understanding of side effects as well as expansion of the druggable genome. In particular, discovery of actionable drug targets is critical to developing targeted therapies for diseases. RESULTS: Here, we develop a robust method for drug target prediction by leveraging a class imbalance-tolerant machine learning framework with a novel training scheme. We incorporate novel features, including drug-gene phenotype similarity and gene expression profile similarity that capture information orthogonal to other features. We show that our classifier achieves robust performance and is able to predict gene targets for new drugs as well as drugs that potentially target unexplored genes. By providing newly predicted drug-target associations, we uncover novel opportunities of drug repurposing that may benefit cancer treatment through action on either known drug targets or currently undrugged genes. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Siqi Liang: Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Haiyuan Yu: Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
9
Wang MWH, Goodman JM, Allen TEH. Machine Learning in Predictive Toxicology: Recent Applications and Future Directions for Classification Models. Chem Res Toxicol 2020; 34:217-239. [PMID: 33356168 DOI: 10.1021/acs.chemrestox.0c00316] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 02/07/2023]
Abstract
In recent times, machine learning has become increasingly prominent in predictive toxicology as it has shifted from in vivo studies toward in silico studies. Currently, in vitro methods together with other computational methods such as quantitative structure-activity relationship modeling and absorption, distribution, metabolism, and excretion calculations are being used. An overview of machine learning and its applications in predictive toxicology is presented here, including support vector machines (SVMs), random forest (RF) and decision trees (DTs), neural networks, regression models, naïve Bayes, k-nearest neighbors, and ensemble learning. The recent successes of these machine learning methods in predictive toxicology are summarized, and a comparison of some models used in predictive toxicology is presented. In predictive toxicology, SVMs, RF, and DTs are the dominant machine learning methods due to the characteristics of the data available. Lastly, this review describes the current challenges facing the use of machine learning in predictive toxicology and offers insights into the possible areas of improvement in the field.
Affiliation(s)
- Marcus W H Wang: Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Jonathan M Goodman: Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Timothy E H Allen: Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom; MRC Toxicology Unit, University of Cambridge, Hodgkin Building, Lancaster Road, Leicester LE1 7HB, United Kingdom
10
Peng Y, Zhang Z, Jiang Q, Guan J, Zhou S. TOP: A deep mixture representation learning method for boosting molecular toxicity prediction. Methods 2020; 179:55-64. [PMID: 32446957 DOI: 10.1016/j.ymeth.2020.05.013] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/14/2020] [Revised: 05/12/2020] [Accepted: 05/13/2020] [Indexed: 01/18/2023]
Abstract
At the early stages of the drug discovery, molecule toxicity prediction is crucial to excluding drug candidates that are likely to fail in clinical trials. In this paper, we presented a novel molecular representation method and developed a corresponding deep learning-based framework called TOP (the abbreviation of TOxicity Prediction). TOP integrates specifically designed data preprocessing methods, an RNN based on bidirectional gated recurrent unit (BiGRU), and fully connected neural networks for end-to-end molecular representation learning and chemical toxicity prediction. TOP can automatically learn a mixed molecular representation from not only SMILES contextual information that describes the molecule structure, but also physiochemical properties. Therefore, TOP can overcome the drawbacks of existing methods that use either of them, thus greatly promotes toxicity prediction accuracy. We conducted extensive experiments over 14 classic toxicity prediction tasks on three different benchmark datasets, including balanced and imbalanced ones. The results show that, with the help of the novel molecular representation method, TOP significantly outperforms not only three baseline machine learning methods, but also five state-of-the-art methods.
Affiliation(s)
- Yuzhong Peng: Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200433, China; Key Lab of Scientific Computing and Intelligent Information Processing in Universities of Guangxi, Nanning Normal University, Nanning 530001, China
- Ziqiao Zhang: Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200433, China
- Qizhi Jiang: Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200433, China
- Jihong Guan: Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Shuigeng Zhou: Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200433, China
11
Öztürk H, Ozkirimli E, Özgür A. A novel methodology on distributed representations of proteins using their interacting ligands. Bioinformatics 2019; 34:i295-i303. [PMID: 29949957 PMCID: PMC6022674 DOI: 10.1093/bioinformatics/bty287] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 11/23/2022]
Abstract
Motivation: The effective representation of proteins is a crucial task that directly affects the performance of many bioinformatics problems. Related proteins usually bind to similar ligands. Chemical characteristics of ligands are known to capture the functional and mechanistic properties of proteins suggesting that a ligand-based approach can be utilized in protein representation. In this study, we propose SMILESVec, a Simplified molecular input line entry system (SMILES)-based method to represent ligands and a novel method to compute similarity of proteins by describing them based on their ligands. The proteins are defined utilizing the word-embeddings of the SMILES strings of their ligands. The performance of the proposed protein description method is evaluated in protein clustering task using TransClust and MCL algorithms. Two other protein representation methods that utilize protein sequence, Basic local alignment tool and ProtVec, and two compound fingerprint-based protein representation methods are compared. Results: We showed that ligand-based protein representation, which uses only SMILES strings of the ligands that proteins bind to, performs as well as protein sequence-based representation methods in protein clustering. The results suggest that ligand-based protein description can be an alternative to the traditional sequence or structure-based representation of proteins and this novel approach can be applied to different bioinformatics problems such as prediction of new protein–ligand interactions and protein function annotation. Availability and implementation: https://github.com/hkmztrk/SMILESVecProteinRepresentation. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hakime Öztürk: Department of Computer Engineering, Bogazici University, Istanbul, Turkey
- Elif Ozkirimli: Department of Chemical Engineering, Bogazici University, Istanbul, Turkey
- Arzucan Özgür: Department of Computer Engineering, Bogazici University, Istanbul, Turkey
12
Lin TS, Coley CW, Mochigase H, Beech HK, Wang W, Wang Z, Woods E, Craig SL, Johnson JA, Kalow JA, Jensen KF, Olsen BD. BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules. ACS Central Science 2019; 5:1523-1531. [PMID: 31572779 PMCID: PMC6764162 DOI: 10.1021/acscentsci.9b00476] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 05/14/2019] [Indexed: 05/21/2023]
Abstract
Having a compact yet robust structurally based identifier or representation system is a key enabling factor for efficient sharing and dissemination of research results within the chemistry community, and such systems lay down the essential foundations for future informatics and data-driven research. While substantial advances have been made for small molecules, the polymer community has struggled in coming up with an efficient representation system. This is because, unlike other disciplines in chemistry, the basic premise that each distinct chemical species corresponds to a well-defined chemical structure does not hold for polymers. Polymers are intrinsically stochastic molecules that are often ensembles with a distribution of chemical structures. This difficulty limits the applicability of all deterministic representations developed for small molecules. In this work, a new representation system that is capable of handling the stochastic nature of polymers is proposed. The new system is based on the popular "simplified molecular-input line-entry system" (SMILES), and it aims to provide representations that can be used as indexing identifiers for entries in polymer databases. As a pilot test, the entries of the standard data set of the glass transition temperature of linear polymers (Bicerano, 2002) were converted into the new BigSMILES language. Furthermore, it is hoped that the proposed system will provide a more effective language for communication within the polymer community and increase cohesion between the researchers within the community.
Affiliation(s)
- Tzyy-Shyang Lin: Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Connor W. Coley: Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Hidenobu Mochigase: Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Haley K. Beech: Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Wencong Wang: Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Zi Wang: Department of Chemistry, Duke University, Durham, North Carolina 27708, United States
- Eliot Woods: Department of Chemistry, Northwestern University, Evanston, Illinois 60208, United States
- Stephen L. Craig: Department of Chemistry, Duke University, Durham, North Carolina 27708, United States
- Jeremiah A. Johnson: Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Julia A. Kalow: Department of Chemistry, Northwestern University, Evanston, Illinois 60208, United States
- Klavs F. Jensen: Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Bradley D. Olsen: Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
13
Chakravarti SK, Alla SRM. Descriptor Free QSAR Modeling Using Deep Learning With Long Short-Term Memory Neural Networks. Front Artif Intell 2019; 2:17. [PMID: 33733106 PMCID: PMC7861338 DOI: 10.3389/frai.2019.00017] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 04/28/2019] [Accepted: 08/22/2019] [Indexed: 12/15/2022]
Abstract
Current practice of building QSAR models usually involves computing a set of descriptors for the training set compounds, applying a descriptor selection algorithm and finally using a statistical fitting method to build the model. In this study, we explored the prospects of building good quality interpretable QSARs for big and diverse datasets, without using any pre-calculated descriptors. We have used different forms of Long Short-Term Memory (LSTM) neural networks to achieve this, trained directly using either traditional SMILES codes or a new linear molecular notation developed as part of this work. Three endpoints were modeled: Ames mutagenicity, inhibition of P. falciparum Dd2 and inhibition of Hepatitis C Virus, with training sets ranging from 7,866 to 31,919 compounds. To boost the interpretability of the prediction results, attention-based machine learning mechanism, jointly with a bidirectional LSTM was used to detect structural alerts for the mutagenicity data set. Traditional fragment descriptor-based models were used for comparison. As per the results of the external and cross-validation experiments, overall prediction accuracies of the LSTM models were close to the fragment-based models. However, LSTM models were superior in predicting test chemicals that are dissimilar to the training set compounds, a coveted quality of QSAR models in real world applications. In summary, it is possible to build QSAR models using LSTMs without using pre-computed traditional descriptors, and models are far from being "black box." We wish that this study will be helpful in bringing large, descriptor-less QSARs to mainstream use.
14
Dhami DS, Kunapuli G, Das M, Page D, Natarajan S. Drug-Drug Interaction Discovery: Kernel Learning from Heterogeneous Similarities. Smart Health (Amsterdam, Netherlands) 2018; 9-10:88-100. [PMID: 30547078 PMCID: PMC6289266 DOI: 10.1016/j.smhl.2018.07.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 01/04/2023]
Abstract
We develop a pipeline to mine complex drug interactions by combining different similarities and interaction types (molecular, structural, phenotypic, genomic etc). Our goal is to learn an optimal kernel from these heterogeneous similarities in a supervised manner. We formulate an extensible framework that can easily integrate new interaction types into a rich model. The core of our pipeline features a novel kernel-learning approach that tunes the weights of the heterogeneous similarities, and fuses them into a Similarity-based Kernel for Identifying Drug-Drug interactions and Discovery, or SKID3. Experimental evaluation on the DrugBank database shows that SKID3 effectively combines similarities generated from chemical reaction pathways (which generally improve precision) and molecular and structural fingerprints (which generally improve recall) into a single kernel that gets the best of both worlds, and consequently demonstrates the best performance.
Collapse
Affiliation(s)
- Devendra Singh Dhami
- Erik Jonsson School of Engineering and Computer Science, The University of Texas at Dallas, United States
- Gautam Kunapuli
- Erik Jonsson School of Engineering and Computer Science, The University of Texas at Dallas, United States
- Mayukh Das
- Erik Jonsson School of Engineering and Computer Science, The University of Texas at Dallas, United States
- David Page
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, United States
- Sriraam Natarajan
- Erik Jonsson School of Engineering and Computer Science, The University of Texas at Dallas, United States
- School of Informatics, Computing & Engineering, Indiana University Bloomington, United States (On Leave)
|
15
|
Coley CW, Barzilay R, Green WH, Jaakkola TS, Jensen KF. Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction. J Chem Inf Model 2017; 57:1757-1772. [PMID: 28696688 DOI: 10.1021/acs.jcim.6b00601] [Citation(s) in RCA: 220] [Impact Index Per Article: 31.4] [Indexed: 01/29/2023]
Abstract
The task of learning an expressive molecular representation is central to developing quantitative structure-activity and property relationships. Traditional approaches rely on group additivity rules, empirical measurements or parameters, or generation of thousands of descriptors. In this paper, we employ a convolutional neural network for this embedding task by treating molecules as undirected graphs with attributed nodes and edges. Simple atom and bond attributes are used to construct atom-specific feature vectors that take into account the local chemical environment using different neighborhood radii. By working directly with the full molecular graph, there is a greater opportunity for models to identify important features relevant to a prediction task. Unlike other graph-based approaches, our atom featurization preserves molecule-level spatial information that significantly enhances model performance. Our models learn to identify important features of atom clusters for the prediction of aqueous solubility, octanol solubility, melting point, and toxicity. Extensions and limitations of this strategy are discussed.
Affiliation(s)
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Tommi S Jaakkola
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
|
16
|
Yamane J, Aburatani S, Imanishi S, Akanuma H, Nagano R, Kato T, Sone H, Ohsako S, Fujibuchi W. Prediction of developmental chemical toxicity based on gene networks of human embryonic stem cells. Nucleic Acids Res 2016; 44:5515-28. [PMID: 27207879 PMCID: PMC4937330 DOI: 10.1093/nar/gkw450] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 10/22/2015] [Accepted: 05/09/2016] [Indexed: 01/01/2023] Open
Abstract
Predictive toxicology using stem cells or their derived tissues has gained increasing importance in biomedical and pharmaceutical research. Here, we show that toxicity category prediction by support vector machines (SVMs), which uses qRT-PCR data from 20 categorized chemicals based on a human embryonic stem cell (hESC) system, is improved by the adoption of gene networks, in which network edge weights are added as feature vectors when noisy qRT-PCR data fail to make accurate predictions. The accuracies of our system were 97.5–100% for three toxicity categories: neurotoxins (NTs), genotoxic carcinogens (GCs) and non-genotoxic carcinogens (NGCs). For two uncategorized chemicals, bisphenol-A and permethrin, our system yielded reasonable results: bisphenol-A was categorized as an NGC, and permethrin was categorized as an NT; both predictions were supported by recently published papers. Our study has two important features: (i) as the first study to employ gene networks without using conventional quantitative structure-activity relationships (QSARs) as input data for SVMs to analyze toxicogenomics data in an hESC validation system, it uses additional information of gene-to-gene interactions to significantly increase prediction accuracies for noisy gene expression data; and (ii) using only undifferentiated hESCs, our study has considerable potential to predict late-onset chemical toxicities, including abnormalities that occur during embryonic development.
Affiliation(s)
- Junko Yamane
- Center for iPS Cell Research and Application, Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan; Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Sachiyo Aburatani
- Computational Biology Research Center, Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Satoshi Imanishi
- Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Hiromi Akanuma
- Research Center for Environmental Risk, National Institute for Environmental Studies, 16-2 Onogawa, Tsukuba, Ibaraki 305-8506, Japan
- Reiko Nagano
- Research Center for Environmental Risk, National Institute for Environmental Studies, 16-2 Onogawa, Tsukuba, Ibaraki 305-8506, Japan
- Tsuyoshi Kato
- Department of Computer Science, Gunma University, 1-5-1 Tenjin-cho, Kiryu, Gunma 376-8515, Japan
- Hideko Sone
- Research Center for Environmental Risk, National Institute for Environmental Studies, 16-2 Onogawa, Tsukuba, Ibaraki 305-8506, Japan
- Seiichiroh Ohsako
- Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Wataru Fujibuchi
- Center for iPS Cell Research and Application, Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan; Computational Biology Research Center, Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
|
17
|
Öztürk H, Ozkirimli E, Özgür A. A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction. BMC Bioinformatics 2016; 17:128. [PMID: 26987649 PMCID: PMC4797122 DOI: 10.1186/s12859-016-0977-x] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Received: 03/25/2015] [Accepted: 03/03/2016] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Molecular structures can be represented as strings of special characters using SMILES. Since each molecule is represented as a string, the similarity between compounds can be computed using SMILES-based string similarity functions. Most previous studies on drug-target interaction prediction use 2D-based compound similarity kernels such as SIMCOMP. To the best of our knowledge, using SMILES-based similarity functions, which are computationally more efficient than the 2D-based kernels, has not been investigated for this task before. RESULTS: In this study, we adapt and evaluate various SMILES-based similarity methods for drug-target interaction prediction. In addition, inspired by the vector space model of Information Retrieval we propose cosine similarity based SMILES kernels that make use of the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting approaches. We also investigate generating composite kernels by combining our best SMILES-based similarity functions with the SIMCOMP kernel. With this study, we provided a comparison of 13 different ligand similarity functions, each of which utilizes the SMILES string of molecule representation. Additionally, TF and TF-IDF based cosine similarity kernels are proposed. CONCLUSION: The more efficient SMILES-based similarity functions performed similarly to the more complex 2D-based SIMCOMP kernel in terms of AUC-ROC scores. The TF-IDF based cosine similarity obtained a better AUC-PR score than the SIMCOMP kernel on the GPCR benchmark data set. The composite kernel of TF-IDF based cosine similarity and SIMCOMP achieved the best AUC-PR scores for all data sets.
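The idea of a TF-IDF weighted cosine similarity over SMILES strings can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenization into overlapping character bigrams, the smoothed IDF formula, and the function names (`smiles_ngrams`, `tfidf_vectors`, `tfidf_kernel`) are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def smiles_ngrams(smiles, n=2):
    """Split a SMILES string into overlapping character n-grams.

    Assumption: plain character n-grams stand in for whatever
    substring scheme a real study would use.
    """
    return [smiles[i:i + n] for i in range(len(smiles) - n + 1)]


def tfidf_vectors(smiles_list, n=2):
    """Build one sparse TF-IDF vector (a dict) per SMILES string."""
    docs = [Counter(smiles_ngrams(s, n)) for s in smiles_list]
    n_docs = len(docs)
    # Document frequency: in how many molecules each n-gram occurs.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tfidf_kernel(smiles_list, n=2):
    """Pairwise TF-IDF cosine similarity matrix for a list of SMILES."""
    vecs = tfidf_vectors(smiles_list, n)
    return [[cosine(a, b) for b in vecs] for a in vecs]


if __name__ == "__main__":
    # Ethanol, ethylamine, benzene: the first two share the "CC" bigram.
    for row in tfidf_kernel(["CCO", "CCN", "c1ccccc1"]):
        print([round(x, 3) for x in row])
```

Because the vectors are sparse dictionaries, each pairwise similarity costs time proportional to the number of distinct n-grams in a molecule, which is in the spirit of the efficiency argument made in the abstract; a composite kernel would combine such a matrix with another similarity matrix.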
Affiliation(s)
- Hakime Öztürk
- Department of Computer Engineering, Bogazici University, Bebek, Istanbul, 34342, Turkey.
- Elif Ozkirimli
- Department of Computer Engineering, Bogazici University, Bebek, Istanbul, 34342, Turkey.
- Arzucan Özgür
- Department of Computer Engineering, Bogazici University, Bebek, Istanbul, 34342, Turkey.
|
18
|
Basant N, Gupta S, Singh KP. Predicting aquatic toxicities of chemical pesticides in multiple test species using nonlinear QSTR modeling approaches. CHEMOSPHERE 2015; 139:246-255. [PMID: 26142614 DOI: 10.1016/j.chemosphere.2015.06.063] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/31/2015] [Revised: 06/10/2015] [Accepted: 06/12/2015] [Indexed: 06/04/2023]
Abstract
In this study, we established nonlinear quantitative structure-toxicity relationship (QSTR) models for predicting the toxicities of chemical pesticides in multiple aquatic test species following the OECD (Organization for Economic Cooperation and Development) guidelines. The decision tree forest (DTF) and decision tree boost (DTB) based QSTR models were constructed using a pesticides toxicity dataset in Selenastrum capricornutum and a set of six descriptors. Six other toxicity data sets were used for external validation of the constructed QSTRs. Global QSTR models were also constructed using the combined dataset of all seven species. The diversity in chemical structures and nonlinearity in the data were evaluated. Model validation was performed by deriving several statistical coefficients for the test data, and the prediction and generalization abilities of the QSTRs were evaluated. Both QSTR models identified WPSA1 (weighted charged partial positive surface area) as the most influential descriptor. The DTF and DTB QSTRs performed relatively better than the single decision tree (SDT) and support vector machines (SVM) models used as a benchmark here and yielded R² of 0.886 and 0.964 between the measured and predicted toxicity values in the complete dataset (S. capricornutum). The QSTR models applied to six other aquatic species toxicity data yielded R² of >0.92 (DTF) and >0.97 (DTB), respectively. The prediction accuracies of the global models were comparable with those of the S. capricornutum models. The results suggest that the developed QSTR models can reliably predict the aquatic toxicity of chemicals and can be used for regulatory purposes.
Affiliation(s)
- Nikita Basant
- Kan Ban Systems Pvt. Ltd., Laxmi Nagar, Delhi 110092, India.
- Shikha Gupta
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India.
- Kunwar P Singh
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India.
|
19
|
Network Pharmacology Bridges Traditional Application and Modern Development of Traditional Chinese Medicine. CHINESE HERBAL MEDICINES 2015. [DOI: 10.1016/s1674-6384(15)60014-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Indexed: 11/23/2022] Open
|
20
|
Zhang X, Fan HR, Li YZ, Xiao XF, Liu R, Qi JW, Wang J, Zhang ZP, Liu CX, Shen XP. Development and Application of Network Toxicology in Safety Research of Chinese Materia Medica. CHINESE HERBAL MEDICINES 2015. [DOI: 10.1016/s1674-6384(15)60016-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/23/2022] Open
|
21
|
Lampa E, Lind L, Lind PM, Bornefalk-Hermansson A. The identification of complex interactions in epidemiology and toxicology: a simulation study of boosted regression trees. Environ Health 2014; 13:57. [PMID: 24993424 PMCID: PMC4120739 DOI: 10.1186/1476-069x-13-57] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 10/10/2013] [Accepted: 06/28/2014] [Indexed: 05/29/2023]
Abstract
BACKGROUND: There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The usual approach is to formulate an additive statistical model and check for departures using product terms between the variables of interest. In this paper, we present an approach to search for interaction effects among several variables using boosted regression trees. METHODS: We simulate a continuous outcome from real data on 27 environmental contaminants, some of which are correlated, and test the method's ability to uncover the simulated interactions. The simulated outcome contains one four-way interaction, one non-linear effect and one interaction between a continuous variable and a binary variable. Four scenarios reflecting different strengths of association are simulated. We illustrate the method using real data. RESULTS: The method succeeded in identifying the true interactions in all scenarios except where the association was weakest. Some spurious interactions were also found, however. The method was also able to identify interactions in the real data set. CONCLUSIONS: We conclude that boosted regression trees can be used to uncover complex interaction effects in epidemiological studies.
Affiliation(s)
- Erik Lampa
- Department of Medical Sciences, Occupational and Environmental Medicine, Uppsala University, 75185 Uppsala, Sweden
- Lars Lind
- Department of Medical Sciences, Cardiovascular Epidemiology, Uppsala University, 75185 Uppsala, Sweden
- P Monica Lind
- Department of Medical Sciences, Occupational and Environmental Medicine, Uppsala University, 75185 Uppsala, Sweden
|
22
|
Doucet JP, Doucet-Panaye A. Structure-activity relationship study of trifluoromethylketone inhibitors of insect juvenile hormone esterase: comparison of several classification methods. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2014; 25:589-616. [PMID: 24884820 DOI: 10.1080/1062936x.2014.919959] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/03/2023]
Abstract
Juvenile hormone esterase (JHE) plays a key role in the development and metamorphosis of holometabolous insects. Its inhibitors could possibly be targeted for insect control. Conversely, JHE may also be involved in endocrine disruption by xenobiotics, resulting in detrimental effects in beneficial insects. There is therefore a need to know the structural characteristics of the molecules able to monitor JHE activity, and to develop SAR and QSAR studies to estimate their effectiveness. For a large diverse population of 181 trifluoromethylketones (TFKs) - the most potent JHE inhibitors known to date - we recently proposed a binary classification (active/inactive) using a support vector machine and Codessa structural descriptors. We have now examined, using the same data set and with the same descriptors, the applicability and performance of five other machine learning approaches. These have been shown able to handle high dimensional data (with descriptors possibly irrelevant or redundant) and to cope with complex mechanisms, but without delivering explicit directly exploitable models. Splitting the data into five batches (training set 80%, test set 20%) and carrying out leave-one-out cross-validation, led to good results of comparable performance, consistent with our previous support vector classifier (SVC) results. Accuracy was greater than 0.80 for all approaches. A reduced set of 15 descriptors common to all the investigated approaches showed good predictive ability (confirmed using a three-layer perceptron) and gives some clues regarding a mechanistic interpretation.
Affiliation(s)
- J P Doucet
- Itodys, Université Paris-Diderot, UMR 7086, Paris, France
|
23
|
Yan J, Zhu WW, Kong B, Lu HB, Yun YH, Huang JH, Liang YZ. A Combinational Strategy of Model Disturbance and Outlier Comparison to Define Applicability Domain in Quantitative Structural Activity Relationship. Mol Inform 2014; 33:503-13. [PMID: 27486037 DOI: 10.1002/minf.201300161] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/31/2013] [Accepted: 04/16/2014] [Indexed: 01/21/2023]
Abstract
To define an applicability domain for quantitative structure-activity relationship modeling, a combinational strategy of model disturbance and outlier comparison is developed. An indicator named the model disturbance index was defined to estimate the prediction error. Moreover, information about the outliers in the training set was used to filter out unreliable samples in the test set based on "structural similarity". Chromatography retention index data were used to investigate this approach. A relationship between the model disturbance index and the prediction error could be identified. Also, the comparison between the outlier set and the test set could provide additional information about which unknown samples deserve more attention. A novel technique based on model population analysis was used to evaluate the validity of the applicability domain. Finally, three commonly used methods, i.e., the leverage, descriptor range-based, and model perturbation methods, were compared with the proposed approach.
Affiliation(s)
- Jun Yan
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
- Wei-Wei Zhu
- Department of Chemical and Bioscience, HeChi University, YiZhou 546300, P. R. China
- Bo Kong
- Technology Center of China Tobacco Hunan Industrial Co., LTD, Changsha 410014, P. R. China
- Hong-Bing Lu
- Technology Center of China Tobacco Hunan Industrial Co., LTD, Changsha 410014, P. R. China
- Yong-Huan Yun
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
- Jian-Hua Huang
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
- Yi-Zeng Liang
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
|
24
|
Zheng W, Tian D, Wang X, Tian W, Zhang H, Jiang S, He G, Zheng Y, Qu W. Support vector machine: Classifying and predicting mutagenicity of complex mixtures based on pollution profiles. Toxicology 2013; 313:151-9. [DOI: 10.1016/j.tox.2013.01.016] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 04/15/2012] [Revised: 09/28/2012] [Accepted: 01/22/2013] [Indexed: 01/12/2023]
|
25
|
Doucet JP, Doucet-Panaye A, Devillers J. Structure-activity relationship study of trifluoromethylketones: inhibitors of insect juvenile hormone esterase. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2013; 24:481-499. [PMID: 23721304 DOI: 10.1080/1062936x.2013.792499] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/02/2023]
Abstract
The juvenile hormone esterase (JHE) regulates juvenile hormone titre in insect hemolymph during its larval development. It has been suggested that JHE could be targeted for use in insect control. This enzyme can also be considered as involved in the phenomenon of endocrine disruption by xenobiotics in beneficial insects. Consequently, there is a need to know the characteristics of the molecules able to act on the JHE. Trifluoromethylketones (TFKs) are the most potent JHE inhibitors found to date and different quantitative structure-activity relationships (QSARs) have been derived for this group of chemicals. In this context, a set of 181 TFKs (118 active and 63 inactive compounds), tested on Trichoplusia ni for their JHE inhibition activity and described by physico-chemical descriptors, was split into different training and test sets to derive structure-activity relationship (SAR) models from support vector classification (SVC). A SVC model including 88 descriptors and derived from a Gaussian kernel was selected for its predictive performances. Another model computed only with 13 descriptors was also selected due to its mechanistic interpretability. This study clearly illustrates the difficulty in capturing the essential structural characteristics of the TFKs explaining their JHE inhibitory activity.
Affiliation(s)
- J P Doucet
- ITODYS, UMR 7086, Université Paris 7, Paris, France.
|