1
Ylipää E, Chavan S, Bånkestad M, Broberg J, Glinghammar B, Norinder U, Cotgreave I. hERG-toxicity prediction using traditional machine learning and advanced deep learning techniques. Curr Res Toxicol 2023; 5:100121. [PMID: 37701072 PMCID: PMC10493507 DOI: 10.1016/j.crtox.2023.100121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 08/24/2023] [Accepted: 08/30/2023] [Indexed: 09/14/2023] Open
Abstract
The rise of artificial intelligence (AI) based algorithms has attracted considerable interest in the pharmaceutical development field. Our study demonstrates the use of traditional machine learning techniques such as random forest (RF), support-vector machine (SVM), extreme gradient boosting (XGBoost), and deep neural network (DNN), as well as advanced deep learning techniques such as gated recurrent unit-based DNN (GRU-DNN) and graph neural network (GNN), for predicting human ether-à-go-go related gene (hERG) derived toxicity. Using the largest hERG dataset derived to date, we used 203,853 and 87,366 compounds for training and testing the models, respectively. The results show that GNN, SVM, XGBoost, DNN, RF, and GRU-DNN all performed well, with validation set AUC ROC scores of 0.96, 0.95, 0.95, 0.94, 0.94 and 0.94, respectively. The GNN was found to be the top performing model based on predictive power and generalizability. The GNN technique requires no feature engineering steps and only minimal human intervention. The GNN approach may serve as a basis for comprehensive automation in predictive toxicology. We believe that the models presented here may serve as a promising tool, both for academic institutes and for the pharmaceutical industry, in predicting hERG liability in new molecular structures.
Affiliation(s)
- Erik Ylipää: Computer Systems Unit, Research Institutes of Sweden RISE, Kista 164 40, Sweden
- Swapnil Chavan: Unit of Chemical and Pharmaceutical Toxicology, Research Institutes of Sweden RISE, Södertälje 151 36, Sweden
- Maria Bånkestad: Computer Systems Unit, Research Institutes of Sweden RISE, Kista 164 40, Sweden
- Johan Broberg: Computer Systems Unit, Research Institutes of Sweden RISE, Kista 164 40, Sweden
- Björn Glinghammar: Preclinical Development & Translational Medicine, Swedish Orphan Biovitrum AB, Solna 171 65, Sweden
- Ulf Norinder: Department of Computer and Systems Sciences, Stockholm University, Kista 164 07, Sweden
- Ian Cotgreave: Unit of Chemical and Pharmaceutical Toxicology, Research Institutes of Sweden RISE, Södertälje 151 36, Sweden
2
Sosnin S, Karlov D, Tetko IV, Fedorov MV. Comparative Study of Multitask Toxicity Modeling on a Broad Chemical Space. J Chem Inf Model 2019; 59:1062-1072. [PMID: 30589269 DOI: 10.1021/acs.jcim.8b00685] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Indexed: 02/07/2023]
Abstract
Acute toxicity is one of the most challenging properties to predict purely with computational methods due to its direct relationship to biological interactions. Moreover, toxicity can be represented by different end points: it can be measured for different species using different types of administration, etc., and it is questionable whether knowledge transfer between end points is possible. We performed a comparative study of multitask toxicity prediction for a broad chemical space using different descriptors and modeling algorithms, and applied multitask learning to a large toxicity data set extracted from the Registry of Toxic Effects of Chemical Substances (RTECS). We demonstrated that multitask modeling provides significant improvement over single-output models and other machine learning methods. Our research reveals that multitask learning can be very useful for improving the quality of acute toxicity modeling and raises a discussion about the usage of multitask approaches for regulatory purposes. Our MultiTox models are freely available on the OCHEM platform (ochem.eu/multitox) under a CC-BY-NC license.
Affiliation(s)
- Sergey Sosnin: Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
- Dmitry Karlov: Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
- Igor V Tetko: Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Institute of Structural Biology and BIGCHEM GmbH, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany
- Maxim V Fedorov: Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia; University of Strathclyde, Department of Physics, John Anderson Building, 107 Rottenrow East, Glasgow G4 0NG, U.K.
3
Karlov DS, Sosnin S, Tetko IV, Fedorov MV. Chemical space exploration guided by deep neural networks. RSC Adv 2019; 9:5151-5157. [PMID: 35514634 PMCID: PMC9060647 DOI: 10.1039/c8ra10182e] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/11/2018] [Accepted: 01/29/2019] [Indexed: 11/21/2022] Open
Abstract
A parametric t-SNE approach based on deep feed-forward neural networks was applied to the chemical space visualization problem. It is able to retain more information than certain dimensionality reduction techniques used for this purpose (principal component analysis (PCA), multidimensional scaling (MDS)). The applicability of this method to some chemical space navigation tasks (activity cliffs and activity landscapes identification) is discussed. We created a simple web tool to illustrate our work (http://space.syntelly.com).
Affiliation(s)
- Dmitry S. Karlov: Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
- Sergey Sosnin: Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia; Syntelly LLC
- Igor V. Tetko: Helmholtz Zentrum München – Research Center for Environmental Health (GmbH), Institute of Structural Biology, Germany; BIGCHEM GmbH, Germany
- Maxim V. Fedorov: Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia; Syntelly LLC
4
Sun L, Yang H, Li J, Wang T, Li W, Liu G, Tang Y. In Silico Prediction of Compounds Binding to Human Plasma Proteins by QSAR Models. ChemMedChem 2017; 13:572-581. [DOI: 10.1002/cmdc.201700582] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 09/21/2017] [Revised: 10/18/2017] [Indexed: 12/18/2022]
Affiliation(s)
- Lixia Sun: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Hongbin Yang: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Jie Li: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Tianduanyi Wang: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
5
Chavan S, Abdelaziz A, Wiklander JG, Nicholls IA. A k-nearest neighbor classification of hERG K(+) channel blockers. J Comput Aided Mol Des 2016; 30:229-36. [PMID: 26860111 PMCID: PMC4802000 DOI: 10.1007/s10822-016-9898-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 12/07/2015] [Accepted: 01/28/2016] [Indexed: 01/08/2023]
Abstract
A series of 172 molecular structures that block the hERG K+ channel were used to develop a classification model where, initially, eight types of PaDEL fingerprints were used for k-nearest neighbor model development. A consensus model constructed using Extended-CDK, PubChem and Substructure count fingerprint-based models was found to be a robust predictor of hERG activity. This consensus model demonstrated sensitivity and specificity values of 0.78 and 0.61 for the internal dataset compounds and 0.63 and 0.54 for the external (PubChem) dataset compounds, respectively. This model has identified the highest number of true positives (i.e. 140) from the PubChem dataset so far, as compared to other published models, and can potentially serve as a basis for the prediction of hERG active compounds. Validating this model against FDA-withdrawn substances indicated that it may even be useful for differentiating between mechanisms underlying QT prolongation.
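The sensitivity and specificity figures quoted in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the three fingerprint-based models were combined, so the majority vote below is an assumption, and the function names and toy labels are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def consensus_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote; predictions[m][i] is model m's 0/1 call for compound i.

    Assumption: a simple majority rule, which the abstract does not confirm.
    """
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = hERG active."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for six compounds (illustrative only, not data from the study).
y_true = [1, 1, 1, 0, 0, 0]
model_a = [1, 1, 0, 0, 1, 0]
model_b = [1, 0, 0, 0, 0, 1]
model_c = [1, 1, 1, 1, 0, 0]

consensus = consensus_vote([model_a, model_b, model_c])
sens, spec = sensitivity_specificity(y_true, consensus)
print(consensus, round(sens, 2), round(spec, 2))
```

Sensitivity is the fraction of true blockers that are recovered and specificity is the fraction of non-blockers that are correctly rejected, which is why the abstract reports the two values separately for the internal and external datasets.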
Affiliation(s)
- Swapnil Chavan: Bioorganic and Biophysical Chemistry Laboratory, Department of Chemistry and Biomedical Sciences, Linnaeus University Centre for Biomaterials Chemistry, Linnaeus University, 391 82, Kalmar, Sweden
- Ahmed Abdelaziz: eADMET GmbH, Lichtenbergstraße 8, 85748, Garching, Munich, Germany
- Jesper G Wiklander: Bioorganic and Biophysical Chemistry Laboratory, Department of Chemistry and Biomedical Sciences, Linnaeus University Centre for Biomaterials Chemistry, Linnaeus University, 391 82, Kalmar, Sweden
- Ian A Nicholls: Bioorganic and Biophysical Chemistry Laboratory, Department of Chemistry and Biomedical Sciences, Linnaeus University Centre for Biomaterials Chemistry, Linnaeus University, 391 82, Kalmar, Sweden; Department of Chemistry-BMC, Uppsala University, Box 576, 751 23, Uppsala, Sweden
6
Computational investigations of hERG channel blockers: New insights and current predictive models. Adv Drug Deliv Rev 2015; 86:72-82. [PMID: 25770776 DOI: 10.1016/j.addr.2015.03.003] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Received: 09/15/2014] [Revised: 01/13/2015] [Accepted: 03/04/2015] [Indexed: 01/08/2023]
Abstract
Identification of potential human Ether-a-go-go Related-Gene (hERG) potassium channel blockers is an essential part of the drug development and drug safety process in pharmaceutical industries and academic drug discovery centers, as such compounds may lead to drug-induced QT prolongation, arrhythmia and Torsade de Pointes. Recent reports also suggest starting to address such issues at the hit selection stage. In order to prioritize molecules during the early drug discovery phase and to reduce the risk of drug attrition due to cardiotoxicity during pre-clinical and clinical stages, computational approaches have been developed to predict the potential hERG blockage of new drug candidates. In this review, we describe the current in silico methods developed and applied to predict and understand the mechanisms of action of hERG blockers, including ligand-based and structure-based approaches. We then discuss ongoing research on other ion channels and hERG polymorphisms that may be involved in LQTS, and how systemic approaches can help in drug safety decisions.
7
Abstract
The emphasis of this review is particularly on multivariate statistical methods currently used in quantitative structure–activity relationship (QSAR) studies.
Affiliation(s)
- Somayeh Pirhadi: Drug Design in Silico Lab, Chemistry Faculty, K. N. Toosi University of Technology, Tehran, Iran
- Jahan B. Ghasemi: Drug Design in Silico Lab, Chemistry Faculty, K. N. Toosi University of Technology, Tehran, Iran
8
Jeon EH, Park JH, Jeong JH, Lee SK. 2D-QSAR analysis for hERG ion channel inhibitors. Analytical Science and Technology 2011. [DOI: 10.5806/ast.2011.24.6.533] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
9
Affiliation(s)
- Melvin J. Yu: Eisai Incorporated, 4 Corporate Drive, Andover, Massachusetts 01810
10
Michielan L, Moro S. Pharmaceutical Perspectives of Nonlinear QSAR Strategies. J Chem Inf Model 2010; 50:961-78. [DOI: 10.1021/ci100072z] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 12/31/2022]
Affiliation(s)
- Lisa Michielan: Molecular Modeling Section (MMS), Dipartimento di Scienze Farmaceutiche, Università di Padova, via Marzolo 5, I-35131 Padova, Italy
- Stefano Moro: Molecular Modeling Section (MMS), Dipartimento di Scienze Farmaceutiche, Università di Padova, via Marzolo 5, I-35131 Padova, Italy
11
Li J, Li S, Lei B, Liu H, Yao X, Liu M, Gramatica P. A new strategy to improve the predictive ability of the local lazy regression and its application to the QSAR study of melanin-concentrating hormone receptor 1 antagonists. J Comput Chem 2010; 31:973-85. [PMID: 19670228 DOI: 10.1002/jcc.21383] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/23/2022]
Abstract
In quantitative structure-activity relationship (QSAR) studies, local lazy regression (LLR) can predict the activity of a query molecule by using the information of its local neighborhood, without the need to produce QSAR models a priori. When a prediction is required for a query compound, a set of local models, each including a different number of nearest neighbors, is identified. The leave-one-out cross-validation (LOO-CV) procedure is usually used to assess the prediction ability of each model, and the model giving the lowest LOO-CV error or the highest LOO-CV correlation coefficient is chosen as the best model. However, it has been shown that a good statistical value from LOO cross-validation is a necessary, but not a sufficient, condition for the model to have high predictive power. In this work, a new strategy is proposed to improve the predictive ability of LLR models and to assess the accuracy of a query prediction. The bandwidth of the k neighbor value for LLR is optimized by considering the predictive ability of local models using an external validation set. This approach was applied to the QSAR study of a series of thienopyrimidinone antagonists of melanin-concentrating hormone receptor 1. The results obtained with the new strategy show evident improvement compared with the commonly used LOO-CV LLR methods and the traditional global linear model.
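The idea of choosing the neighbor count k on an external validation set can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single descriptor, absolute distance to find neighbors, a straight-line local model, and mean squared error as the selection criterion, none of which are specified in the abstract. The names `local_prediction` and `select_k` and the toy data are hypothetical.

```python
from typing import Sequence


def local_prediction(x_train: Sequence[float], y_train: Sequence[float],
                     x_query: float, k: int) -> float:
    """Fit a straight line to the k training points nearest to x_query and evaluate it there."""
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x_query))[:k]
    n = len(nearest)
    xs = [p[0] for p in nearest]
    ys = [p[1] for p in nearest]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0.0:
        # All neighbors share one descriptor value: fall back to their mean activity.
        return y_mean
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return y_mean + slope * (x_query - x_mean)


def select_k(x_train: Sequence[float], y_train: Sequence[float],
             x_val: Sequence[float], y_val: Sequence[float],
             candidates: Sequence[int]) -> int:
    """Return the candidate k with the lowest mean squared error on the validation set."""
    def mse(k: int) -> float:
        errors = [(local_prediction(x_train, y_train, x, k) - y) ** 2
                  for x, y in zip(x_val, y_val)]
        return sum(errors) / len(errors)
    return min(candidates, key=mse)


# Toy data (illustrative only): a curved activity trend, y = x^2.
x_train = [float(i) for i in range(11)]
y_train = [x ** 2 for x in x_train]
x_val = [2.5, 5.5, 7.5]
y_val = [x ** 2 for x in x_val]

best_k = select_k(x_train, y_train, x_val, y_val, candidates=[2, 5, 11])
print(best_k)
```

On this curved toy trend the smallest neighborhood wins, because a line through distant points approximates the local curvature poorly. With noisy data a larger k would often be preferred, which is why the bandwidth is tuned against held-out compounds rather than fixed in advance.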
Affiliation(s)
- Jiazhong Li: State Key Laboratory of Applied Organic Chemistry, Department of Chemistry, Lanzhou University, Lanzhou 730000, China
12
Gunturi SB, Theerthala SS, Patel NK, Bahl J, Narayanan R. Prediction of skin sensitization potential using D-optimal design and GA-kNN classification methods. SAR and QSAR in Environmental Research 2010; 21:305-335. [PMID: 20544553 DOI: 10.1080/10629361003773955] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/29/2023]
Abstract
Modelling of skin sensitization data of 255 diverse compounds and 450 calculated descriptors was performed to develop global predictive classification models that are applicable to the whole chemical space. With this aim, we employed two automated procedures: (a) D-optimal design to select optimal members of the training and test sets, and (b) the k-Nearest Neighbour classification (kNN) method along with Genetic Algorithms (GA-kNN Classification) to select significant and independent descriptors in order to build the models. This methodology helped us to derive multiple models, M1-M5, that are stable and robust. The best among them, model M1 (CCR(train) = 84.3%, CCR(test) = 87.2% and CCR(ext) = 80.4%), is based on six neighbours and nine descriptors and further suggests that: (a) it is stable and robust and performs better than the models reported in the literature, and (b) the combination of D-optimal design and GA-kNN classification is a very promising approach. Consensus prediction based on the models M1-M5 improved the CCR of the training, test and external validation datasets by 3.8%, 4.45% and 3.85%, respectively, over M1. From the analysis of the physical meaning of the selected descriptors, it is inferred that the skin sensitization potential of small organic compounds can be accurately predicted using calculated descriptors that code for the following fundamental properties: (i) lipophilicity, (ii) atomic polarizability, (iii) shape, (iv) electrostatic interactions, and (v) chemical reactivity.
Affiliation(s)
- S B Gunturi: Innovation Labs Hyderabad, Tata Consultancy Services Limited, #1, Software Units Layout, Madhapur, Hyderabad - 500 081, India
13
Current mathematical methods used in QSAR/QSPR studies. Int J Mol Sci 2009; 10:1978-1998. [PMID: 19564933 PMCID: PMC2695261 DOI: 10.3390/ijms10051978] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.3] [Received: 03/19/2009] [Accepted: 04/28/2009] [Indexed: 02/07/2023] Open
Abstract
This paper gives an overview of the mathematical methods currently used in quantitative structure-activity/property relationship (QSAR/QSPR) studies. Recently, the mathematical methods applied to the regression of QSAR/QSPR models have been developing very fast, and new methods, such as Gene Expression Programming (GEP), Projection Pursuit Regression (PPR) and Local Lazy Regression (LLR), have appeared on the QSAR/QSPR stage. At the same time, the earlier methods, including Multiple Linear Regression (MLR), Partial Least Squares (PLS), Neural Networks (NN), Support Vector Machine (SVM) and so on, are being upgraded to improve their performance in QSAR/QSPR studies. These new and upgraded methods and algorithms are described in detail, and their advantages and disadvantages are evaluated and discussed, to show their application potential in future QSAR/QSPR studies.