1
Vistoli G, Talarico C, Vittorio S, Lunghini F, Mazzolari A, Beccari A, Pedretti A. Approaching Pharmacological Space: Events and Components. Methods Mol Biol 2025; 2834:151-169. [PMID: 39312164 DOI: 10.1007/978-1-0716-4003-6_7] [Indexed: 09/25/2024]
Abstract
The pharmacological space comprises all the dynamic events that determine the bioactivity (and/or the metabolism and toxicity) of a given ligand. It accounts for the structural flexibility and property variability of the two interacting molecules, as well as for the mutual adaptability that characterizes their molecular recognition. The dynamic behavior of all these events can be described by a set of possible states (e.g., conformations, binding modes, isomeric forms) that the simulated systems can assume. For each monitored state, a set of state-dependent ligand- and structure-based descriptors can be calculated. Instead of considering only the most probable state (as is routinely done), the pharmacological space approach considers all the monitored states. For each state-dependent descriptor, the corresponding space can be evaluated by calculating dynamic parameters such as mean and range values. The reviewed examples emphasize that the pharmacological space can find fruitful applications in structure-based virtual screening as well as in toxicity prediction. In all reported examples, including the pharmacological space parameters improves performance. Beneficial effects are obtained by combining both different binding modes, to account for ligand mobility, and different target structures, to account for protein flexibility/adaptability. The proposed computational workflow, which combines docking simulations and rescoring analyses to enrich the arsenal of docking-based descriptors, proved generally applicable regardless of the target considered and the docking engine used. Finally, the EFO approach, which generates consensus models by linearly combining various descriptors, yielded high-performing models in all the virtual screening campaigns discussed.
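The idea of summarizing each state-dependent descriptor by its mean and range over all monitored states, rather than keeping only the most probable state, can be illustrated with a minimal Python sketch. The descriptor names and values are invented for illustration; the chapter's actual descriptors, docking engines, and the EFO consensus step are not reproduced here.

```python
from statistics import mean


def pharmacological_space(states):
    """Summarize state-dependent descriptors over every monitored state.

    `states` is a list of dicts, one per state (e.g. a docking pose or a
    target conformation), mapping descriptor names to numeric values.
    All states are assumed to share the keys of the first state.
    Returns the mean and the range (max - min) of each descriptor.
    """
    summary = {}
    for name in states[0]:
        values = [state[name] for state in states]
        summary[name] = {"mean": mean(values), "range": max(values) - min(values)}
    return summary


# Hypothetical descriptors for three binding modes of one ligand.
poses = [
    {"docking_score": -8.0, "polar_surface": 30.0},
    {"docking_score": -7.0, "polar_surface": 34.0},
    {"docking_score": -9.0, "polar_surface": 32.0},
]
print(pharmacological_space(poses))
```

The resulting mean and range values can then be used as additional model inputs alongside, or instead of, the single-state descriptor values.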
Affiliation(s)
- Giulio Vistoli
- Dipartimento di Scienze Farmaceutiche, Università Degli Studi di Milano, Milan, Italy.
- Serena Vittorio
- Dipartimento di Scienze Farmaceutiche, Università Degli Studi di Milano, Milan, Italy
- Angelica Mazzolari
- Dipartimento di Scienze Farmaceutiche, Università Degli Studi di Milano, Milan, Italy
- Alessandro Pedretti
- Dipartimento di Scienze Farmaceutiche, Università Degli Studi di Milano, Milan, Italy
2
Wang N, Li X, Xiao J, Liu S, Cao D. Data-driven toxicity prediction in drug discovery: Current status and future directions. Drug Discov Today 2024; 29:104195. [PMID: 39357621 DOI: 10.1016/j.drudis.2024.104195] [Received: 06/05/2024] [Revised: 09/13/2024] [Accepted: 09/26/2024] [Indexed: 10/04/2024]
Abstract
Early toxicity assessment plays a vital role in the drug discovery process because of its significant influence on the attrition rate of candidates. Recently, continual advances in information technology have greatly accelerated the development of toxicity prediction. To give an overview of the current state of data-driven toxicity prediction, we reviewed relevant studies and summarized them under three main headings: the features and difficulties of toxicity prediction, the evolution of modeling approaches, and the available tools for toxicity prediction. For each part, we describe the research status, existing challenges, and feasible solutions. Finally, several new directions and suggestions for toxicity prediction are put forward.
Affiliation(s)
- Ningning Wang
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China
- Xinliang Li
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China
- Jing Xiao
- Hunan Institute for Drug Control, Changsha 410001 Hunan, PR China
- Shao Liu
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China.
- Dongsheng Cao
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, PR China.
3
Xu J, Wang Z, Niu Y, Tang Y, Wang Y, Huang J, Leung ELH. TRP channels in cancer: Therapeutic opportunities and research strategies. Pharmacol Res 2024; 209:107412. [PMID: 39303771 DOI: 10.1016/j.phrs.2024.107412] [Received: 07/08/2024] [Revised: 09/11/2024] [Accepted: 09/11/2024] [Indexed: 09/22/2024]
Abstract
The influence of gut microbiota on transient receptor potential (TRP) channels has been identified as an important element in the development of gastrointestinal conditions, yet its involvement in cancer progression is not as thoroughly understood. This review explores the multifaceted roles of TRP channels in oncogenesis and emphasizes their significance in cancer progression and therapeutic outcomes. Critical focus was placed on the influence of traditional medicines, such as traditional Chinese medicine (TCM) related aromatic medicines, on TRP channel functions. Moreover, we explored the interplay between the gut microbiota and TRP channels in cancer signaling, highlighting the therapeutic potential of targeting this axis in cancer treatment. The impact of current therapies on TRP channel function was examined, demonstrating the need for a comprehensive understanding of how different modalities affect TRP channels in cancer. Technological advancements, including artificial intelligence (AI) tools and computer-aided drug development (CADD), have been discussed in the context of leveraging TRP channels for innovative cancer therapies. Future directions emphasize the potential applications of TRP channel research in advancing cancer treatment and enhancing patients' well-being.
Affiliation(s)
- Jiahui Xu
- Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China; MOE Frontiers Science Center for Precision Oncology, University of Macau, Macau SAR, China
- Ziming Wang
- Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China; MOE Frontiers Science Center for Precision Oncology, University of Macau, Macau SAR, China
- Yuqing Niu
- Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China; MOE Frontiers Science Center for Precision Oncology, University of Macau, Macau SAR, China
- Yuping Tang
- Key Laboratory of Shaanxi Administration of Traditional Chinese Medicine for TCM Compatibility, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi 712046, China
- Yuwei Wang
- Key Laboratory of Shaanxi Administration of Traditional Chinese Medicine for TCM Compatibility, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi 712046, China.
- Jumin Huang
- Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China; MOE Frontiers Science Center for Precision Oncology, University of Macau, Macau SAR, China.
- Elaine Lai-Han Leung
- Cancer Center, Faculty of Health Sciences, University of Macau, Macau SAR, China; MOE Frontiers Science Center for Precision Oncology, University of Macau, Macau SAR, China; State Key Laboratory of Quality Research in Chinese Medicine, University of Macau, Macau SAR, China.
4
Liu J, Khan MKH, Guo W, Dong F, Ge W, Zhang C, Gong P, Patterson TA, Hong H. Machine learning and deep learning approaches for enhanced prediction of hERG blockade: a comprehensive QSAR modeling study. Expert Opin Drug Metab Toxicol 2024; 20:665-684. [PMID: 38968091 DOI: 10.1080/17425255.2024.2377593] [Received: 02/27/2024] [Accepted: 06/26/2024] [Indexed: 07/07/2024]
Abstract
BACKGROUND: Cardiotoxicity is a major cause of drug withdrawal. The hERG channel, which regulates ion flow, is pivotal for heart and nervous system function, and its blockade is a concern in drug development. Predicting hERG blockade is therefore essential for identifying cardiac safety issues. Various QSAR models exist, but their performance varies. Ongoing improvements show promise, motivating continued efforts to improve the accuracy of hERG blockade prediction with emerging deep learning algorithms.
STUDY DESIGN AND METHOD: Using a large training dataset, six individual QSAR models were developed. Additionally, three ensemble models were constructed. All models were evaluated using 10-fold cross-validation and two external datasets.
RESULTS: The 10-fold cross-validations yielded Matthews correlation coefficient (MCC) values from 0.682 to 0.730, with the upper end surpassing the best-reported model on the same dataset (0.689). External validations yielded MCC values from 0.520 to 0.715 for the first dataset, exceeding those of previously reported models (0-0.599). For the second dataset, MCC values fell between 0.025 and 0.215, in line with those of reported models (0.112-0.220).
CONCLUSIONS: The developed models can assist the pharmaceutical industry and regulatory agencies in predicting hERG blockade activity, thereby enhancing safety assessments and reducing the risk of adverse cardiac events associated with new drug candidates.
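For readers less familiar with the metric reported above, the following Python sketch computes the Matthews correlation coefficient from binary labels. It is a generic textbook implementation, not the authors' evaluation code, and the toy labels are invented.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = hERG blocker)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when a marginal count is zero (undefined MCC).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy example: 2 TP, 2 TN, 1 FP, 1 FN gives (4 - 1) / 9 = 1/3.
print(matthews_corrcoef([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level agreement, so values of 0.025-0.215 on the second external dataset indicate predictions only modestly better than random.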
Affiliation(s)
- Jie Liu
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Md Kamrul Hasan Khan
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Wenjing Guo
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Fan Dong
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Weigong Ge
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA
- Ping Gong
- Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS, USA
- Tucker A Patterson
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Huixiao Hong
- National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
5
Sanches IH, Braga RC, Alves VM, Andrade CH. Enhancing hERG Risk Assessment with Interpretable Classificatory and Regression Models. Chem Res Toxicol 2024; 37:910-922. [PMID: 38781421 PMCID: PMC11187631 DOI: 10.1021/acs.chemrestox.3c00400] [Received: 12/15/2023] [Revised: 04/22/2024] [Accepted: 05/14/2024] [Indexed: 05/25/2024]
Abstract
The human Ether-à-go-go-Related Gene (hERG) encodes a transmembrane protein that regulates the cardiac action potential, and inhibition of this protein can induce a potentially deadly cardiac syndrome. In vitro tests help identify hERG blockers at early stages; however, their high cost motivates the search for alternative, cost-effective methods. The primary goal of this study was to enhance the Pred-hERG tool for predicting hERG blockage. To achieve this, we developed new QSAR models that incorporated additional data, updated the existing classificatory and multiclassificatory models, and introduced new regression models. We also integrated SHAP (SHapley Additive exPlanations) values to offer a visual interpretation of these models. Using the latest data from ChEMBL v30, encompassing over 14,364 compounds with hERG data, our binary and multiclassification models outperformed both the previous iteration of Pred-hERG and all publicly available models. The new version of the tool also introduces a regression model for predicting hERG activity (pIC50); the optimal model achieved an R² of 0.61 and an RMSE of 0.48, surpassing the only regression model previously available in the literature. Pred-hERG 5.0 now offers users a swift, reliable, and user-friendly platform for the early assessment of chemically induced cardiotoxicity through hERG blockage. The tool provides several outputs, including (i) classificatory predictions of hERG blockage with prediction reliability, (ii) multiclassificatory predictions of hERG blockage with reliability, (iii) regression predictions with estimated pIC50 values, and (iv) probability maps illustrating the contribution of chemical fragments to each prediction. Furthermore, we implemented explainable AI (XAI) analysis to visualize SHAP values, providing insight into the contribution of each feature to binary classification predictions. A consensus prediction based on the outputs of the three developed models is also provided to assist the user's decision-making. Pred-hERG 5.0 has been designed to be accessible to users without computational or programming expertise. The tool is freely available at http://predherg.labmol.com.br.
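The regression metrics and the consensus idea above can be sketched in Python. The pIC50 definition and the R²/RMSE formulas are standard; the majority vote is an assumption about how three model outputs might be combined, and it presupposes that each output has first been reduced to a binary blocker call (for example by thresholding a predicted pIC50 at a user-chosen cut-off). The abstract does not describe Pred-hERG's actual consensus rule.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); for an IC50 in nM this equals 9 - log10(IC50)."""
    return 9.0 - math.log10(ic50_nm)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted pIC50 values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def majority_consensus(calls):
    """Hypothetical consensus: majority vote over binary calls (1 = blocker)."""
    return 1 if sum(calls) > len(calls) / 2 else 0


observed = [5.0, 6.0, 7.0]
predicted = [5.0, 6.0, 8.0]
print(pic50_from_ic50_nm(1000.0), rmse(observed, predicted), r2_score(observed, predicted))
print(majority_consensus([1, 1, 0]))
```

With three voters a tie cannot occur, which is one reason an odd number of models suits a simple majority rule.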
Affiliation(s)
- Igor H. Sanches
- Laboratory for Molecular Modeling and Drug Design (LabMol), Faculty of Pharmacy, Universidade Federal de Goiás, Goiânia, GO 74690-900, Brazil
- Center for Excellence in Artificial Intelligence (CEIA), Institute of Informatics, Universidade Federal de Goiás, Goiânia, GO 74690-900, Brazil
- Center for the Research and Advancement in Fragments and Molecular Targets (CRAFT), School of Pharmaceutical Sciences at Ribeirao Preto, University of São Paulo, Ribeirão Preto, SP 05508-220, Brazil
- Vinicius M. Alves
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Carolina Horta Andrade
- Laboratory for Molecular Modeling and Drug Design (LabMol), Faculty of Pharmacy, Universidade Federal de Goiás, Goiânia, GO 74690-900, Brazil
- Center for Excellence in Artificial Intelligence (CEIA), Institute of Informatics, Universidade Federal de Goiás, Goiânia, GO 74690-900, Brazil
- Center for the Research and Advancement in Fragments and Molecular Targets (CRAFT), School of Pharmaceutical Sciences at Ribeirao Preto, University of São Paulo, Ribeirão Preto, SP 05508-220, Brazil
6
Fan Z, Yu J, Zhang X, Chen Y, Sun S, Zhang Y, Chen M, Xiao F, Wu W, Li X, Zheng M, Luo X, Wang D. Reducing overconfident errors in molecular property classification using Posterior Network. Patterns (N Y) 2024; 5:100991. [PMID: 39005492 PMCID: PMC11240180 DOI: 10.1016/j.patter.2024.100991] [Received: 10/16/2023] [Revised: 12/20/2023] [Accepted: 04/15/2024] [Indexed: 07/16/2024]
Abstract
Deep-learning-based classification models are increasingly used for predicting molecular properties in drug development. However, traditional classification models using the Softmax function often give overconfident mispredictions for out-of-distribution samples, highlighting a critical lack of accurate uncertainty estimation. Such limitations can result in substantial costs and should be avoided during drug development. Inspired by advances in evidential deep learning and Posterior Network, we replaced the Softmax function with a normalizing flow to enhance the uncertainty estimation ability of the model in molecular property classification. The proposed strategy was evaluated across diverse scenarios, including simulated experiments based on a synthetic dataset, ADMET predictions, and ligand-based virtual screening. The results demonstrate that compared with the vanilla model, the proposed strategy effectively alleviates the problem of giving overconfident but incorrect predictions. Our findings support the promising application of evidential deep learning in drug development and offer a valuable framework for further research.
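The core idea of a Posterior Network, turning class-conditional densities into Dirichlet pseudo-counts so that inputs far from the training data fall back to a flat, low-evidence prediction, can be sketched in a few lines of Python. As a deliberate simplification, a per-class isotropic Gaussian density stands in for the learned encoder and normalizing flow, and the latent coordinates, class centres, and counts are invented; this is not the authors' architecture.

```python
import math


def gaussian_density(z, mean, std):
    """Isotropic Gaussian density p(z | c); stand-in for a per-class normalizing flow."""
    d = len(z)
    sq_dist = sum((zi - mi) ** 2 for zi, mi in zip(z, mean))
    return math.exp(-0.5 * sq_dist / std ** 2) / (2 * math.pi * std ** 2) ** (d / 2)


def posterior_network_output(z, class_params, class_counts, prior=1.0):
    """Dirichlet concentrations alpha_c = prior + N_c * p(z | c) and their mean.

    Far from all training data every density vanishes, so alpha falls back to
    the flat prior: the prediction becomes uniform and the total evidence small.
    """
    alphas = [prior + n_c * gaussian_density(z, mean, std)
              for (mean, std), n_c in zip(class_params, class_counts)]
    total = sum(alphas)
    return alphas, [a / total for a in alphas]


# Hypothetical 2-D latent space with two classes of 500 training molecules each.
params = [((0.0, 0.0), 1.0), ((4.0, 0.0), 1.0)]
counts = [500, 500]
alphas_in, probs_in = posterior_network_output((0.0, 0.0), params, counts)
alphas_ood, probs_ood = posterior_network_output((50.0, 50.0), params, counts)
print(probs_in, sum(alphas_in))
print(probs_ood, sum(alphas_ood))
```

The in-distribution point receives a confident prediction backed by large total evidence, whereas the distant point receives a uniform prediction with total evidence equal to the prior, which is the behavior a plain softmax head does not guarantee.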
Affiliation(s)
- Zhehuan Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- Jie Yu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- Xiang Zhang
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Yijie Chen
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Shihui Sun
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Yuanyuan Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- Mingan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Lingang Laboratory, Shanghai 200031, China
- Fu Xiao
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Wenyong Wu
- Lingang Laboratory, Shanghai 200031, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
7
Boldini D, Ballabio D, Consonni V, Todeschini R, Grisoni F, Sieber SA. Effectiveness of molecular fingerprints for exploring the chemical space of natural products. J Cheminform 2024; 16:35. [PMID: 38528548 DOI: 10.1186/s13321-024-00830-3] [Received: 12/20/2023] [Accepted: 03/17/2024] [Indexed: 03/27/2024]
Abstract
Natural products are a diverse class of compounds with promising biological properties, such as high potency and excellent selectivity. However, their structural motifs differ from those of typical drug-like compounds, e.g., a wider range of molecular weights, multiple stereocenters, and a higher fraction of sp3-hybridized carbons. This makes encoding natural products with molecular fingerprints difficult, which restricts their use in cheminformatics studies. To tackle this issue, we explored over 30 years of research to systematically evaluate which molecular fingerprint performs best on the natural product chemical space. We considered 20 molecular fingerprints from four different sources and benchmarked them on over 100,000 unique natural products from the COCONUT (COlleCtion of Open Natural prodUcTs) and CMNPD (Comprehensive Marine Natural Products Database) databases. Our analysis focused on the correlation between different fingerprints and on their classification performance on 12 bioactivity prediction datasets. Our results show that different encodings can provide fundamentally different views of the natural product chemical space, leading to substantial differences in pairwise similarity and performance. While Extended Connectivity Fingerprints are the de facto option for encoding drug-like compounds, other fingerprints were found to match or outperform them for bioactivity prediction of natural products. These results highlight the need to evaluate multiple fingerprinting algorithms for optimal performance and suggest new areas of research. Finally, we provide an open-source Python package for computing all the molecular fingerprints considered in the study, as well as the data and scripts necessary to reproduce the results, at https://github.com/dahvida/NP_Fingerprints.
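The observation that different encodings give different views of chemical space can be made concrete by computing pairwise Tanimoto similarities for the same molecules under two fingerprints and correlating them. The on-bit sets below are invented for illustration; in practice they would come from a fingerprint implementation such as the authors' package, whose API is not reproduced here.

```python
import math
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    return shared / union if union else 1.0  # two empty fingerprints: treated as identical


def pairwise_similarities(fps):
    """Tanimoto similarity for every unordered pair of molecules."""
    return [tanimoto(a, b) for a, b in combinations(fps, 2)]


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented on-bit sets for the same three molecules under two encodings.
encoding_a = [{1, 2, 3, 8}, {2, 3, 4, 8}, {5, 6, 7}]
encoding_b = [{10, 11}, {12, 13, 14}, {10, 11, 12}]
sims_a = pairwise_similarities(encoding_a)
sims_b = pairwise_similarities(encoding_b)
print(sims_a, sims_b, pearson(sims_a, sims_b))
```

Here the two encodings disagree about which pair of molecules is most similar, so the similarity lists are negatively correlated; a low correlation between fingerprints is the kind of signal that motivates benchmarking several encodings.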
Affiliation(s)
- Davide Boldini
- TUM School of Natural Sciences, Department of Bioscience, Technical University of Munich, Center for Functional Protein Assemblies (CPA), 85748, Garching bei München, Germany.
- Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
- Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
- Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy
- Francesca Grisoni
- Institute for Complex Molecular Systems and Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, Netherlands
- Stephan A Sieber
- TUM School of Natural Sciences, Department of Bioscience, Technical University of Munich, Center for Functional Protein Assemblies (CPA), 85748, Garching bei München, Germany
8
Lee KH, Won SJ, Oyinloye P, Shi L. Unlocking the Potential of High-Quality Dopamine Transporter Pharmacological Data: Advancing Robust Machine Learning-Based QSAR Modeling. bioRxiv 2024:2024.03.06.583803. [PMID: 38558976 PMCID: PMC10979915 DOI: 10.1101/2024.03.06.583803] [Indexed: 04/04/2024]
Abstract
The dopamine transporter (DAT) plays a critical role in the central nervous system and has been implicated in numerous psychiatric disorders. Ligand-based approaches, especially quantitative structure-activity relationship (QSAR) modeling, are instrumental in deciphering the structure-activity relationship (SAR) of DAT ligands. By gathering and analyzing data from the literature and databases, we systematically assemble a diverse range of DAT ligands, aiming to discern their general features and uncover the chemical space for potential novel DAT ligand scaffolds. The aggregation of DAT pharmacological activity data, particularly from databases such as ChEMBL, provides a foundation for constructing robust QSAR models. Compiling and meticulously filtering these data into high-quality training datasets, divided by pharmacological assay and data type, and then applying QSAR modeling proves to be a promising strategy for navigating the pertinent chemical space. Through a systematic comparison of DAT QSAR models built from training datasets drawn from various ChEMBL releases, we underscore the positive impact of improved dataset quality and increased dataset size on the predictive power of DAT QSAR models.
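The kind of curation described above (separating data types and merging replicate measurements into one training value per compound) might look like the following Python sketch. The record keys mirror field names used in ChEMBL activity records, but the filtering rules (one activity type, exact "=" relations, nM units, median of replicates) are assumptions rather than the authors' published protocol, and the compound IDs are placeholders.

```python
import math
from collections import defaultdict
from statistics import median


def build_training_set(records, activity_type="Ki"):
    """Collapse raw activity records into one p-value per compound.

    Keeps only exact ("=") measurements of one activity type reported in nM,
    converts each to pX = 9 - log10(value in nM), and takes the median of
    replicate measurements for the same compound.
    """
    values = defaultdict(list)
    for rec in records:
        if (rec["standard_type"] == activity_type
                and rec["standard_units"] == "nM"
                and rec["standard_relation"] == "="
                and rec["standard_value"] > 0):
            values[rec["molecule_chembl_id"]].append(9.0 - math.log10(rec["standard_value"]))
    return {cid: median(v) for cid, v in values.items()}


# Hypothetical records: two Ki replicates for CPD1 and one IC50 for CPD2.
records = [
    {"molecule_chembl_id": "CPD1", "standard_type": "Ki", "standard_units": "nM",
     "standard_relation": "=", "standard_value": 10.0},
    {"molecule_chembl_id": "CPD1", "standard_type": "Ki", "standard_units": "nM",
     "standard_relation": "=", "standard_value": 100.0},
    {"molecule_chembl_id": "CPD2", "standard_type": "IC50", "standard_units": "nM",
     "standard_relation": "=", "standard_value": 50.0},
]
print(build_training_set(records))
```

Keeping Ki and IC50 data in separate training sets, as this sketch does by filtering on one activity type, avoids mixing measurements that are not directly comparable.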
Affiliation(s)
- Kuo Hao Lee
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Sung Joon Won
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Precious Oyinloye
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Lei Shi
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
9
Viganò EL, Ballabio D, Roncaglioni A. Artificial Intelligence and Machine Learning Methods to Evaluate Cardiotoxicity following the Adverse Outcome Pathway Frameworks. Toxics 2024; 12:87. [PMID: 38276722 PMCID: PMC10820364 DOI: 10.3390/toxics12010087] [Received: 11/30/2023] [Revised: 01/15/2024] [Accepted: 01/17/2024] [Indexed: 01/27/2024]
Abstract
Cardiovascular disease is a leading global cause of mortality. The potential cardiotoxic effects of chemicals from different classes, such as environmental contaminants, pesticides, and drugs, can contribute significantly to adverse health effects. The same chemical can induce cardiotoxicity in different ways, following various Adverse Outcome Pathways (AOPs), and potential synergistic effects between chemicals further complicate the issue. In silico methods have become essential for tackling the problem from different perspectives, reducing the need for traditional in vivo testing and saving time and money. Artificial intelligence (AI) and machine learning (ML) are among today's advanced approaches for evaluating chemical hazards; they can serve, for instance, as a first-tier component of Integrated Approaches to Testing and Assessment (IATA). This study employed ML and AI to assess interactions between chemicals and specific biological targets within the AOP networks for cardiotoxicity, starting with molecular initiating events (MIEs) and progressing through key events (KEs). We explored methods to encode chemical information in a form suitable for ML and AI. We started with approaches commonly used in Quantitative Structure-Activity Relationship (QSAR) modeling, such as molecular descriptors and different types of fingerprints. We then increased the complexity of the encoders, incorporating graph-based methods, autoencoders, and character embeddings employed in natural language processing. We also developed a multimodal neural network architecture capable of considering the complementary nature of different chemical representations simultaneously. The advantage of this approach over more conventional architectures designed to handle a single encoder becomes apparent as the amount of data increases.
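A multimodal architecture of the kind described above can be sketched as late fusion: one encoder branch per chemical representation, with the branch embeddings concatenated before a shared output unit. The NumPy sketch below shows only an untrained forward pass with random weights; the input sizes (200 descriptors, a 1024-bit fingerprint), hidden width, and class name are hypothetical, and the authors' graph, autoencoder, and character-embedding encoders are not reproduced.

```python
import numpy as np


class MultimodalNet:
    """Hypothetical late-fusion network (untrained, random weights).

    One small ReLU encoder per chemical representation; the branch embeddings
    are concatenated and passed to a shared sigmoid output unit.
    """

    def __init__(self, input_dims, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.branches = [(rng.normal(0.0, 0.1, size=(d, hidden)), np.zeros(hidden))
                         for d in input_dims]
        self.w_out = rng.normal(0.0, 0.1, size=(hidden * len(input_dims), 1))
        self.b_out = np.zeros(1)

    def forward(self, inputs):
        # Encode each representation separately, then fuse by concatenation.
        embeddings = [np.maximum(0.0, x @ w + b) for x, (w, b) in zip(inputs, self.branches)]
        fused = np.concatenate(embeddings, axis=1)
        logits = fused @ self.w_out + self.b_out
        return 1.0 / (1.0 + np.exp(-logits))  # per-molecule probability, e.g. of MIE activity


# Four molecules, each described by 200 descriptors and a 1024-bit fingerprint.
rng = np.random.default_rng(1)
descriptors = rng.normal(size=(4, 200))
fingerprints = rng.integers(0, 2, size=(4, 1024)).astype(float)
net = MultimodalNet([200, 1024])
probs = net.forward([descriptors, fingerprints])
print(probs.shape)
```

Training (loss, optimizer, and data splits) is omitted; the point is only that each representation keeps its own encoder while the output layer sees all of them at once.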
Affiliation(s)
- Edoardo Luca Viganò
- Laboratory of Environmental Toxicology and Chemistry, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, 20156 Milan, Italy
- Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, 20126 Milan, Italy
- Alessandra Roncaglioni
- Laboratory of Environmental Toxicology and Chemistry, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, 20156 Milan, Italy
10
Liu W, Hopkins AM, Yan P, Du S, Luyt LG, Li Y, Hou J. Can machine learning 'transform' peptides/peptidomimetics into small molecules? A case study with ghrelin receptor ligands. Mol Divers 2023; 27:2239-2255. [PMID: 36331785 DOI: 10.1007/s11030-022-10555-w] [Received: 09/06/2022] [Accepted: 10/19/2022] [Indexed: 11/06/2022]
Abstract
There has been considerable interest in transforming peptides into small molecules, as peptide-based molecules often show poorer bioavailability and lower metabolic stability. We built machine learning (ML) models to investigate whether ML can identify the 'bioactive' features of peptides and use them to accurately discriminate between binding and non-binding small molecules. The ghrelin receptor (GR), which is implicated in various diseases, served as an example to test whether ML models derived from a peptide library can predict small-molecule binders. ML models based on three different algorithms (random forest, support vector machine, and extreme gradient boosting) were built from a carefully curated dataset of peptide/peptidomimetic and small-molecule GR ligands. The results indicated that ML models trained exclusively on peptides/peptidomimetics have limited predictive power for small molecules, whereas ML models trained on a diverse dataset of both peptides/peptidomimetics and small molecules performed exceptionally well in terms of accuracy and false rates. The diversified models accurately differentiated binding from non-binding small molecules in an external validation set of new small molecules that we had synthesized previously. The structural features that contribute most to binding activity were extracted and are remarkably consistent with crystallography and mutagenesis studies.
Affiliation(s)
- Wenjie Liu
- Department of Chemistry, Lakehead University and Thunder Bay Regional Health Research Institute, 980 Oliver Road, Thunder Bay, ON, P7B 6V4, Canada
- Austin M Hopkins
- Department of Chemistry, Lakehead University and Thunder Bay Regional Health Research Institute, 980 Oliver Road, Thunder Bay, ON, P7B 6V4, Canada
- Peizhi Yan
- Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC, Canada
- Shan Du
- Department of Computer Science, Mathematics, Physics and Statistics, The University of British Columbia, Okanagan, Kelowna, BC, Canada
- Leonard G Luyt
- Department of Chemistry, University of Western Ontario, London, ON, Canada
- London Regional Cancer Program, Lawson Health Research Institute, London, ON, Canada
- Yifeng Li
- Department of Computer Science, Brock University, Saint Catharines, ON, Canada
- Jinqiang Hou
- Department of Chemistry, Lakehead University and Thunder Bay Regional Health Research Institute, 980 Oliver Road, Thunder Bay, ON, P7B 6V4, Canada.
11
Ylipää E, Chavan S, Bånkestad M, Broberg J, Glinghammar B, Norinder U, Cotgreave I. hERG-toxicity prediction using traditional machine learning and advanced deep learning techniques. Curr Res Toxicol 2023; 5:100121. [PMID: 37701072 PMCID: PMC10493507 DOI: 10.1016/j.crtox.2023.100121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 08/24/2023] [Accepted: 08/30/2023] [Indexed: 09/14/2023]
Abstract
The rise of artificial intelligence (AI)-based algorithms has attracted considerable interest in the pharmaceutical development field. Our study demonstrates the use of traditional machine learning techniques such as random forest (RF), support-vector machine (SVM), extreme gradient boosting (XGBoost) and deep neural network (DNN), as well as advanced deep learning techniques such as gated recurrent unit-based DNN (GRU-DNN) and graph neural network (GNN), for predicting human ether-à-go-go-related gene (hERG) derived toxicity. Using the largest hERG dataset assembled to date, we used 203,853 and 87,366 compounds for training and testing the models, respectively. The results show that GNN, SVM, XGBoost, DNN, RF, and GRU-DNN all performed well, with validation-set ROC AUC scores of 0.96, 0.95, 0.95, 0.94, 0.94 and 0.94, respectively. The GNN was the top-performing model in terms of predictive power and generalizability. The GNN technique requires no feature engineering and only minimal human intervention, so it may serve as a basis for comprehensive automation in predictive toxicology. We believe that the models presented here may serve as a promising tool, for both academic institutions and the pharmaceutical industry, in predicting the hERG liability of new molecular structures.
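The ROC AUC values quoted above can be read as the probability that a randomly chosen hERG blocker receives a higher predicted score than a randomly chosen non-blocker. The pairwise sketch below computes exactly that, counting ties as one half. It is quadratic in the number of compounds, so for a test set of 87,366 compounds a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice; the function name here is an assumption for illustration.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four blocker/non-blocker pairs are ordered correctly, giving an AUC of 0.75.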
Affiliation(s)
- Erik Ylipää
- Computer Systems Unit, Research Institutes of Sweden RISE, Kista 164 40, Sweden
- Swapnil Chavan
- Unit of Chemical and Pharmaceutical Toxicology, Research Institutes of Sweden RISE, Södertälje 151 36, Sweden
- Maria Bånkestad
- Computer Systems Unit, Research Institutes of Sweden RISE, Kista 164 40, Sweden
- Johan Broberg
- Computer Systems Unit, Research Institutes of Sweden RISE, Kista 164 40, Sweden
- Björn Glinghammar
- Preclinical Development & Translational Medicine, Swedish Orphan Biovitrum AB, Solna 171 65, Sweden
- Ulf Norinder
- Department of Computer and Systems Sciences, Stockholm University, Kista 164 07, Sweden
- Ian Cotgreave
- Unit of Chemical and Pharmaceutical Toxicology, Research Institutes of Sweden RISE, Södertälje 151 36, Sweden
12
Boldini D, Grisoni F, Kuhn D, Friedrich L, Sieber SA. Practical guidelines for the use of gradient boosting for molecular property prediction. J Cheminform 2023; 15:73. [PMID: 37641120 PMCID: PMC10464382 DOI: 10.1186/s13321-023-00743-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 08/09/2023] [Indexed: 08/31/2023]
Abstract
Decision tree ensembles are among the most robust, high-performing and computationally efficient machine learning approaches for quantitative structure-activity relationship (QSAR) modeling. Among them, gradient boosting has recently garnered particular attention, for its performance in data science competitions, virtual screening campaigns, and bioactivity prediction. However, different variants of gradient boosting exist, the most popular being XGBoost, LightGBM and CatBoost. Our study provides the first comprehensive comparison of these approaches for QSAR. To this end, we trained 157,590 gradient boosting models, which were evaluated on 16 datasets and 94 endpoints, comprising 1.4 million compounds in total. Our results show that XGBoost generally achieves the best predictive performance, while LightGBM requires the least training time, especially for larger datasets. In terms of feature importance, the models surprisingly rank molecular features differently, reflecting differences in regularization techniques and decision tree structures. Thus, expert knowledge must always be employed when evaluating data-driven explanations of bioactivity. Furthermore, our results show that the relevance of each hyperparameter varies greatly across datasets and that it is crucial to optimize as many hyperparameters as possible to maximize the predictive performance. In conclusion, our study provides the first set of guidelines for cheminformatics practitioners to effectively train, optimize and evaluate gradient boosting models for virtual screening and QSAR applications.
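The abstract notes that XGBoost, LightGBM and CatBoost rank molecular features differently. One simple way to quantify such disagreement is Spearman's rank correlation between two models' feature-importance vectors. The sketch below is an illustration only: the helper names are hypothetical, and the abstract does not say which agreement measure, if any, the authors used.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values (1 = most important); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(importances_a: Sequence[float], importances_b: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(importances_a), average_ranks(importances_b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        raise ValueError("rankings with no variation have undefined correlation")
    return cov / (va * vb) ** 0.5
```

A value near 1 means two models agree on which features matter most; values near 0 or below indicate the kind of disagreement the authors caution should be checked with expert knowledge.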
Affiliation(s)
- Davide Boldini
- Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Garching bei Munich, Germany
- Francesca Grisoni
- Department of Biomedical Engineering, Institute for Complex Molecular Systems, Eindhoven University of Technology, Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Stephan A Sieber
- Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Garching bei Munich, Germany.
13
Park GJ, Kang NS. ADis-QSAR: a machine learning model based on biological activity differences of compounds. J Comput Aided Mol Des 2023:10.1007/s10822-023-00517-1. [PMID: 37382799 DOI: 10.1007/s10822-023-00517-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 06/26/2023] [Indexed: 06/30/2023]
Abstract
Drug candidates identified by the pharmaceutical industry typically have unique structural characteristics that ensure they interact strongly and specifically with their biological targets. Identifying these characteristics is a key challenge in developing new drugs, and quantitative structure-activity relationship (QSAR) analysis has generally been used to perform this task. QSAR models with good predictive power improve the cost- and time-efficiency of compound development. Building such models depends on how well the differences between "active" and "inactive" compound groups can be conveyed to the model during learning. Efforts to address this issue include generating "molecular descriptors" that compactly express the structural characteristics of compounds. From the same perspective, we developed the Activity Differences-Quantitative Structure-Activity Relationship (ADis-QSAR) model by generating molecular descriptors that more explicitly convey group features through a pair system that directly links the active and inactive groups. We used popular machine learning algorithms, such as Support Vector Machine, Random Forest, XGBoost and Multi-Layer Perceptron, for model learning, and evaluated the models using accuracy, area under the curve, precision and specificity. The Support Vector Machine performed better than the others. Notably, the ADis-QSAR model showed significant improvements in meaningful scores such as precision and specificity compared with the baseline model, even on datasets with dissimilar chemical spaces. By reducing the risk of selecting false-positive compounds, the model improves the efficiency of drug development.
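The abstract describes a "pair system" that directly links active and inactive compounds but does not spell out how the pair descriptor is built. The sketch below shows one generic pairwise encoding, assuming each active/inactive pair is represented by the element-wise difference of the two descriptor vectors, with a mirrored copy so the classifier sees both orderings. This is a hypothetical illustration, not the ADis-QSAR formulation.

```python
from itertools import product
from typing import List, Sequence, Tuple


def pair_difference_dataset(
    actives: Sequence[Sequence[float]],
    inactives: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical pair encoding: (active - inactive) labelled 1, its mirror labelled 0."""
    X: List[List[float]] = []
    y: List[int] = []
    for a, b in product(actives, inactives):
        diff = [ai - bi for ai, bi in zip(a, b)]
        X.append(diff)                # active minus inactive
        y.append(1)
        X.append([-d for d in diff])  # inactive minus active
        y.append(0)
    return X, y
```

Because the number of pairs grows as n_active × n_inactive, some form of pair sampling would likely be needed on larger datasets before training an SVM or other classifier on `X` and `y`.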
Affiliation(s)
- Gyoung Jin Park
- Graduate School of New Drug Discovery and Development, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon, 34134, Korea
- Nam Sook Kang
- Graduate School of New Drug Discovery and Development, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon, 34134, Korea.
14
Vandenberk B, Chew DS, Prasana D, Gupta S, Exner DV. Successes and challenges of artificial intelligence in cardiology. Front Digit Health 2023; 5:1201392. [PMID: 37448836 PMCID: PMC10336354 DOI: 10.3389/fdgth.2023.1201392] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/11/2023] [Accepted: 06/19/2023] [Indexed: 07/15/2023]
Abstract
In recent decades, data management and data processing techniques have evolved substantially. New data architectures have made big-data analysis feasible, healthcare is moving towards personalized medicine through digital health initiatives, and artificial intelligence (AI) is becoming increasingly important. Although AI is a popular research topic, very few applications reach the stage of implementation in clinical practice. This review provides an overview of current methodologies and identifies clinical and organizational challenges for AI in healthcare.
Affiliation(s)
- Bert Vandenberk
- Department of Cardiac Sciences, Libin Cardiovascular Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Cardiovascular Sciences, KU Leuven, Leuven, Belgium
- Derek S. Chew
- Department of Cardiac Sciences, Libin Cardiovascular Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Dinesh Prasana
- Intelense Inc., Markham, ON, Canada
- IOT/AI- Caliber Interconnect Pvt Ltd., Coimbatore, India
- Derek V. Exner
- Department of Cardiac Sciences, Libin Cardiovascular Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
15
Tran TTV, Surya Wibowo A, Tayara H, Chong KT. Artificial Intelligence in Drug Toxicity Prediction: Recent Advances, Challenges, and Future Perspectives. J Chem Inf Model 2023; 63:2628-2643. [PMID: 37125780 DOI: 10.1021/acs.jcim.3c00200] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 05/02/2023]
Abstract
Toxicity prediction is a critical step in the drug discovery process: it helps identify and prioritize compounds with the greatest potential for safe and effective use in humans while reducing the risk of costly late-stage failures. It is estimated that over 30% of drug candidates are discarded owing to toxicity. Recently, artificial intelligence (AI) has been used to improve drug toxicity prediction, as it provides more accurate and efficient methods for identifying the potentially toxic effects of new compounds before they are tested in human clinical trials, saving time and money. In this review, we present an overview of recent advances in AI-based prediction of six major toxicity properties and the Tox21 assay endpoints, including the machine learning algorithms and deep learning architectures used. Additionally, we provide a list of public data sources and useful toxicity prediction tools for the research community and highlight the challenges that must be addressed to enhance model performance. Finally, we discuss future perspectives for AI-based drug toxicity prediction. This review can help researchers understand toxicity prediction and pave the way for new methods of drug discovery.
Affiliation(s)
- Thi Tuyet Van Tran
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Faculty of Information Technology, An Giang University, Long Xuyen 880000, Vietnam
- Vietnam National University - Ho Chi Minh City, Ho Chi Minh 700000, Vietnam
- Agung Surya Wibowo
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Department of Electrical Engineering, Telkom University, Bandung 40257, Indonesia
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
16
Grisoni F. Chemical language models for de novo drug design: Challenges and opportunities. Curr Opin Struct Biol 2023; 79:102527. [PMID: 36738564 DOI: 10.1016/j.sbi.2023.102527] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 10/21/2022] [Revised: 12/07/2022] [Accepted: 12/20/2022] [Indexed: 02/05/2023]
Abstract
Generative deep learning is accelerating de novo drug design by allowing molecules with desired properties to be generated on demand. Chemical language models, which use deep learning to generate new molecules in the form of strings, have been particularly successful in this endeavour. Thanks to advances in natural language processing methods and to interdisciplinary collaborations, chemical language models are expected to become increasingly relevant in drug discovery. This minireview provides an overview of the current state of the art of chemical language models for de novo design and analyses their current limitations, challenges, and advantages. Finally, a perspective on future opportunities is provided.
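Chemical language models operate on SMILES strings split into tokens (atoms, bonds, branches, ring-closure digits), and the generative model learns to predict one token after another. A minimal tokenizer sketch, assuming a simplified regular expression written for this illustration rather than taken from the minireview: it covers bracket atoms, Br/Cl, the organic-subset atoms, ring closures and common bond symbols, not the full SMILES grammar.

```python
import re
from typing import List

# Assumed, simplified token pattern. Order matters: bracket atoms and the
# two-letter halogens must be tried before single-letter atoms, and %nn ring
# closures before single digits.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|\d|[=#\-+()/\\.:~@*$]")


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; reject characters the pattern does not cover."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:  # findall silently skips unmatched characters
        raise ValueError(f"unrecognised characters in {smiles!r}")
    return tokens
```

For example, `tokenize_smiles("CC(=O)Cl")` yields `C`, `C`, `(`, `=`, `O`, `)`, `Cl`, keeping the chlorine as a single token instead of a carbon followed by a stray `l`.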
Affiliation(s)
- Francesca Grisoni
- Eindhoven University of Technology, Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven, Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Netherlands.
17
Vittorio S, Lunghini F, Pedretti A, Vistoli G, Beccari AR. Ensemble of structure and ligand-based classification models for hERG liability profiling. Front Pharmacol 2023; 14:1148670. [PMID: 37033661 PMCID: PMC10076575 DOI: 10.3389/fphar.2023.1148670] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/23/2023] [Accepted: 03/13/2023] [Indexed: 04/11/2023]
Abstract
Drug-induced cardiotoxicity is one of the most critical safety concerns in the early stages of drug development. Blockade of the human ether-à-go-go-related potassium channel (hERG) is the most frequent cause of cardiotoxicity, as it is associated with long QT syndrome, which can lead to fatal arrhythmias. Assessing the hERG liability of new drug candidates is therefore crucial to avoid undesired cardiotoxic effects. In this scenario, computational approaches have emerged as useful tools for developing predictive models able to identify potential hERG blockers. In recent years, much effort has been devoted to generating ligand-based (LB) models because of the lack of experimental structural information about the hERG channel. However, these methods rely on the structural features of the molecules used to generate the model and often fail to correctly predict new chemical scaffolds. Recently, the 3D structure of the hERG channel has been solved experimentally, enabling structure-based (SB) strategies that may overcome the limitations of LB approaches. In this study, we compared the performance of LB and SB classifiers for hERG-related cardiotoxicity developed with the Random Forest algorithm on a training set (TS) containing 12,789 hERG binders. The SB models were trained on a set of scoring functions computed by docking and rescoring calculations, while the LB classifiers were built on a set of physicochemical descriptors and fingerprints. Models combining LB and SB features were developed as well. All the generated models were internally validated by ten-fold cross-validation on the TS and further verified on an external test set. Cross-validation showed that the LB model performed best, whereas the model combining LB and SB attributes gave the best results on the external test set, highlighting the usefulness of integrating LB and SB features to correctly predict unseen molecules. Overall, our predictive models showed satisfactory performance, providing new useful tools to filter out potential cardiotoxic drug candidates in the early phase of drug discovery.
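Ten-fold cross-validation, used here for internal validation, partitions the training set into ten folds and trains ten models, each time holding one fold out for evaluation. A minimal sketch of the index bookkeeping in plain Python; the helper name and the shuffling seed are assumptions, and libraries such as scikit-learn provide `KFold` and `StratifiedKFold` for the same purpose.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation on shuffled indices."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i, test in enumerate(folds):
        # Training set = every fold except the held-out one.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

In the simplest case (an assumption, not a detail given in the abstract), a combined LB+SB model would be trained on the concatenation of each compound's LB descriptors and SB docking scores, using the same folds so that all three model types are compared on identical splits.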
Affiliation(s)
- Serena Vittorio
- Dipartimento di Scienze Farmaceutiche, Università Degli Studi di Milano, Milano, Italy
- Alessandro Pedretti
- Dipartimento di Scienze Farmaceutiche, Università Degli Studi di Milano, Milano, Italy
- Giulio Vistoli
- Dipartimento di Scienze Farmaceutiche, Università Degli Studi di Milano, Milano, Italy
18
Melnikov F, Anger LT, Hasselgren C. Toward Quantitative Models in Safety Assessment: A Case Study to Show Impact of Dose-Response Inference on hERG Inhibition Models. Int J Mol Sci 2022; 24:ijms24010635. [PMID: 36614078 PMCID: PMC9820331 DOI: 10.3390/ijms24010635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 12/23/2022] [Accepted: 12/24/2022] [Indexed: 12/31/2022]
Abstract
Because of challenges with historical data and the diversity of assay formats, in silico models for safety-related endpoints are often based on discretized data instead of data on their natural continuous scale. Models for discretized endpoints have limitations in usage and interpretation that can affect compound design. Here, we present a consistent data inference approach for dose-response and screening experiments that is generally applicable to in vitro assays, exemplified on two data sets of Ether-à-go-go-Related Gene (hERG) K+ inhibition data. hERG inhibition has been associated with severe cardiac effects and is one of the more prominent safety targets assessed in drug development, using a wide array of in vitro and in silico screening methods. In this study, the IC50 for hERG inhibition is estimated from diverse historical proprietary data. The IC50 values derived from a two-point proprietary screening data set correlated highly (R = 0.98, MAE = 0.08) with IC50 values derived from six-point dose-response curves. Similar IC50 estimation accuracy was obtained on a public thallium flux assay data set (R = 0.90, MAE = 0.2). The IC50 data were used to develop a robust quantitative model, whose MAE (0.47) and R2 (0.46) were on par with literature statistics and approached assay reproducibility. A continuous model is highly valuable for pharmaceutical projects, as it enables rank ordering of compounds and evaluation of compounds against project-specific inhibition thresholds. This data inference approach is widely applicable to assays with quantitative readouts and has the potential to impact experimental design and improve model performance, interpretation, and acceptance across many standard safety endpoints.
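The abstract does not describe the inference method itself. As background only, the standard Hill equation links fractional inhibition f at concentration C to the IC50 as f = C^h / (C^h + IC50^h), which rearranges to IC50 = C((1 - f)/f)^(1/h), and pIC50 = 6 - log10(IC50 in µM). The sketch below applies that inversion to screening points, assuming full 0-100% plateaus and a Hill slope of 1 by default, and averages pIC50 across the two concentrations. The averaging rule and both helper names are hypothetical simplifications, not the authors' approach.

```python
import math
from typing import Sequence, Tuple


def ic50_from_single_point(conc_um: float, inhibition_pct: float, hill: float = 1.0) -> float:
    """Invert the Hill equation for one (concentration in µM, % inhibition) point; returns IC50 in µM."""
    f = inhibition_pct / 100.0
    if not 0.0 < f < 1.0:
        raise ValueError("inhibition must lie strictly between 0 and 100% to invert the Hill equation")
    return conc_um * ((1.0 - f) / f) ** (1.0 / hill)


def pic50_two_point(points: Sequence[Tuple[float, float]], hill: float = 1.0) -> float:
    """Hypothetical two-point estimate: mean of the per-concentration pIC50 values."""
    pics = [6.0 - math.log10(ic50_from_single_point(c, i, hill)) for c, i in points]
    return sum(pics) / len(pics)
```

For instance, 50% inhibition at 10 µM implies an IC50 of 10 µM (pIC50 = 5); 50% inhibition at both 1 and 10 µM would be internally inconsistent, and this naive average simply splits the difference at 5.5.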
19
Physicochemical QSAR analysis of hERG inhibition revisited: towards a quantitative potency prediction. J Comput Aided Mol Des 2022; 36:837-849. [PMID: 36305984 DOI: 10.1007/s10822-022-00483-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 10/04/2022] [Indexed: 01/07/2023]
Abstract
In an earlier study (Didziapetris R & Lanevskij K (2016). J Comput Aided Mol Des. 30:1175-1188) we collected a database of publicly available hERG inhibition data for almost 6700 drug-like molecules and built a probabilistic Gradient Boosting classifier with a minimal set of physicochemical descriptors (log P, pKa, molecular size and topology parameters). This approach favored interpretability over statistical performance but still achieved an overall classification accuracy of 75%. In the current follow-up work, we expanded the database (provided in Supplementary Information) to almost 9400 molecules and performed temporal validation of the model on a set of novel chemicals from recently published lead optimization projects. Validation results showed almost no performance degradation compared with the original study. Additionally, we rebuilt the model using the AFT (Accelerated Failure Time) learning objective in XGBoost, which accepts both quantitative and censored data, the latter being often reported in protein inhibition studies. The new model achieved a similar accuracy in discerning hERG blockers from non-blockers at a 10 µM threshold, which can be regarded as close to the performance ceiling for methods that describe only non-specific ligand interactions with hERG. However, this model outputs quantitative potency values (IC50) and is not tied to a particular classification cut-off. pIC50 values from patch-clamp measurements can be predicted with R2 ≈ 0.4 and MAE < 0.5, which enables ligands to be ranked by their expected potency. The approach can be valuable for quantitative modeling of various ADME and drug safety endpoints with a high prevalence of censored data.
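XGBoost's AFT objective (`survival:aft`) takes each training label as an interval given by a lower and an upper bound: equal bounds for an exact measurement, an infinite upper bound for right-censored values such as "IC50 > 30 µM", and a lower bound of 0 for left-censored values such as "IC50 < 0.1 µM". The sketch below only prepares those bounds from (qualifier, IC50 in µM) records; the record format and helper names are assumptions for illustration, not the study's data schema.

```python
import math
from typing import List, Sequence, Tuple

Record = Tuple[str, float]  # assumed format: (qualifier, IC50 in µM), e.g. (">", 30.0)


def censored_bounds(record: Record) -> Tuple[float, float]:
    """Map one qualified IC50 record to (lower, upper) interval bounds."""
    op, value = record
    if value <= 0:
        raise ValueError("IC50 must be positive")
    if op == "=":
        return value, value        # exact (uncensored) measurement
    if op in (">", ">="):
        return value, math.inf     # right-censored: only a lower limit is known
    if op in ("<", "<="):
        return 0.0, value          # left-censored: only an upper limit is known
    raise ValueError(f"unknown qualifier {op!r}")


def bounds_arrays(records: Sequence[Record]) -> Tuple[List[float], List[float]]:
    """Split a list of records into parallel lower/upper bound lists."""
    lower, upper = zip(*(censored_bounds(r) for r in records))
    return list(lower), list(upper)
```

With XGBoost installed, the two lists would be attached to a `DMatrix` via `set_float_info("label_lower_bound", lower)` and `set_float_info("label_upper_bound", upper)` before training with `objective="survival:aft"`.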
20
Boldini D, Friedrich L, Kuhn D, Sieber SA. Tuning gradient boosting for imbalanced bioassay modelling with custom loss functions. J Cheminform 2022; 14:80. [DOI: 10.1186/s13321-022-00657-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 10/30/2022] [Indexed: 11/12/2022]
Abstract
While the number of available bioassay datasets has increased dramatically in recent years, many of them suffer from an extremely imbalanced distribution between active and inactive compounds. Thus, there is an urgent need for novel approaches to tackle class imbalance in drug discovery. Inspired by recent advances in computer vision, we investigated a panel of alternative loss functions for imbalanced classification in the context of Gradient Boosting and benchmarked them on six datasets from public and proprietary sources, for a total of 42 tasks and 2 million compounds. Our findings show that, with these modifications, we achieve statistically significant improvements over the conventional cross-entropy loss function on five out of six datasets. Furthermore, by employing these bespoke loss functions, we are able to push Gradient Boosting to match or outperform a wide variety of previously reported classifiers and neural networks. We also investigated the impact of changing the loss function on training time and found that it can make convergence up to 8 times faster. As such, these results show that tuning the loss function for Gradient Boosting is a straightforward and computationally efficient way to achieve state-of-the-art performance on imbalanced bioassay datasets without compromising interpretability and scalability.
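Gradient boosting fits each new tree to the first and second derivatives (gradient and Hessian) of the loss with respect to the current raw margins, which is why swapping the loss function is a self-contained change. The abstract does not name the losses in the panel, so the sketch below uses class-weighted binary cross-entropy as a simple stand-in; it is not necessarily one of the losses the authors benchmarked. With a per-sample weight w (equal to `pos_weight` for actives, 1 for inactives) and p = sigmoid(margin), the derivatives are w(p - y) and w·p(1 - p). The function names and the default weight of 10 are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_logloss_grad_hess(
    margins: Sequence[float], labels: Sequence[int], pos_weight: float = 10.0
) -> Tuple[List[float], List[float]]:
    """Gradient and Hessian of class-weighted binary cross-entropy w.r.t. raw boosting margins."""
    grad: List[float] = []
    hess: List[float] = []
    for z, y in zip(margins, labels):
        p = _sigmoid(z)
        w = pos_weight if y == 1 else 1.0  # up-weight the rare active class
        grad.append(w * (p - y))
        hess.append(w * p * (1.0 - p))
    return grad, hess
```

Libraries such as XGBoost and LightGBM accept user-defined objectives that return per-sample gradients and Hessians; wiring this function into either would need a small adapter to their callback signatures (not shown).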
21
Delre P, Lavado GJ, Lamanna G, Saviano M, Roncaglioni A, Benfenati E, Mangiatordi GF, Gadaleta D. Ligand-based prediction of hERG-mediated cardiotoxicity based on the integration of different machine learning techniques. Front Pharmacol 2022; 13:951083. [PMID: 36133824 PMCID: PMC9483173 DOI: 10.3389/fphar.2022.951083] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/23/2022] [Accepted: 07/20/2022] [Indexed: 11/13/2022]
Abstract
Drug-induced cardiotoxicity is a common side effect of drugs in clinical use or under postmarket surveillance and is commonly due to off-target interactions with the cardiac human ether-à-go-go-related (hERG) potassium channel. Therefore, prioritizing drug candidates based on their hERG-blocking potential is a mandatory step in the early preclinical stage of a drug discovery program. Herein, we trained and properly validated 30 ligand-based classifiers of hERG-related cardiotoxicity based on 7,963 curated compounds extracted from the freely accessible repository ChEMBL (version 25). Different machine learning algorithms were tested, namely random forest, K-nearest neighbors, gradient boosting, extreme gradient boosting, multilayer perceptron, and support vector machine. Applying 1) best practices for data curation, 2) the feature selection method VSURF, and 3) the synthetic minority oversampling technique (SMOTE) to handle the unbalanced data allowed the development of highly predictive models (BAMAX = 0.91, AUCMAX = 0.95). Remarkably, the temporal validation not only supported the predictivity of the classifiers presented herein but also suggested that they can outperform models commonly used in the literature. From a methodological point of view, the study puts forward a new computational workflow, freely available in the GitHub repository (https://github.com/PDelre93/hERG-QSAR), that is valuable for building highly predictive models of hERG-mediated cardiotoxicity.
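SMOTE balances the classes by creating synthetic minority-class samples on the line segments between a minority sample and one of its k nearest minority-class neighbours. A compact plain-Python sketch of that interpolation step; the function name, the default k = 5 and the seeding are assumptions, and the `imbalanced-learn` package provides a maintained `SMOTE` class for real use.

```python
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_synthetic: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate synthetic minority samples by interpolating towards a random one of the k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic: List[List[float]] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x (excluding x itself).
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: sqdist(x, minority[j]))[:k]
        nb = minority[rng.choice(neighbours)]
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

Oversampling would normally be applied to the training folds only, so that synthetic compounds never leak into the validation data.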
Affiliation(s)
- Pietro Delre
- CNR—Institute of Crystallography, Bari, Italy
- Chemistry Department, University of Bari “Aldo Moro”, Bari, Italy
- Giovanna J. Lavado
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Giuseppe Lamanna
- CNR—Institute of Crystallography, Bari, Italy
- Chemistry Department, University of Bari “Aldo Moro”, Bari, Italy
- Alessandra Roncaglioni
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Emilio Benfenati
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Giuseppe Felice Mangiatordi
- CNR—Institute of Crystallography, Bari, Italy
- *Correspondence: Giuseppe Felice Mangiatordi; Domenico Gadaleta
- Domenico Gadaleta
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
22
Shan M, Jiang C, Qin L, Cheng G. A Review of Computational Methods in Predicting hERG Channel Blockers. ChemistrySelect 2022. [DOI: 10.1002/slct.202201221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Mengyi Shan
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Chen Jiang
- QuanMin RenZheng (HangZhou) Technology Co. Ltd., China
- Lu-Ping Qin
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Gang Cheng
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
23
Kim H, Park M, Lee I, Nam H. BayeshERG: a robust, reliable and interpretable deep learning model for predicting hERG channel blockers. Brief Bioinform 2022; 23:6609519. [PMID: 35709752 DOI: 10.1093/bib/bbac211] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/19/2022] [Revised: 04/19/2022] [Accepted: 05/06/2022] [Indexed: 11/13/2022]
Abstract
Unintended inhibition of the human ether-à-go-go-related gene (hERG) ion channel by small molecules leads to severe cardiotoxicity. Thus, hERG channel blockade is a significant concern in the development of new drugs. Several computational models, including deep learning models, have been developed to predict hERG channel blockade; however, they lack robustness, reliability and interpretability. Here, we developed a graph-based Bayesian deep learning model for hERG channel blocker prediction, named BayeshERG, which has robust predictive power, high reliability and high-resolution interpretability. First, we applied transfer learning, pre-training on a large dataset of 300,000 data points to increase predictive performance. Second, we implemented a Bayesian neural network with Monte Carlo dropout to calibrate the uncertainty of the prediction. Third, we utilized global multihead attentive pooling to increase the resolution of structural interpretability for hERG channel blockers and nonblockers. We conducted both internal and external validations for stringent evaluation; in particular, we benchmarked most of the publicly available hERG channel blocker prediction models. We showed that our proposed model outperformed them in both predictive performance and uncertainty calibration. Furthermore, we found that our model learned to focus on the essential substructures of hERG channel blockers via an attention mechanism. Finally, we validated the model's predictions with in vitro experiments and confirmed their high validity. In summary, BayeshERG could serve as a versatile tool for discovering hERG channel blockers and help maximize the chances of successful drug discovery. The data and source code are available at our GitHub repository (https://github.com/GIST-CSBL/BayeshERG).
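Monte Carlo dropout keeps dropout switched on at prediction time and runs many stochastic forward passes: the mean of the sampled outputs is the prediction, and their spread is an uncertainty estimate. BayeshERG applies this to a graph neural network; the sketch below only illustrates the idea on a hypothetical one-layer logistic model with made-up weights, dropping inputs with inverted-dropout rescaling. All names and parameters are assumptions.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mc_dropout_predict(
    x: Sequence[float],
    weights: Sequence[float],
    bias: float,
    p_drop: float = 0.2,
    n_samples: int = 100,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (mean probability, std of probabilities) over stochastic dropout passes of a toy model."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    rng = random.Random(seed)
    probs = []
    for _ in range(n_samples):
        z = bias
        for xi, wi in zip(x, weights):
            if rng.random() >= p_drop:            # input kept with probability 1 - p_drop
                z += wi * xi / (1.0 - p_drop)     # inverted-dropout rescaling
        probs.append(_sigmoid(z))
    # Mean = predictive probability; spread across passes = uncertainty estimate.
    return statistics.mean(probs), statistics.pstdev(probs)
```

With `p_drop=0.0` every pass is identical and the reported spread is zero; a larger spread for a compound signals a less trustworthy blocker/non-blocker call.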
Affiliation(s)
- Hyunho Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju, 61005, Republic of Korea
- Minsu Park
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju, 61005, Republic of Korea
- Ingoo Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju, 61005, Republic of Korea
- Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju, 61005, Republic of Korea
24
Drug-Induced Immune Thrombocytopenia Toxicity Prediction Based on Machine Learning. Pharmaceutics 2022; 14:pharmaceutics14050943. [PMID: 35631529 PMCID: PMC9143325 DOI: 10.3390/pharmaceutics14050943] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/30/2022] [Revised: 04/20/2022] [Accepted: 04/22/2022] [Indexed: 11/29/2022]
Abstract
Drug-induced immune thrombocytopenia (DITP) often occurs in patients receiving multiple drug treatments simultaneously. However, clinicians usually cannot accurately determine which drugs are the likely culprits. Despite significant advances in laboratory-based DITP testing, in vitro experimental assays are expensive and, in certain cases, cannot provide a timely diagnosis. To address these shortcomings, this paper proposes an efficient machine learning-based method for DITP toxicity prediction. A small dataset of 225 molecules was constructed. The molecules were represented by six fingerprints, three descriptors, and their combinations. Seven classical machine learning models were examined to determine the optimal model. The results show that the RDMD + PubChem-k-NN model provides the best prediction performance among all the models, achieving an area under the curve (AUC) of 76.9% and an overall accuracy of 75.6% on the external validation set. Applicability domain (AD) analysis demonstrates the prediction reliability of the RDMD + PubChem-k-NN model. Five structural fragments related to DITP toxicity were identified using the information gain (IG) method together with fragment frequency analysis. Overall, to the best of our knowledge, this is the first machine learning-based classification model for recognizing chemicals with DITP toxicity, and it can be used as an efficient tool in drug design and clinical therapy.
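Information gain measures how much knowing whether a molecule contains a fragment reduces the entropy of its toxicity label. This is the criterion used above to flag DITP-related fragments.

The sketch below implements the standard definition in plain Python. It is not the authors' implementation, and the fragment flags and labels in the example are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(has_fragment, labels):
    """IG of a binary fragment indicator with respect to the class labels.

    IG = H(labels) - sum over {present, absent} of (subset fraction * H(subset)).
    """
    n = len(labels)
    h_before = entropy(labels)
    h_after = 0.0
    for value in (True, False):
        subset = [y for f, y in zip(has_fragment, labels) if f == value]
        if subset:
            h_after += len(subset) / n * entropy(subset)
    return h_before - h_after


labels = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical: 1 = DITP-positive
fragment_a = [True, True, True, True, False, False, False, False]  # separates classes
fragment_b = [True, False, True, False, True, False, True, False]  # unrelated to class
print(information_gain(fragment_a, labels))  # 1.0
print(information_gain(fragment_b, labels))  # 0.0
```

A fragment that perfectly separates the two classes recovers the full label entropy (1 bit here). A fragment distributed independently of the label gains nothing.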
25
Krishna S, Borrel A, Huang R, Zhao J, Xia M, Kleinstreuer N. High-Throughput Chemical Screening and Structure-Based Models to Predict hERG Inhibition. Biology 2022; 11:209. [PMID: 35205076 PMCID: PMC8869358 DOI: 10.3390/biology11020209] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/04/2021] [Revised: 01/18/2022] [Accepted: 01/21/2022] [Indexed: 12/23/2022]
Abstract
Chemical inhibition of the human ether-à-go-go-related gene (hERG) potassium channel leads to a prolonged QT interval that can contribute to severe cardiotoxicity. The adverse effects of hERG inhibition are among the principal causes of drug attrition in clinical and preclinical development. Preliminary studies have demonstrated that a wide range of environmental chemicals and toxicants may also inhibit the hERG channel and contribute to the pathophysiology of cardiovascular (CV) diseases. As part of the US federal Tox21 program, the National Center for Advancing Translational Sciences (NCATS) applied a quantitative high-throughput screening (qHTS) approach to screen the Tox21 library of 10,000 compounds (~7871 unique chemicals) at 14 concentrations in triplicate, identifying chemicals that perturb hERG activity in a thallium flux assay in the U2OS cell line. The qHTS cell-based thallium influx assay provided a robust and reliable dataset to evaluate the ability of thousands of drugs and environmental chemicals to inhibit the hERG channel protein, and chemical structure-based clustering and chemotype enrichment analysis facilitated the identification of molecular features likely responsible for the observed hERG activity. We employed several machine learning approaches to develop QSAR prediction models for assessing the hERG liabilities of drug-like and environmental chemicals. The training set was compiled by integrating hERG bioactivity data from the ChEMBL database with the Tox21 qHTS thallium flux assay data. The best results were obtained with the random forest method (~92.6% balanced accuracy). The data and scripts used to generate the hERG prediction models are provided in an open-access format as key in vitro and in silico tools that can be applied in a translational toxicology pipeline for drug development and environmental chemical screening.
Affiliation(s)
- Shagun Krishna
- Division of the National Toxicology Program, National Institute of Environmental Health Sciences (NIEHS), Research Triangle, NC 27560, USA
- Ruili Huang
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), Bethesda, MD 20892-4874, USA
- Jinghua Zhao
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), Bethesda, MD 20892-4874, USA
- Menghang Xia
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), Bethesda, MD 20892-4874, USA
- Nicole Kleinstreuer
- Division of the National Toxicology Program, National Institute of Environmental Health Sciences (NIEHS), Research Triangle, NC 27560, USA
26
Shan M, Jiang C, Chen J, Qin LP, Qin JJ, Cheng G. Predicting hERG channel blockers with directed message passing neural networks. RSC Adv 2022; 12:3423-3430. [PMID: 35425351 PMCID: PMC8979305 DOI: 10.1039/d1ra07956e] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/29/2021] [Accepted: 12/13/2021] [Indexed: 11/30/2022]
Abstract
Compounds with human ether-à-go-go-related gene (hERG) blockade activity may cause severe cardiotoxicity. Assessing hERG liability in the early stages of drug discovery is important, and in silico methods for predicting hERG channel blockers are being actively pursued. In the present study, the directed message passing neural network (D-MPNN) was applied to construct classification models for identifying hERG blockers based on diverse datasets. Several descriptors and fingerprints were tested along with the D-MPNN model. Among all these combinations, D-MPNN with the moe206 descriptors generated by MOE (D-MPNN + moe206) showed significantly improved performance. The AUC-ROC values of the D-MPNN + moe206 model reached 0.956 ± 0.005 under a random split and 0.922 ± 0.015 under a scaffold split on Cai's hERG dataset. Moreover, our models were compared with several recently reported machine learning models on various datasets. Our results indicate that the D-MPNN + moe206 model is among the best classification models. Overall, the excellent performance of the D-MPNN + moe206 model in this study highlights its potential application in the discovery of novel and effective hERG blockers.
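The gap between the random-split and scaffold-split AUC-ROC values above reflects how the test set is built. Under a scaffold split, molecules sharing a core scaffold are kept together, so the test set contains chemotypes the model has not seen during training.

The sketch below shows one common greedy variant of this idea in plain Python. Its assumptions:
- The scaffold strings (for example, Bemis-Murcko scaffold SMILES) have already been computed with a cheminformatics toolkit.
- The scaffolds in the example are hypothetical.
- The procedure is not necessarily the one used in the study.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def scaffold_split(scaffolds: Sequence[str], train_frac: float = 0.8) -> Tuple[List[int], List[int]]:
    """Split molecule indices so that no scaffold appears in both sets.

    Scaffold groups are visited from largest to smallest (ties broken by
    first index); a group goes to the training set if it still fits under
    the training budget, otherwise the whole group goes to the test set.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    train_budget = train_frac * len(scaffolds)
    train: List[int] = []
    test: List[int] = []
    for group in ordered:
        if len(train) + len(group) <= train_budget:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


# Hypothetical precomputed scaffolds for eight molecules.
scaffolds = [
    "c1ccccc1", "c1ccccc1", "c1ccccc1", "c1ccncc1",
    "c1ccncc1", "C1CCNCC1", "c1ccc2ccccc2c1", "c1ccccc1",
]
train, test = scaffold_split(scaffolds, train_frac=0.75)
print("train:", train)
print("test:", test)
```

Placing the largest scaffold families in training is a deterministic heuristic. It tends to leave rarer chemotypes for testing, which is why scaffold-split scores are usually lower than random-split scores.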
Affiliation(s)
- Mengyi Shan
- College of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou 310053 People's Republic of China
- Chen Jiang
- College of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou 310053 People's Republic of China; Hangzhou Jingchun Trading Co., Ltd. China
- Jing Chen
- College of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou 310053 People's Republic of China; College of Pharmaceutical Sciences, Zhejiang University Hangzhou Zhejiang 310058 PR China
- Lu-Ping Qin
- College of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou 310053 People's Republic of China
- Jiang-Jiang Qin
- The Cancer Hospital of the University of Chinese Academy of Sciences, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences Hangzhou 310022 China
- Gang Cheng
- College of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou 310053 People's Republic of China
27
Li Y, Xu Y, Yu Y. CRNNTL: Convolutional Recurrent Neural Network and Transfer Learning for QSAR Modeling in Organic Drug and Material Discovery. Molecules 2021; 26:molecules26237257. [PMID: 34885843 PMCID: PMC8658888 DOI: 10.3390/molecules26237257] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/08/2021] [Revised: 11/25/2021] [Accepted: 11/26/2021] [Indexed: 11/16/2022]
Abstract
Molecular latent representations derived from autoencoders (AEs) have been widely used for drug and material discovery over the past few years. In particular, a variety of machine learning methods based on latent representations have shown excellent performance in quantitative structure–activity relationship (QSAR) modeling. However, the sequential features of these representations have not been considered in most cases. In addition, data scarcity remains the main obstacle for deep learning strategies, especially for bioactivity datasets. In this study, we propose the convolutional recurrent neural network and transfer learning (CRNNTL) method, inspired by its applications in polyphonic sound detection and electrocardiogram classification. Our model takes advantage of both convolutional and recurrent neural networks for feature extraction, as well as a data augmentation method. In QSAR modeling on 27 datasets, CRNNTL outperforms or matches state-of-the-art methods for both drug and material properties. In addition, its performance on an isomer-based dataset indicates that this advantage results from improved global feature extraction while the ability to extract local features is maintained. The transfer learning results further show that CRNNTL can overcome data scarcity when related source datasets are chosen. Finally, the high versatility of our model is demonstrated by using latent representations from other types of AEs as inputs.
Affiliation(s)
- Yaqin Li
- West China Tianfu Hospital, Sichuan University, Chengdu 610041, China
- Correspondence: (Y.L.); (Y.Y.)
- Yongjin Xu
- Department of Chemistry and Molecular Biology, University of Gothenburg, Kemivägen 10, 41296 Gothenburg, Sweden
- Yi Yu
- Department of Chemistry and Molecular Biology, University of Gothenburg, Kemivägen 10, 41296 Gothenburg, Sweden
- Correspondence: (Y.L.); (Y.Y.)
28
Venkatraman V. FP-ADMET: a compendium of fingerprint-based ADMET prediction models. J Cheminform 2021; 13:75. [PMID: 34583740 PMCID: PMC8479898 DOI: 10.1186/s13321-021-00557-5] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 04/07/2021] [Accepted: 09/20/2021] [Indexed: 12/11/2022]
Abstract
MOTIVATION: The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of drugs play a key role in determining which potential candidates should be prioritized. In silico approaches based on machine learning are becoming increasingly popular but are nonetheless limited by the availability of data. With a view to making both data and models available to the scientific community, we have developed FP-ADMET, a repository of molecular fingerprint-based predictive models for ADMET properties. In this article, we examine the efficacy of fingerprint-based machine learning models for a large number of ADMET-related properties. The predictive ability of a set of 20 different binary fingerprints (based on substructure keys, atom pairs, and local path environments, as well as custom fingerprints such as all-shortest paths) was evaluated for over 50 ADMET and ADMET-related endpoints. We find that for a majority of the properties, fingerprint-based random forest models yield comparable or better performance than traditional 2D/3D molecular descriptors. AVAILABILITY: The models are made available as part of open-access software that can be downloaded from https://gitlab.com/vishsoft/fpadmet.
Affiliation(s)
- Vishwesh Venkatraman
- Norwegian University of Science and Technology, Realfagbygget, Gløshaugen, Høgskoleringen, 7491, Trondheim, Norway.
29
Lee KH, Fant AD, Guo J, Guan A, Jung J, Kudaibergenova M, Miranda WE, Ku T, Cao J, Wacker S, Duff HJ, Newman AH, Noskov SY, Shi L. Toward Reducing hERG Affinities for DAT Inhibitors with a Combined Machine Learning and Molecular Modeling Approach. J Chem Inf Model 2021; 61:4266-4279. [PMID: 34420294 DOI: 10.1021/acs.jcim.1c00856] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/08/2023]
Abstract
Psychostimulant drugs, such as cocaine, inhibit dopamine reuptake by blocking the dopamine transporter (DAT), which is the primary mechanism underpinning their abuse. Atypical DAT inhibitors are dissimilar to cocaine and can block cocaine- or methamphetamine-induced behaviors, supporting their development as part of a treatment regimen for psychostimulant use disorders. When developing these atypical DAT inhibitors as medications, it is necessary to avoid off-target binding that can produce unwanted side effects or toxicities. In particular, blockade of the potassium channel encoded by the human ether-à-go-go-related gene (hERG) can lead to potentially lethal ventricular tachycardia. In this study, we established a platform that screens for DAT binding and counter-screens against hERG binding by combining machine learning-based quantitative structure-activity relationship (QSAR) modeling, experimental validation, and molecular modeling and simulations. Our results show that the available data are adequate to establish robust QSAR models, as validated by chemical synthesis and pharmacological evaluation of a validation set of DAT inhibitors. Furthermore, QSAR models built on subsets of the data grouped by experimental approach also have predictive power, which opens the door to targeting specific functional states of a protein. Complementarily, our molecular modeling and simulations identified the structural elements responsible for a pair of DAT inhibitors having opposite binding affinity trends at DAT and hERG, which can be leveraged for rational optimization of lead atypical DAT inhibitors with desired pharmacological properties.
Affiliation(s)
- Kuo Hao Lee
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
- Andrew D Fant
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
- Jiqing Guo
- Libin Cardiovascular Institute of Alberta, Cumming School of Medicine, University of Calgary, Calgary, Alberta T2N 4N1, Canada
- Andy Guan
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
- Joslyn Jung
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
- Mary Kudaibergenova
- Centre for Molecular Simulation, Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Williams E Miranda
- Centre for Molecular Simulation, Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Therese Ku
- Medicinal Chemistry Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
- Jianjing Cao
- Medicinal Chemistry Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
- Soren Wacker
- Libin Cardiovascular Institute of Alberta, Cumming School of Medicine, University of Calgary, Calgary, Alberta T2N 4N1, Canada; Centre for Molecular Simulation, Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada; Achlys Inc., 7-126 Li Ka Shing Center for Health and Innovation, Edmonton, Alberta T6G 2E1, Canada
- Henry J Duff
- Libin Cardiovascular Institute of Alberta, Cumming School of Medicine, University of Calgary, Calgary, Alberta T2N 4N1, Canada
- Amy Hauck Newman
- Medicinal Chemistry Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
- Sergei Y Noskov
- Centre for Molecular Simulation, Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Lei Shi
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
30
Karim A, Lee M, Balle T, Sattar A. CardioTox net: a robust predictor for hERG channel blockade based on deep learning meta-feature ensembles. J Cheminform 2021; 13:60. [PMID: 34399849 PMCID: PMC8365955 DOI: 10.1186/s13321-021-00541-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 01/04/2021] [Accepted: 08/05/2021] [Indexed: 11/10/2022]
Abstract
MOTIVATION: Human ether-à-go-go-related gene (hERG) channel blockade by small molecules is a major concern during drug development in the pharmaceutical industry. Blockade of hERG channels may prolong the QT interval, which can potentially lead to cardiotoxicity. Various in silico techniques, including deep learning models, are widely used to screen out small molecules with potential hERG-related toxicity. Most published deep learning methods use a single type of feature, which might restrict their performance. Methods based on more than one type of feature, such as DeepHIT, struggle to aggregate the extracted information. DeepHIT performs better when evaluated on one or two metrics, such as negative predictive value (NPV) and sensitivity (SEN), but struggles on others, such as the Matthews correlation coefficient (MCC), accuracy (ACC), positive predictive value (PPV) and specificity (SPE). Therefore, there is a need for a method that can efficiently aggregate information gathered from models based on different chemical representations and improve hERG toxicity prediction across a range of performance metrics. RESULTS: In this paper, we propose a deep learning framework based on step-wise training to predict the hERG channel blocking activity of small molecules. Our approach uses five individual deep learning base models with their respective base features and a separate neural network to combine the outputs of the five base models. On three independent external test sets, with activity defined by an IC50 threshold of 10 µM, our method achieves better performance across a combination of classification metrics. We also investigate the effective aggregation of chemical information extracted for robust hERG activity prediction.
In summary, CardioTox net can serve as a robust tool for screening small molecules for hERG channel blockade in drug discovery pipelines and performs better than previously reported methods on a range of classification metrics.
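The metrics named above all derive from the four cells of a binary confusion matrix. The Python sketch below computes them after binarizing activity with the 10 µM IC50 threshold. It makes the following assumptions for illustration:
- IC50 values strictly below the threshold count as blockers.
- A metric whose denominator is zero is reported as 0.0.
- The IC50 values and predictions in the example are invented.

```python
import math
from typing import Dict, List, Sequence, Tuple


def label_by_ic50(ic50_um: Sequence[float], threshold_um: float = 10.0) -> List[int]:
    """Assumed convention: IC50 strictly below the threshold (µM) -> blocker (1)."""
    return [1 if value < threshold_um else 0 for value in ic50_um]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = blocker."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(num: float, den: float) -> float:
    # Assumed convention: report 0.0 when a metric is undefined.
    return num / den if den else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": safe_div(tp + tn, tp + tn + fp + fn),
        "SEN": safe_div(tp, tp + fn),   # sensitivity / recall
        "SPE": safe_div(tn, tn + fp),   # specificity
        "PPV": safe_div(tp, tp + fp),   # precision
        "NPV": safe_div(tn, tn + fn),
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),
    }


ic50_um = [0.5, 3.0, 12.0, 25.0, 8.0, 40.0]  # hypothetical measured IC50 values
y_true = label_by_ic50(ic50_um)              # [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]                  # hypothetical model predictions
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case every rate except ACC equals 2/3, yet MCC is only 1/3. This illustrates why a model can look good on one or two metrics while lagging on MCC.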
Affiliation(s)
- Abdul Karim
- School of Information Communication Technology, Griffith University, 4111 Nathan, Brisbane, Australia
- Matthew Lee
- School of Information Communication Technology, Griffith University, 4111 Nathan, Brisbane, Australia
- Thomas Balle
- Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, 2006 Sydney, Australia
- Brain and Mind Centre, The University of Sydney, 2050 Sydney, Australia
- Abdul Sattar
- Institute of Integrated and Intelligent Systems, Griffith University, 4111 Nathan, Brisbane, Australia
31
Rácz A, Bajusz D, Miranda-Quintana RA, Héberger K. Machine learning models for classification tasks related to drug safety. Mol Divers 2021; 25:1409-1424. [PMID: 34110577 PMCID: PMC8342376 DOI: 10.1007/s11030-021-10239-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 04/01/2021] [Accepted: 05/27/2021] [Indexed: 12/23/2022]
Abstract
In this review, we outline current trends in machine learning-driven classification studies related to ADME (absorption, distribution, metabolism and excretion) and toxicity endpoints from the past six years (2015-2021). The study focuses only on classification models built on large datasets (i.e., more than a thousand compounds). A comprehensive literature search and meta-analysis was carried out for nine different targets: hERG-mediated cardiotoxicity, blood-brain barrier penetration, permeability glycoprotein (P-gp) substrate/inhibitor, the cytochrome P450 enzyme family, acute oral toxicity, mutagenicity, carcinogenicity, respiratory toxicity and irritation/corrosion. The comparison of the best classification models aimed to reveal differences between machine learning algorithms and modeling types, endpoint-specific performances, dataset sizes and validation protocols. Based on the evaluation of the data, tree-based algorithms are (still) dominating the field, with consensus modeling an increasing trend in drug safety predictions. Although classification models with excellent performance already exist for hERG-mediated cardiotoxicity and the isoenzymes of the cytochrome P450 enzyme family, these targets remain central to ADMET-related research efforts.
Affiliation(s)
- Anita Rácz
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary.
- Dávid Bajusz
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary
- Károly Héberger
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary.
32
Xiong Z, Cheng Z, Lin X, Xu C, Liu X, Wang D, Luo X, Zhang Y, Jiang H, Qiao N, Zheng M. Facing small and biased data dilemma in drug discovery with enhanced federated learning approaches. Sci China Life Sci 2021; 65:529-539. [PMID: 34319533 DOI: 10.1007/s11427-021-1946-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/08/2021] [Accepted: 05/16/2021] [Indexed: 12/11/2022]
Abstract
Artificial intelligence (AI) models usually require large amounts of high-quality training data, in striking contrast to the small and biased datasets that current drug discovery pipelines face. Federated learning has been proposed as a way to use distributed data from different sources without leaking sensitive information. This emerging decentralized machine learning paradigm is expected to dramatically improve the success rate of AI-powered drug discovery. Here, we simulated the federated learning process with different property and activity datasets from different sources, which contain overlapping molecules whose recorded values have high or low biases. Beyond the benefit of gaining more data, we also demonstrated that federated training has a regularization effect superior to that of centralized training on pooled datasets with high biases. Moreover, we compared how different client network architectures and coordinator aggregation algorithms affect federated learning performance, and personalized federated learning showed promising results. Our work demonstrates the applicability of federated learning in predicting drug-related properties and highlights its promising role in addressing the small and biased data dilemma in drug discovery.
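Federated averaging (FedAvg) is one widely used way for a coordinator to aggregate client models. Each client trains locally and sends only its parameters and sample count, never raw molecules or measurements. The coordinator then computes a sample-weighted mean.

The abstract does not name the specific aggregation algorithms that were compared. The plain-Python sketch below therefore illustrates the general idea rather than the authors' method, and its parameter names and values are hypothetical.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def fed_avg(client_updates: List[Tuple[Params, int]]) -> Params:
    """Aggregate client parameters, weighting each client by its sample count.

    Every client is assumed to share the same parameter names and shapes.
    """
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("at least one client must report a positive sample count")
    first_params, _ = client_updates[0]
    global_params: Params = {name: [0.0] * len(vals) for name, vals in first_params.items()}
    for params, n in client_updates:
        weight = n / total
        for name, vals in params.items():
            acc = global_params[name]
            for i, v in enumerate(vals):
                acc[i] += weight * v
    return global_params


# Two hypothetical clients (e.g., two data holders) with different dataset sizes.
clients = [
    ({"w": [1.0, 2.0], "b": [0.0]}, 100),
    ({"w": [3.0, 4.0], "b": [1.0]}, 300),
]
global_params = fed_avg(clients)
print(global_params)
```

Weighting by sample count lets larger datasets pull the global model further. Personalized variants relax this single shared model, for example by keeping some layers client-specific.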
Affiliation(s)
- Zhaoping Xiong
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, Shanghai Tech University, Shanghai, 200031, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Ziqiang Cheng
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, Shanghai Tech University, Shanghai, 200031, China; School of Information Science and Technology, University of Science and Technology of China, Hefei, 230000, China
- Xinyuan Lin
- Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen, 518100, China
- Chi Xu
- Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen, 518100, China
- Xiaohong Liu
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, Shanghai Tech University, Shanghai, 200031, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Dingyan Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Yong Zhang
- Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen, 518100, China
- Hualiang Jiang
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, Shanghai Tech University, Shanghai, 200031, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Nan Qiao
- Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen, 518100, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China