1
Xiao Z, Zhu M, Chen J, You Z. Integrated Transfer Learning and Multitask Learning Strategies to Construct Graph Neural Network Models for Predicting Bioaccumulation Parameters of Chemicals. Environmental Science & Technology 2024; 58:15650-15660. [PMID: 39051472 DOI: 10.1021/acs.est.4c02421] [Indexed: 07/27/2024]
Abstract
Accurate prediction of parameters related to the environmental exposure of chemicals is crucial for the sound management of chemicals. However, the lack of large data sets for training models may result in poor prediction accuracy and robustness. Herein, an integrated transfer learning (TL) and multitask learning (MTL) strategy was proposed for constructing a graph neural network (GNN) model (abbreviated as the TL-MTL-GNN model) using n-octanol/water partition coefficients as the source domain. The TL-MTL-GNN model was trained to predict three bioaccumulation parameters based on enlarged data sets that cover 2496 compounds with at least one bioaccumulation parameter. Results show that the TL-MTL-GNN model outperformed single-task GNN models with and without TL, as well as conventional machine learning models trained with molecular descriptors or fingerprints. Applicability domains were characterized by a state-of-the-art structure-activity landscape-based (abbreviated as ADSAL) methodology. The TL-MTL-GNN model coupled with the optimal ADSAL was employed to predict bioaccumulation parameters for around 60,000 chemicals, with more than 13,000 compounds identified as bioaccumulative chemicals. The high predictive accuracy and robustness of the TL-MTL-GNN model demonstrate the feasibility of integrating the TL and MTL strategies in modeling small-sized data sets. The strategy holds significant potential for addressing small data challenges in modeling environmental chemicals.
Affiliation(s)
- Zijun Xiao
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Minghua Zhu
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, College of Environment, Hohai University, Nanjing 210098, China
- Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zecang You
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
2
Seal S, Williams D, Hosseini-Gerami L, Mahale M, Carpenter AE, Spjuth O, Bender A. Improved Detection of Drug-Induced Liver Injury by Integrating Predicted In Vivo and In Vitro Data. Chem Res Toxicol 2024; 37:1290-1305. [PMID: 38981058 PMCID: PMC11337212 DOI: 10.1021/acs.chemrestox.4c00015] [Received: 01/10/2024] [Revised: 06/27/2024] [Accepted: 07/01/2024] [Indexed: 07/11/2024]
Abstract
Drug-induced liver injury (DILI) has been a significant challenge in drug discovery, often leading to clinical trial failures and necessitating drug withdrawals. Over the last decade, the existing suite of in vitro proxy-DILI assays has generally improved at identifying compounds with hepatotoxicity. However, there is considerable interest in enhancing the in silico prediction of DILI because it allows for evaluating large sets of compounds more quickly and cost-effectively, particularly in the early stages of projects. In this study, we aim to study ML models for DILI prediction that first predict nine proxy-DILI labels and then use them as features in addition to chemical structural features to predict DILI. The features include in vitro (e.g., mitochondrial toxicity, bile salt export pump inhibition) data, in vivo (e.g., preclinical rat hepatotoxicity studies) data, pharmacokinetic parameters of maximum concentration, structural fingerprints, and physicochemical parameters. We trained DILI-prediction models on 888 compounds from the DILI data set (composed of DILIst and DILIrank) and tested them on a held-out external test set of 223 compounds from the DILI data set. The best model, DILIPredictor, attained an AUC-PR of 0.79. This model enabled the detection of the top 25 toxic compounds (2.68 LR+, positive likelihood ratio) compared to models using only structural features (1.65 LR+ score). Using feature interpretation from DILIPredictor, we identified the chemical substructures causing DILI and differentiated cases of DILI caused by compounds in animals but not in humans. For example, DILIPredictor correctly recognized 2-butoxyethanol as nontoxic in humans despite its hepatotoxicity in mice models. Overall, the DILIPredictor model improves the detection of compounds causing DILI with an improved differentiation between animal and human sensitivity and the potential for mechanism evaluation. 
DILIPredictor requires only chemical structures as input and is publicly available at https://broad.io/DILIPredictor through a web interface, with all code available for download.
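For readers unfamiliar with the positive likelihood ratio (LR+) quoted in this abstract, a minimal Python sketch of the standard definition follows: LR+ is sensitivity divided by the false positive rate (1 - specificity). The function name and the confusion-matrix counts are illustrative assumptions and are not taken from the paper or from the DILIPredictor code.

```python
def positive_likelihood_ratio(tp: int, fp: int, tn: int, fn: int) -> float:
    """Return LR+ = sensitivity / (1 - specificity) from confusion-matrix counts.

    Hypothetical helper for illustration; not part of DILIPredictor.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    if false_positive_rate == 0:
        # No false positives: the ratio is unbounded.
        return float("inf")
    return sensitivity / false_positive_rate


# Made-up counts: 20 of 40 toxic compounds flagged, 5 of 50 non-toxic flagged.
lr_plus = positive_likelihood_ratio(tp=20, fp=5, tn=45, fn=20)
print(f"LR+ = {lr_plus:.2f}")
```

An LR+ above 1 means a positive prediction raises the odds that a compound is truly toxic. On that reading, the reported 2.68 versus 1.65 indicates that a positive call from the combined model is more informative than one from the structure-only models.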
Affiliation(s)
- Srijit Seal
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge CB2 1EW, United Kingdom
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02141, United States
- Dominic Williams
- Safety Innovation, Clinical Pharmacology and Safety Sciences, AstraZeneca, Cambridge CB4 0FZ, United Kingdom
- Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB4 0FZ, United Kingdom
- Layla Hosseini-Gerami
- Ignota Laboratories, County Hall, Westminster Bridge Rd, London SE1 7PB, United Kingdom
- Manas Mahale
- Bombay College of Pharmacy Kalina Santacruz (E), Mumbai 400 098, India
- Anne E. Carpenter
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02141, United States
- Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, Uppsala SE-75124, Sweden
- Andreas Bender
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge CB2 1EW, United Kingdom
3
Duan H, Lou C, Gu Y, Wang Y, Li W, Liu G, Tang Y. In Silico prediction of inhibitors for multiple transporters via machine learning methods. Mol Inform 2024; 43:e202300270. [PMID: 38235949 DOI: 10.1002/minf.202300270] [Received: 10/07/2023] [Revised: 01/02/2024] [Accepted: 01/17/2024] [Indexed: 01/19/2024]
Abstract
Transporters play an indispensable role in facilitating the transport of nutrients, signaling molecules and the elimination of metabolites and toxins in human cells. Contemporary computational methods have been employed in the prediction of transporter inhibitors. However, these methods often focus on isolated endpoints, overlooking the interactions between transporters and lacking good interpretation. In this study, we integrated a comprehensive dataset and constructed models to assess the inhibitory effects on seven transporters. Both conventional machine learning and multi-task deep learning methods were employed. The results demonstrated that the MLT-GAT model achieved superior performance with an average AUC value of 0.882. It is noteworthy that our model excels not only in prediction performance but also in achieving robust interpretability, aided by GNN-Explainer. It provided valuable insights into transporter inhibition. The reliability of our model's predictions positioned it as a promising and valuable tool in the field of transporter inhibition research. Related data and code are available at https://gitee.com/wutiantian99/transporter_code.git.
Affiliation(s)
- Hao Duan
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Chaofeng Lou
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yaxin Gu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yimeng Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
4
Bernardi A, Bennett WFD, He S, Jones D, Kirshner D, Bennion BJ, Carpenter TS. Advances in Computational Approaches for Estimating Passive Permeability in Drug Discovery. Membranes 2023; 13:851. [PMID: 37999336 PMCID: PMC10673305 DOI: 10.3390/membranes13110851] [Received: 09/20/2023] [Revised: 10/19/2023] [Accepted: 10/21/2023] [Indexed: 11/25/2023]
Abstract
Passive permeation of cellular membranes is a key feature of many therapeutics. The relevance of passive permeability spans all biological systems as they all employ biomembranes for compartmentalization. A variety of computational techniques are currently utilized and under active development to facilitate the characterization of passive permeability. These methods include lipophilicity relations, molecular dynamics simulations, and machine learning, which vary in accuracy, complexity, and computational cost. This review briefly introduces the underlying theories, such as the prominent inhomogeneous solubility diffusion model, and covers a number of recent applications. Various machine-learning applications, which have demonstrated good potential for high-volume, data-driven permeability predictions, are also discussed. Due to the confluence of novel computational methods and next-generation exascale computers, we anticipate an exciting future for computationally driven permeability predictions.
Affiliation(s)
- Timothy S. Carpenter
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
5
Zhang W, Zhang C, Cao L, Liang F, Xie W, Tao L, Chen C, Yang M, Zhong L. Application of digital-intelligence technology in the processing of Chinese materia medica. Front Pharmacol 2023; 14:1208055. [PMID: 37693890 PMCID: PMC10484343 DOI: 10.3389/fphar.2023.1208055] [Received: 04/18/2023] [Accepted: 08/10/2023] [Indexed: 09/12/2023]
Abstract
Processing of Chinese Materia Medica (PCMM) is the concentrated embodiment and core of China's unique traditional pharmaceutical technology. Processing includes preparation steps such as cleansing, cutting, and stir-frying, which affect the quality and efficacy of Chinese botanical drugs. The rapid development of new digital technologies, such as big data analysis, Internet of Things (IoT), blockchain, cloud computing, and artificial intelligence, has promoted the digitalization and intellectualization of the traditional pharmaceutical manufacturing industry. In this review, the application of digital-intelligence technology in PCMM is analyzed and discussed, with the aim of promoting standardization of the process and securing the quality of botanical drug decoction pieces. Digitized and intelligent production helps ensure the safety and effectiveness of the clinical use of traditional Chinese medicine (TCM) decoction pieces. This review also provides a theoretical basis for further technical upgrading and high-quality development of the TCM industry.
Affiliation(s)
- Wanlong Zhang
- College of Pharmacy, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
- Changhua Zhang
- College of Pharmacy, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
- Nanchang Research Institute, Sun Yat-sen University, Nanchang, Jiangxi, China
- Lan Cao
- College of Pharmacy, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
- Fang Liang
- College of Physical Culture, Yuzhang Normal University, Nanchang, Jiangxi, China
- Weihua Xie
- College of Pharmacy, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
- Liang Tao
- Nanchang Research Institute, Sun Yat-sen University, Nanchang, Jiangxi, China
- Chen Chen
- School of Biomedical Sciences, University of Queensland, Brisbane, QLD, Australia
- Ming Yang
- Key Laboratory of Modern Chinese Medicine Preparation of Ministry of Education, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
- Lingyun Zhong
- College of Pharmacy, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
6
Zhang W, Zhang C, Cao L, Liang F, Xie W, Tao L, Chen C, Yang M, Zhong L. Application of digital-intelligence technology in the processing of Chinese materia medica. Front Pharmacol 2023; 14. [DOI: 10.3389/fphar.2023.1208055] [Indexed: 02/14/2024]
7
Kong X, Lin K, Wu G, Tao X, Zhai X, Lv L, Dong D, Zhu Y, Yang S. Machine Learning Techniques Applied to the Study of Drug Transporters. Molecules 2023; 28:5936. [PMID: 37630188 PMCID: PMC10459831 DOI: 10.3390/molecules28165936] [Received: 06/30/2023] [Revised: 07/27/2023] [Accepted: 08/02/2023] [Indexed: 08/27/2023]
Abstract
With the advancement of computer technology, machine learning-based artificial intelligence technology has been increasingly integrated and applied in the fields of medicine, biology, and pharmacy, thereby facilitating their development. Transporters have important roles in influencing drug resistance, drug-drug interactions, and tissue-specific drug targeting. The investigation of drug transporter substrates and inhibitors is a crucial aspect of pharmaceutical development. However, long duration and high expenses pose significant challenges in the investigation of drug transporters. In this review, we discuss the present situation and challenges encountered in applying machine learning techniques to investigate drug transporters. The transporters involved include ABC transporters (P-gp, BCRP, MRPs, and BSEP) and SLC transporters (OAT, OATP, OCT, MATE1,2-K, and NET). The aim is to offer a point of reference for and assistance with the progression of drug transporter research, as well as the advancement of more efficient computer technology. Machine learning methods are valuable and attractive for helping with the study of drug transporter substrates and inhibitors, but continuous efforts are still needed to develop more accurate and reliable predictive models and to apply them in the screening process of drug development to improve efficiency and success rates.
Affiliation(s)
- Xiaorui Kong
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China
- Kexin Lin
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China
- Gaolei Wu
- Department of Pharmacy, Dalian Women and Children’s Medical Group, Dalian 116024, China
- Xufeng Tao
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China
- Xiaohan Zhai
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China
- Linlin Lv
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China
- Deshi Dong
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China
- Yanna Zhu
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China
- Shilei Yang
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China
8
AbdulHameed MDM, Liu R, Wallqvist A. Using a Graph Convolutional Neural Network Model to Identify Bile Salt Export Pump Inhibitors. ACS Omega 2023; 8:21853-21861. [PMID: 37360478 PMCID: PMC10286257 DOI: 10.1021/acsomega.3c01583] [Received: 03/08/2023] [Accepted: 05/19/2023] [Indexed: 06/28/2023]
Abstract
The bile salt export pump (BSEP) is a key transporter involved in the efflux of bile salts from hepatocytes to bile canaliculi. Inhibition of BSEP leads to the accumulation of bile salts within the hepatocytes, leading to possible cholestasis and drug-induced liver injury. Screening for and identification of chemicals that inhibit this transporter aid in understanding the safety liabilities of these chemicals. Moreover, computational approaches to identify BSEP inhibitors provide an alternative to the more resource-intensive, gold standard experimental approaches. Here, we used publicly available data to develop predictive machine learning models for the identification of potential BSEP inhibitors. Specifically, we analyzed the utility of a graph convolutional neural network (GCNN)-based approach in combination with multitask learning to identify BSEP inhibitors. Our analyses showed that the developed GCNN model performed better than the variable-nearest neighbor and Bayesian machine learning approaches, with a cross-validation receiver operating characteristic area under the curve of 0.86. In addition, we compared GCNN-based single-task and multitask models and evaluated their utility in addressing data limitation challenges commonly observed in bioactivity modeling. We found that multitask models performed better than single-task models and can be utilized to identify active molecules for targets with limited data availability. Overall, our developed multitask GCNN-based BSEP model provides a useful tool for prioritizing hits during early drug discovery and in risk assessment of chemicals.
Affiliation(s)
- Mohamed Diwan M. AbdulHameed
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Development Command, Fort Detrick 21702, Maryland, United States
- The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda 20817, Maryland, United States
- Ruifeng Liu
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Development Command, Fort Detrick 21702, Maryland, United States
- The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda 20817, Maryland, United States
- Anders Wallqvist
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Development Command, Fort Detrick 21702, Maryland, United States
9
Fan YJ, Allen JE, McLoughlin KS, Shi D, Bennion BJ, Zhang X, Lightstone FC. Evaluating point-prediction uncertainties in neural networks for protein-ligand binding prediction. Artificial Intelligence Chemistry 2023; 1:100004. [PMID: 37583465 PMCID: PMC10426331 DOI: 10.1016/j.aichem.2023.100004] [Indexed: 08/17/2023]
Abstract
Neural Network (NN) models provide potential to speed up the drug discovery process and reduce its failure rates. The success of NN models requires uncertainty quantification (UQ) as drug discovery explores chemical space beyond the training data distribution. Standard NN models do not provide uncertainty information. Some methods require changing the NN architecture or training procedure, limiting the selection of NN models. Moreover, predictive uncertainty can come from different sources. It is important to have the ability to separately model different types of predictive uncertainty, as the model can take assorted actions depending on the source of uncertainty. In this paper, we examine UQ methods that estimate different sources of predictive uncertainty for NN models aiming at protein-ligand binding prediction. We use our prior knowledge on chemical compounds to design the experiments. By utilizing a visualization method we create non-overlapping and chemically diverse partitions from a collection of chemical compounds. These partitions are used as training and test set splits to explore NN model uncertainty. We demonstrate how the uncertainties estimated by the selected methods describe different sources of uncertainty under different partitions and featurization schemes and the relationship to prediction error.
Affiliation(s)
- Ya Ju Fan
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA, USA
- Jonathan E. Allen
- Biological Science and Security Center, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Kevin S. McLoughlin
- Biological Science and Security Center, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Da Shi
- Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Brian J. Bennion
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Xiaohua Zhang
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Felice C. Lightstone
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, USA
10
Di Lascio E, Gerebtzoff G, Rodríguez-Pérez R. Systematic Evaluation of Local and Global Machine Learning Models for the Prediction of ADME Properties. Mol Pharm 2023; 20:1758-1767. [PMID: 36745394 DOI: 10.1021/acs.molpharmaceut.2c00962] [Indexed: 02/07/2023]
Abstract
Machine learning (ML) has become an indispensable tool to predict absorption, distribution, metabolism, and excretion (ADME) properties in pharmaceutical research. ML algorithms are trained on molecular structures and corresponding ADME assay data to develop quantitative structure-property relationship (QSPR) models. Traditional QSPR models were trained on compound sets of limited size. With the advent of more complex ML algorithms and data availability, training sets have become larger and more diverse. The most common training approaches consist of training a model either with a small set of similar compounds, namely, compounds designed for the same drug discovery project or chemical series (local model approach), or with a larger set of diverse compounds (global model approach). Global models are built with all experimental data available for an assay, combining compound data from different projects and disease areas. Despite the ML progress made so far, the choice of the appropriate data composition for building ML models is still unclear. Herein, a systematic evaluation of local and global ML models was performed for 10 different experimental assays and 112 drug discovery projects. Results show a consistently superior performance of global models for ADME property predictions. Diagnostic analyses were also carried out to investigate the influence of training set size, structural diversity, and data shift on the relative performance of local and global ML models. Training set size and structural diversity did not have an impact on the relative performance of the methods. Instead, data shift helped to identify the projects with larger performance differences between local and global models. Results presented in this work can be leveraged to improve ML-based ADME property predictions and thus decision-making in drug discovery projects.
Affiliation(s)
- Elena Di Lascio
- Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
- Grégori Gerebtzoff
- Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
11
Venkatraman V. FP-ADMET: a compendium of fingerprint-based ADMET prediction models. J Cheminform 2021; 13:75. [PMID: 34583740 PMCID: PMC8479898 DOI: 10.1186/s13321-021-00557-5] [Received: 04/07/2021] [Accepted: 09/20/2021] [Indexed: 12/11/2022]
Abstract
MOTIVATION: The absorption, distribution, metabolism, excretion, and toxicity (ADMET) of drugs plays a key role in determining which among the potential candidates are to be prioritized. In silico approaches based on machine learning methods are becoming increasingly popular, but are nonetheless limited by the availability of data. With a view to making both data and models available to the scientific community, we have developed FPADMET, a repository of molecular fingerprint-based predictive models for ADMET properties. In this article, we have examined the efficacy of fingerprint-based machine learning models for a large number of ADMET-related properties. The predictive ability of a set of 20 different binary fingerprints (based on substructure keys, atom pairs, local path environments, as well as custom fingerprints such as all-shortest paths) for over 50 ADMET and ADMET-related endpoints has been evaluated as part of the study. We find that for a majority of the properties, fingerprint-based random forest models yield comparable or better performance compared with traditional 2D/3D molecular descriptors. AVAILABILITY: The models are made available as part of open access software that can be downloaded from https://gitlab.com/vishsoft/fpadmet.
Affiliation(s)
- Vishwesh Venkatraman
- Norwegian University of Science and Technology, Realfagbygget, Gløshaugen, Høgskoleringen, 7491, Trondheim, Norway.