1
Pu C, Gu L, Hu Y, Han W, Xu X, Liu H, Chen Y, Zhang Y. Prediction of Human Liver Microsome Clearance with Chirality-Focused Graph Neural Networks. J Chem Inf Model 2024; 64:5427-5438. [PMID: 38976447 DOI: 10.1021/acs.jcim.4c00243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
In drug candidate design, clearance is one of the most crucial pharmacokinetic parameters to consider. Recent advancements in machine learning techniques coupled with the growing accumulation of drug data have paved the way for the construction of computational models to predict drug clearance. However, concerns persist regarding the reliability of data collected from public sources, and a majority of current in silico quantitative structure-property relationship models tend to neglect the influence of molecular chirality. In this study, we meticulously examined human liver microsome (HLM) data from public databases and constructed two distinct data sets with varying HLM data quantity and quality. Two baseline models (RF and DNN) and three chirality-focused GNNs (DMPNN, TetraDMPNN, and ChIRo) were proposed, and their performance on HLM data was evaluated and compared with each other. The TetraDMPNN model, which leverages chirality from 2D structure, exhibited the best performance with a test R² of 0.639 and a test root-mean-squared error of 0.429. The applicability domain of the model was also defined by using a molecular similarity-based method. Our research indicates that graph neural networks capable of capturing molecular chirality have significant potential for practical application and can deliver superior performance.
Affiliation(s)
- Chengtao Pu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Lingxi Gu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yuxuan Hu
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Weijie Han
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Xiaohe Xu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
2
Peteani G, Huynh MTD, Gerebtzoff G, Rodríguez-Pérez R. Application of machine learning models for property prediction to targeted protein degraders. Nat Commun 2024; 15:5764. [PMID: 38982061 PMCID: PMC11233499 DOI: 10.1038/s41467-024-49979-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 06/21/2024] [Indexed: 07/11/2024] [Open Access]
Abstract
Machine learning (ML) systems can model quantitative structure-property relationships (QSPR) using existing experimental data and make property predictions for new molecules. With the advent of modalities such as targeted protein degraders (TPD), the applicability of QSPR models is questioned and ML usage in TPD-centric projects remains limited. Herein, ML models are developed and evaluated for TPDs' property predictions, including passive permeability, metabolic clearance, cytochrome P450 inhibition, plasma protein binding, and lipophilicity. Interestingly, performance on TPDs is comparable to that of other modalities. Predictions for glues and heterobifunctionals often yield lower and higher errors, respectively. For permeability, CYP3A4 inhibition, and human and rat microsomal clearance, misclassification errors into high and low risk categories are lower than 4% for glues and 15% for heterobifunctionals. For all modalities, misclassification errors range from 0.8% to 8.1%. Investigated transfer learning strategies improve predictions for heterobifunctionals. This is the first comprehensive evaluation of ML for the prediction of absorption, distribution, metabolism, and excretion (ADME) and physicochemical properties of TPD molecules, including heterobifunctional and molecular glue sub-modalities. Taken together, our investigations show that ML-based QSPR models are applicable to TPDs and support ML usage for TPDs' design, to potentially accelerate drug discovery.
Affiliation(s)
- Giulia Peteani
- Novartis Biomedical Research, Novartis Campus, 4002, Basel, Switzerland
3
Walter M, Borghardt JM, Humbeck L, Skalic M. Multi-Task ADME/PK prediction at industrial scale: leveraging large and diverse experimental datasets. Mol Inform 2024:e202400079. [PMID: 38973777 DOI: 10.1002/minf.202400079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 04/10/2024] [Accepted: 05/04/2024] [Indexed: 07/09/2024]
Abstract
ADME (Absorption, Distribution, Metabolism, Excretion) properties are key parameters to judge whether a drug candidate exhibits a desired pharmacokinetic (PK) profile. In this study, we tested multi-task machine learning (ML) models to predict ADME and animal PK endpoints trained on in-house data generated at Boehringer Ingelheim. Models were evaluated both at the design stage of a compound (i.e., no experimental data of test compounds available) and at testing stage when a particular assay would be conducted (i.e., experimental data of earlier conducted assays may be available). Using realistic time-splits, we found a clear benefit in performance of multi-task graph-based neural network models over single-task model, which was even stronger when experimental data of earlier assays is available. In an attempt to explain the success of multi-task models, we found that especially endpoints with the largest numbers of data points (physicochemical endpoints, clearance in microsomes) are responsible for increased predictivity in more complex ADME and PK endpoints. In summary, our study provides insight into how data for multiple ADME/PK endpoints in a pharmaceutical company can be best leveraged to optimize predictivity of ML models.
Affiliation(s)
- Moritz Walter
- Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Jens M Borghardt
- Drug Discovery Sciences Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Lina Humbeck
- Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Miha Skalic
- Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
4
Long TZ, Jiang DJ, Shi SH, Deng YC, Wang WX, Cao DS. Enhancing Multi-species Liver Microsomal Stability Prediction through Artificial Intelligence. J Chem Inf Model 2024; 64:3222-3236. [PMID: 38498003 DOI: 10.1021/acs.jcim.4c00159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Liver microsomal stability, a crucial aspect of metabolic stability, significantly impacts practical drug discovery. However, current models for predicting liver microsomal stability are based on limited molecular information from a single species. To address this limitation, we constructed the largest public database of compounds from three common species: human, rat, and mouse. Subsequently, we developed a series of classification models using both traditional descriptor-based and classic graph-based machine learning (ML) algorithms. Remarkably, the best-performing models for the three species achieved Matthews correlation coefficients (MCCs) of 0.616, 0.603, and 0.574, respectively, on the test set. Furthermore, through the construction of consensus models based on these individual models, we have demonstrated their superior predictive performance in comparison with the existing models of the same type. To explore the similarities and differences in the properties of liver microsomal stability among multispecies molecules, we conducted preliminary interpretative explorations using the Shapley additive explanations (SHAP) and atom heatmap approaches for the models and misclassified molecules. Additionally, we further investigated representative structural modifications and substructures that decrease the liver microsomal stability in different species using the matched molecule pair analysis (MMPA) method and substructure extraction techniques. The established prediction models, along with insightful interpretation information regarding liver microsomal stability, will significantly contribute to enhancing the efficiency of exploring practical drugs for development.
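For readers unfamiliar with the metric reported above, the Matthews correlation coefficient can be computed from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard formula, not the authors' code; the function name `mcc` and the example counts are hypothetical.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here, for the otherwise undefined 0/0 case).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts for a stable/unstable classifier, for illustration only.
    print(mcc(tp=40, tn=40, fp=10, fn=10))
```

An MCC of 1 indicates perfect agreement, 0 is chance level, and -1 is total disagreement, which makes it more informative than accuracy on imbalanced stability data.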
Affiliation(s)
- Teng-Zhi Long
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- De-Jun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Shao-Hua Shi
- Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR 999077, P. R. China
- You-Chao Deng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Wen-Xuan Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR 999077, P. R. China
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
5
Trunzer M, Teigão J, Huth F, Poller B, Desrayaud S, Rodríguez-Pérez R, Faller B. Improving In Vitro-In Vivo Extrapolation of Clearance Using Rat Liver Microsomes for Highly Plasma Protein-Bound Molecules. Drug Metab Dispos 2024; 52:345-354. [PMID: 38360916 DOI: 10.1124/dmd.123.001597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 02/07/2024] [Accepted: 02/12/2024] [Indexed: 02/17/2024] [Open Access]
Abstract
It is common practice in drug discovery and development to predict in vivo hepatic clearance from in vitro incubations with liver microsomes or hepatocytes using the well-stirred model (WSM). When applying the WSM to a set of approximately 3000 Novartis research compounds, 73% of neutral and basic compounds (extended clearance classification system [ECCS] class 2) were well-predicted within 3-fold. In contrast, only 44% (ECCS class 1A) or 34% (ECCS class 1B) of acids were predicted within 3-fold. To explore the hypothesis whether the higher degree of plasma protein binding for acids contributes to the in vitro-in vivo correlation (IVIVC) disconnect, 68 proprietary compounds were incubated with rat liver microsomes in the presence and absence of 5% plasma. A minor impact of plasma on clearance IVIVC was found for moderately bound compounds (fraction unbound in plasma [fup] ≥1%). However, addition of plasma significantly improved the IVIVC for highly bound compounds (fup <1%) as indicated by an increase of the average fold error from 0.10 to 0.36. Correlating fup with the scaled unbound intrinsic clearance ratio in the presence or absence of plasma allowed the establishment of an empirical, nonlinear correction equation that depends on fup. Taken together, estimation of the metabolic clearance of highly bound compounds was enhanced by the addition of plasma to microsomal incubations. For standard incubations in buffer only, application of an empirical correction provided improved clearance predictions. SIGNIFICANCE STATEMENT: Application of the well-stirred liver model for clearance in vitro-in vivo extrapolation (IVIVE) in rat generally underpredicts the clearance of acids and the strong protein binding of acids is suspected to be one responsible factor. Unbound intrinsic in vitro clearance (CLint,u) determinations using rat liver microsomes supplemented with 5% plasma resulted in an improved IVIVE. An empirical equation was derived that can be applied to correct CLint,u-values in dependence of fraction unbound in plasma (fup) and measured CLint in buffer.
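As background to the abstract above, the well-stirred model is commonly written as CLh = Qh · fu · CLint,u / (Qh + fu · CLint,u). The sketch below implements that textbook form together with a 3-fold agreement check. It is an illustration under stated assumptions: it uses fup directly (that is, it assumes a blood-to-plasma ratio of 1), the function names and numeric inputs are hypothetical, and it does not reproduce the paper's empirical correction equation, which the abstract does not give.

```python
def well_stirred_clearance(q_h: float, fu_p: float, cl_int_u: float) -> float:
    """Hepatic clearance from the textbook well-stirred model.

    q_h      -- hepatic blood flow (e.g. mL/min/kg)
    fu_p     -- fraction unbound in plasma, between 0 and 1
                (assumes a blood-to-plasma ratio of 1)
    cl_int_u -- scaled unbound intrinsic clearance, same units as q_h
    """
    if q_h <= 0:
        raise ValueError("hepatic blood flow must be positive")
    if not 0.0 <= fu_p <= 1.0:
        raise ValueError("fraction unbound must lie in [0, 1]")
    if cl_int_u < 0:
        raise ValueError("intrinsic clearance must be non-negative")
    return q_h * fu_p * cl_int_u / (q_h + fu_p * cl_int_u)


def within_fold(predicted: float, observed: float, fold: float = 3.0) -> bool:
    """True when predicted/observed lies between 1/fold and fold."""
    ratio = predicted / observed
    return 1.0 / fold <= ratio <= fold


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the study.
    cl_h = well_stirred_clearance(q_h=60.0, fu_p=0.01, cl_int_u=6000.0)
    print(cl_h, within_fold(cl_h, observed=80.0))
```

Because fu multiplies CLint,u in both numerator and denominator, an error in the measured unbound fraction of a highly bound compound propagates almost linearly into the predicted clearance when fu · CLint,u is small relative to Qh.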
Affiliation(s)
- Markus Trunzer
- Pharmacokinetic Sciences, Novartis Pharma AG, Basel, Switzerland
- Joana Teigão
- Pharmacokinetic Sciences, Novartis Pharma AG, Basel, Switzerland
- Felix Huth
- Pharmacokinetic Sciences, Novartis Pharma AG, Basel, Switzerland
- Birk Poller
- Pharmacokinetic Sciences, Novartis Pharma AG, Basel, Switzerland
- Bernard Faller
- Pharmacokinetic Sciences, Novartis Pharma AG, Basel, Switzerland
6
Fluetsch A, Trunzer M, Gerebtzoff G, Rodríguez-Pérez R. Deep Learning Models Compared to Experimental Variability for the Prediction of CYP3A4 Time-Dependent Inhibition. Chem Res Toxicol 2024; 37:549-560. [PMID: 38501689 DOI: 10.1021/acs.chemrestox.3c00305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
Most drugs are mainly metabolized by cytochrome P450 (CYP450), which can lead to drug-drug interactions (DDI). Specifically, time-dependent inhibition (TDI) of CYP3A4 isoenzyme has been associated with clinically relevant DDI. To overcome potential DDI issues, high-throughput in vitro assays were established to assess the TDI of CYP3A4 during the discovery and lead optimization phases. However, in silico machine learning models would enable an earlier and larger-scale assessment of TDI potential liabilities. For CYP inhibition, most modeling efforts have focused on highly imbalanced and small data sets. Moreover, assay variability is rarely considered, which is key to understand the model's quality and suitability for decision-making. In this work, machine learning models were built for the prediction of TDI of CYP3A4, evaluated prospectively, and compared to the variability of the experimental assay. Different modeling strategies were investigated to assess their influence on the model's performance. Through multitask learning, additional data sets were leveraged for model building, coming from public databases, in-house CYP-related assays, or other pharmaceutical companies (federated learning). Apart from the numerical prediction of inactivation rates of CYP3A4 TDI, three-class predictions were carried out, giving a negative (inactivation rate kobs < 0.01 min⁻¹), weak positive (0.01 ≤ kobs ≤ 0.025 min⁻¹), or positive (kobs > 0.025 min⁻¹) output. The final multitask graph neural network model achieved misclassification rates of 8 and 7% for positive and negative TDI, respectively. Importantly, the presented deep learning-based predictions had a similar precision to the reproducibility of in vitro experiments and thus offered great opportunities for drug design, early derisk of DDI potential, and selection of experiments. To facilitate CYP inhibition modeling efforts in the public domain, the developed model was used to annotate ∼16 000 publicly available structures, and a surrogate data set is shared as Supporting Information.
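The three-class binning of inactivation rates described in the abstract above can be sketched as a simple threshold function. The thresholds (0.01 and 0.025 per minute) come from the abstract; the function name `classify_tdi` and the label strings are hypothetical choices made for illustration, not the authors' implementation.

```python
NEGATIVE_UPPER = 0.01        # min^-1, from the abstract: negative if kobs < 0.01
WEAK_POSITIVE_UPPER = 0.025  # min^-1, from the abstract: positive if kobs > 0.025


def classify_tdi(k_obs: float) -> str:
    """Map a CYP3A4 inactivation rate kobs (min^-1) to one of three classes."""
    if k_obs < 0:
        raise ValueError("kobs must be non-negative")
    if k_obs < NEGATIVE_UPPER:
        return "negative"
    if k_obs <= WEAK_POSITIVE_UPPER:
        return "weak positive"
    return "positive"


if __name__ == "__main__":
    # Illustrative rates only; these are not measurements from the study.
    for rate in (0.005, 0.01, 0.025, 0.03):
        print(rate, classify_tdi(rate))
```

Both boundary values fall in the weak-positive class, matching the inclusive inequalities stated in the abstract.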
Affiliation(s)
- Andrin Fluetsch
- Novartis Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Markus Trunzer
- Novartis Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Grégori Gerebtzoff
- Novartis Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
7
Fluetsch A, Di Lascio E, Gerebtzoff G, Rodríguez-Pérez R. Adapting Deep Learning QSPR Models to Specific Drug Discovery Projects. Mol Pharm 2024; 21:1817-1826. [PMID: 38373038 DOI: 10.1021/acs.molpharmaceut.3c01124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Medicinal chemistry and drug design efforts can be assisted by machine learning (ML) models that relate the molecular structure to compound properties. Such quantitative structure-property relationship models are generally trained on large data sets that include diverse chemical series (global models). In the pharmaceutical industry, these ML global models are available across discovery projects as an "out-of-the-box" solution to assist in drug design, synthesis prioritization, and experiment selection. However, drug discovery projects typically focus on confined parts of the chemical space (e.g., chemical series), where global models might not be applicable. Local ML models are sometimes generated to focus on specific projects or series. Herein, ML-based global models, local models, and hybrid global-local strategies were benchmarked. Analyses were done for more than 300 drug discovery projects at Novartis and ten absorption, distribution, metabolism, and excretion (ADME) assays. In this work, hybrid global-local strategies based on transfer learning approaches were proposed to leverage both historical ADME data (global) and project-specific data (local) to adapt model predictions. Fine-tuning a pretrained global ML model (used for weights' initialization, WI) was the top-performing method. Average improvements of mean absolute errors across all assays were 16% and 27% compared with global and local models, respectively. Interestingly, when the effect of training set size was analyzed, WI fine-tuning was found to be successful even in low-data scenarios (e.g., ∼10 molecules per project). Taken together, this work highlights the potential of domain adaptation in the field of molecular property predictions to refine existing pretrained models on a new compound data distribution.
Affiliation(s)
- Andrin Fluetsch
- Novartis Biomedical Research, Novartis Campus, Basel 4002, Switzerland
- Elena Di Lascio
- Novartis Biomedical Research, Novartis Campus, Basel 4002, Switzerland
8
Guo W, Dong Y, Hao GF. Transfer learning empowers accurate pharmacokinetics prediction of small samples. Drug Discov Today 2024; 29:103946. [PMID: 38460571 DOI: 10.1016/j.drudis.2024.103946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 02/22/2024] [Accepted: 03/05/2024] [Indexed: 03/11/2024]
Abstract
Accurate assessment of pharmacokinetic (PK) properties is crucial for selecting optimal candidates and avoiding downstream failures. Transfer learning is an innovative machine learning approach enabling high-throughput prediction with limited data. Recently, transfer learning methods showed promise in predicting ADME/PK parameters. Given the prolific growth of research on transfer learning for PK prediction, a comprehensive review of its advantages and challenges is imperative. This study explores the fundamentals, classifications, toolkits and applications of various transfer learning techniques for PK prediction, demonstrating their utility through three practical case studies. This work will serve as a reference for drug design researchers.
Affiliation(s)
- Wenbo Guo
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, China
- Yawen Dong
- School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China.
- Ge-Fei Hao
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, China.
9
Wojtuch A, Danel T, Podlewska S, Maziarka Ł. Extended study on atomic featurization in graph neural networks for molecular property prediction. J Cheminform 2023; 15:81. [PMID: 37726841 PMCID: PMC10507875 DOI: 10.1186/s13321-023-00751-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 08/23/2023] [Indexed: 09/21/2023] [Open Access]
Abstract
Graph neural networks have recently become a standard method for analyzing chemical compounds. In the field of molecular property prediction, the emphasis is now on designing new model architectures, and the importance of atom featurization is oftentimes belittled. When contrasting two graph neural networks, the use of different representations possibly leads to incorrect attribution of the results solely to the network architecture. To better understand this issue, we compare multiple atom representations by evaluating them on the prediction of free energy, solubility, and metabolic stability using graph convolutional networks. We discover that the choice of atom representation has a significant impact on model performance and that the optimal subset of features is task-specific. Additional experiments involving more sophisticated architectures, including graph transformers, support these findings. Moreover, we demonstrate that some commonly used atom features, such as the number of neighbors or the number of hydrogens, can be easily predicted using only information about bonds and atom type, yet their explicit inclusion in the representation has a positive impact on model performance. Finally, we explain the predictions of the best-performing models to better understand how they utilize the available atomic features.
Affiliation(s)
- Agnieszka Wojtuch
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland.
- Tomasz Danel
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
- Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Smętna 12, 31-343, Kraków, Poland
- Łukasz Maziarka
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
10
Lanini J, Santarossa G, Sirockin F, Lewis R, Fechner N, Misztela H, Lewis S, Maziarz K, Stanley M, Segler M, Stiefl N, Schneider N. PREFER: A New Predictive Modeling Framework for Molecular Discovery. J Chem Inf Model 2023; 63:4497-4504. [PMID: 37487018 DOI: 10.1021/acs.jcim.3c00523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/26/2023]
Abstract
Machine-learning and deep-learning models have been extensively used in cheminformatics to predict molecular properties, to reduce the need for direct measurements, and to accelerate compound prioritization. However, different setups and frameworks and the large number of molecular representations make it difficult to properly evaluate, reproduce, and compare them. Here we present a new PREdictive modeling FramEwoRk for molecular discovery (PREFER), written in Python (version 3.7.7) and based on AutoSklearn (version 0.14.7), that allows comparison between different molecular representations and common machine-learning models. We provide an overview of the design of our framework and show exemplary use cases and results of several representation-model combinations on diverse data sets, both public and in-house. Finally, we discuss the use of PREFER on small data sets. The code of the framework is freely available on GitHub.
Affiliation(s)
- Jessica Lanini
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Gianluca Santarossa
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Finton Sirockin
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Richard Lewis
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Nikolas Fechner
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Sarah Lewis
- Microsoft Research AI4Science, Cambridge CB1 2FB, U.K
- Megan Stanley
- Microsoft Research AI4Science, Cambridge CB1 2FB, U.K
- Marwin Segler
- Microsoft Research AI4Science, Cambridge CB1 2FB, U.K
- Nikolaus Stiefl
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Nadine Schneider
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
11
Du BX, Long Y, Li X, Wu M, Shi JY. CMMS-GCL: cross-modality metabolic stability prediction with graph contrastive learning. Bioinformatics 2023; 39:btad503. [PMID: 37572298 PMCID: PMC10457661 DOI: 10.1093/bioinformatics/btad503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Revised: 07/26/2023] [Accepted: 08/11/2023] [Indexed: 08/14/2023] [Open Access]
Abstract
MOTIVATION: Metabolic stability plays a crucial role in the early stages of drug discovery and development. Accurately modeling and predicting molecular metabolic stability has great potential for the efficient screening of drug candidates as well as the optimization of lead compounds. Considering wet-lab experiment is time-consuming, laborious, and expensive, in silico prediction of metabolic stability is an alternative choice. However, few computational methods have been developed to address this task. In addition, it remains a significant challenge to explain key functional groups determining metabolic stability. RESULTS: To address these issues, we develop a novel cross-modality graph contrastive learning model named CMMS-GCL for predicting the metabolic stability of drug candidates. In our framework, we design deep learning methods to extract features for molecules from two modality data, i.e. SMILES sequence and molecule graph. In particular, for the sequence data, we design a multihead attention BiGRU-based encoder to preserve the context of symbols to learn sequence representations of molecules. For the graph data, we propose a graph contrastive learning-based encoder to learn structure representations by effectively capturing the consistencies between local and global structures. We further exploit fully connected neural networks to combine the sequence and structure representations for model training. Extensive experimental results on two datasets demonstrate that our CMMS-GCL consistently outperforms seven state-of-the-art methods. Furthermore, a collection of case studies on sequence data and statistical analyses of the graph structure module strengthens the validation of the interpretability of crucial functional groups recognized by CMMS-GCL. Overall, CMMS-GCL can serve as an effective and interpretable tool for predicting metabolic stability, identifying critical functional groups, and thus facilitating the drug discovery process and lead compound optimization. AVAILABILITY AND IMPLEMENTATION: The code and data underlying this article are freely available at https://github.com/dubingxue/CMMS-GCL.
Affiliation(s)
- Bing-Xue Du
- School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China
- Institute for Infocomm Research (IR), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Yahui Long
- Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore 138648, Singapore
- Xiaoli Li
- Institute for Infocomm Research (IR), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Min Wu
- Institute for Infocomm Research (IR), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Jian-Yu Shi
- School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China
12
Amara K, Rodríguez-Pérez R, Jiménez-Luna J. Explaining compound activity predictions with a substructure-aware loss for graph neural networks. J Cheminform 2023; 15:67. [PMID: 37491407 PMCID: PMC10369817 DOI: 10.1186/s13321-023-00733-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/04/2023] [Accepted: 07/08/2023] [Indexed: 07/27/2023] [Open Access]
Abstract
Explainable machine learning is increasingly used in drug discovery to help rationalize compound property predictions. Feature attribution techniques are popular choices to identify which molecular substructures are responsible for a predicted property change. However, established molecular feature attribution methods have so far displayed low performance for popular deep learning algorithms such as graph neural networks (GNNs), especially when compared with simpler modeling alternatives such as random forests coupled with atom masking. To mitigate this problem, a modification of the regression objective for GNNs is proposed to specifically account for common core structures between pairs of molecules. The presented approach shows higher accuracy on a recently-proposed explainability benchmark. This methodology has the potential to assist with model explainability in drug discovery pipelines, particularly in lead optimization efforts where specific chemical series are investigated.
Affiliation(s)
- Kenza Amara
- Microsoft Research AI4Science, 21 Station Rd., Cambridge, CB1 2FB UK
- Department of Computer Science, ETH Zurich, Andreasstrasse 5, 8050 Zurich, Switzerland
- José Jiménez-Luna
- Microsoft Research AI4Science, 21 Station Rd., Cambridge, CB1 2FB UK
13
Di Lascio E, Gerebtzoff G, Rodríguez-Pérez R. Systematic Evaluation of Local and Global Machine Learning Models for the Prediction of ADME Properties. Mol Pharm 2023; 20:1758-1767. [PMID: 36745394 DOI: 10.1021/acs.molpharmaceut.2c00962] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 02/07/2023]
Abstract
Machine learning (ML) has become an indispensable tool to predict absorption, distribution, metabolism, and excretion (ADME) properties in pharmaceutical research. ML algorithms are trained on molecular structures and corresponding ADME assay data to develop quantitative structure-property relationship (QSPR) models. Traditional QSPR models were trained on compound sets of limited size. With the advent of more complex ML algorithms and data availability, training sets have become larger and more diverse. Most common training approaches consist in either training a model with a small set of similar compounds, namely, compounds designed for the same drug discovery project or chemical series (local model approach) or with a larger set of diverse compounds (global model approach). Global models are built with all experimental data available for an assay, combining compound data from different projects and disease areas. Despite the ML progress made so far, the choice of the appropriate data composition for building ML models is still unclear. Herein, a systematic evaluation of local and global ML models was performed for 10 different experimental assays and 112 drug discovery projects. Results show a consistent superior performance of global models for ADME property predictions. Diagnostic analyses were also carried out to investigate the influence of training set size, structural diversity, and data shift in the relative performance of local and global ML models. Training set and structural diversity did not have an impact in the relative performance on the methods. Instead, data shift helped to identify the projects with larger performance differences between local and global models. Results presented in this work can be leveraged to improve ML-based ADME properties predictions and thus decision-making in drug discovery projects.
Affiliation(s)
- Elena Di Lascio
- Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
- Grégori Gerebtzoff
- Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland