1
Ng SS, Lu Y. Evaluating the Use of Graph Neural Networks and Transfer Learning for Oral Bioavailability Prediction. J Chem Inf Model 2023; 63:5035-5044. [PMID: 37582507 PMCID: PMC10467575 DOI: 10.1021/acs.jcim.3c00554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Indexed: 08/17/2023]
Abstract
Oral bioavailability is a pharmacokinetic property that plays an important role in drug discovery. Recently developed computational models involve the use of molecular descriptors, fingerprints, and conventional machine-learning models. However, determining the type of molecular descriptors requires domain expert knowledge and time for feature selection. With the emergence of the graph neural network (GNN), models can be trained to automatically extract features that they deem important. In this article, we exploited the automatic feature selection of GNN to predict oral bioavailability. To enhance the prediction performance of GNN, we utilized transfer learning by pre-training a model to predict solubility and obtained a final average accuracy of 0.797, an F1 score of 0.840, and an AUC-ROC of 0.867, which outperformed previous studies on predicting oral bioavailability with the same test data set.
Affiliation(s)
- Sherwin S. S. Ng
  - School of Chemistry, Chemistry Engineering and Biotechnology, Nanyang Technological University, 21 Nanyang Link, Singapore 637371, Singapore
- Yunpeng Lu
  - School of Chemistry, Chemistry Engineering and Biotechnology, Nanyang Technological University, 21 Nanyang Link, Singapore 637371, Singapore
2
Best practices in current models mimicking drug permeability in the gastrointestinal tract - an UNGAP review. Eur J Pharm Sci 2021; 170:106098. [PMID: 34954051 DOI: 10.1016/j.ejps.2021.106098] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/10/2021] [Revised: 10/19/2021] [Accepted: 12/15/2021] [Indexed: 12/21/2022]
Abstract
The absorption of orally administered drug products is a complex, dynamic process, dependent on a range of biopharmaceutical properties; notably the aqueous solubility of a molecule, stability within the gastrointestinal tract (GIT) and permeability. From a regulatory perspective, the concept of high intestinal permeability is intrinsically linked to the fraction of the oral dose absorbed. The relationship between permeability and the extent of absorption means that experimental models of permeability have regularly been used as a surrogate measure to estimate the fraction absorbed. Accurate assessment of a molecule's intestinal permeability is of critical importance during the pharmaceutical development process of oral drug products, and the current review provides a critique of in vivo, in vitro and ex vivo approaches. The usefulness of in silico models to predict drug permeability is also discussed and an overview of solvent systems used in permeability assessments is provided. Studies of drug absorption in humans are an indirect indicator of intestinal permeability, but in vitro and ex vivo tools provide initial screening approaches and are important tools for direct assessment of permeability in drug development. Continued refinement of the accuracy of in silico approaches and their validation with human in vivo data will facilitate more efficient characterisation of permeability earlier in the drug development process and will provide useful inputs for integrated, end-to-end absorption modelling.
3
Bouhedjar K, Boukelia A, Khorief Nacereddine A, Boucheham A, Belaidi A, Djerourou A. A natural language processing approach based on embedding deep learning from heterogeneous compounds for quantitative structure-activity relationship modeling. Chem Biol Drug Des 2021; 96:961-972. [PMID: 33058460 DOI: 10.1111/cbdd.13742] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/12/2019] [Revised: 05/27/2020] [Accepted: 05/31/2020] [Indexed: 12/15/2022]
Abstract
Over the past decade, rapid development in biological and chemical technologies such as high-throughput screening and parallel synthesis has significantly increased the amount of data, which requires the creation and integration of new analytical methods, especially deep learning models. Recently, there has been increasing interest in the use of deep learning in computer-aided drug discovery due to its exceptionally successful application in many fields. The present work proposed a natural language processing approach based on embedding deep neural networks. Our method aims to transform the Simplified Molecular Input Line Entry System format into word embedding vectors to represent the semantics of compounds. These vectors are fed into supervised machine learning algorithms such as convolutional long short-term memory neural network, support vector machine, and random forest to build up quantitative structure-activity relationship models on toxicity data sets. The obtained results on toxicity data to the ciliate Tetrahymena pyriformis (IGC50), and acute toxicity rat data expressed as median lethal dose of treated rats (LD50), show that our approach can eventually be used to predict the activities of chemical compounds efficiently. All material used in this study is available online through the GitHub portal (https://github.com/BoukeliaAbdelbasset/NLPDeepQSAR.git).
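As a rough illustration of the first step described in this abstract, a SMILES string has to be split into "words" before any word-embedding model can be trained on it. The sketch below is a minimal tokenizer under stated assumptions: the regular expression and the function name `tokenize_smiles` are hypothetical choices made here, not the tokenization used by the authors.

```python
import re

# Assumed token pattern (not taken from the paper): bracket atoms such as
# [NH4+], the two-letter halogens Br and Cl, two-digit ring closures such
# as %12, and otherwise any single character.
_SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens, preserving order."""
    return _SMILES_TOKEN.findall(smiles)


if __name__ == "__main__":
    # Chlorobenzene: the chlorine stays together as one token.
    print(tokenize_smiles("Clc1ccccc1"))
```

The resulting token lists play the role of sentences; an embedding model trained on them would then supply one vector per token.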
Affiliation(s)
- Khalid Bouhedjar
  - Laboratoire de Synthèse et Biocatalyse Organique, Département de Chimie, Faculté des Sciences, Université Badji Mokhtar Annaba, Annaba, Algeria
  - Laboratoire Bioinformatique, Centre de Recherche en Biotechnologie (CRBt), Constantine, Algeria
- Abdelbasset Boukelia
  - Laboratoire Bioinformatique, Centre de Recherche en Biotechnologie (CRBt), Constantine, Algeria
  - Computer Science Department, Faculty of NTIC University of Constantine 2 - Abdelhamid Mehri, Constantine, Algeria
- Abdelmalek Khorief Nacereddine
  - Laboratory of Physical Chemistry and Biology of Materials, Department of Physics and Chemistry, Higher Normal School of Technological Education-Skikda, Skikda, Algeria
- Anouar Boucheham
  - University Salah Boubnider Constantine, Constantine, Algeria
  - Laboratory of Molecular and Cellular Biology, Constantine, Algeria
- Amine Belaidi
  - Laboratoire Bioinformatique, Centre de Recherche en Biotechnologie (CRBt), Constantine, Algeria
- Abdelhafid Djerourou
  - Laboratoire de Synthèse et Biocatalyse Organique, Département de Chimie, Faculté des Sciences, Université Badji Mokhtar Annaba, Annaba, Algeria
4
Ye Z, Yang W, Yang Y, Ouyang D. Interpretable machine learning methods for in vitro pharmaceutical formulation development. Food Frontiers 2021. [DOI: 10.1002/fft2.78] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Affiliation(s)
- Zhuyifan Ye
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
- Wenmian Yang
  - State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China
- Yilong Yang
  - School of Software, Beihang University, Beijing, China
- Defang Ouyang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
5
Przybyłek M. Application 2D Descriptors and Artificial Neural Networks for Beta-Glucosidase Inhibitors Screening. Molecules 2020; 25:E5942. [PMID: 33333961 PMCID: PMC7765417 DOI: 10.3390/molecules25245942] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/19/2020] [Revised: 12/12/2020] [Accepted: 12/14/2020] [Indexed: 12/14/2022]
Abstract
Beta-glucosidase inhibitors play important medical and biological roles. In this study, simple two-variable artificial neural network (ANN) classification models were developed for beta-glucosidase inhibitors screening. All bioassay data were obtained from the ChEMBL database. The classifiers were generated using 2D molecular descriptors and the data miner tool available in the STATISTICA package (STATISTICA Automated Neural Networks, SANN). In order to evaluate the models' accuracy and select the best classifiers among automatically generated SANNs, the Matthews correlation coefficient (MCC) was used. The application of the combination of maxHBint3 and SpMax8_Bhs descriptors leads to the highest predicting abilities of SANNs, as evidenced by the averaged test set prediction results (MCC = 0.748) calculated for ten different dataset splits. Additionally, the models were analyzed employing receiver operating characteristics (ROC) and cumulative gain charts. The thirteen final classifiers obtained as a result of the model development procedure were applied for a natural compounds collection available in the BIOFACQUIM database. As a result of this beta-glucosidase inhibitors screening, eight compounds were univocally classified as active by all SANNs.
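The Matthews correlation coefficient used above to rank the classifiers can be computed directly from the four cells of a binary confusion matrix. The following is a minimal sketch; the function name and the example counts are illustrative assumptions and do not come from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]: 1 is perfect prediction, 0 is no better
    than chance, and -1 is total disagreement.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: if any marginal total is zero the coefficient is
    # undefined, so report 0.0 instead of dividing by zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for a single test split.
    print(round(matthews_corrcoef(tp=45, tn=40, fp=10, fn=5), 3))
```

Averaging this value over several dataset splits, as the abstract describes for ten splits, gives a single score per descriptor pair that stays informative when actives and inactives are imbalanced.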
Affiliation(s)
- Maciej Przybyłek
  - Department of Physical Chemistry, Pharmacy Faculty, Collegium Medicum of Bydgoszcz, Nicolaus Copernicus University in Toruń, Kurpińskiego 5, 85-950 Bydgoszcz, Poland
6
Cerruela García G, Pérez-Parras Toledano J, de Haro García A, García-Pedrajas N. Filter feature selectors in the development of binary QSAR models. SAR and QSAR in Environmental Research 2019; 30:313-345. [PMID: 31112077 DOI: 10.1080/1062936x.2019.1588160] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/29/2019] [Accepted: 02/25/2019] [Indexed: 06/09/2023]
Abstract
The application of machine learning methods to the construction of quantitative structure-activity relationship models is a complex computational problem in which dimensionality reduction of the representation of the molecular structure plays a fundamental role in predicting a target activity. The feature selection pre-processing approach has been indicated to be effective in dimensionality reduction for building simpler and more understandable models. In this paper, a performance comparative study of 13 state-of-the-art feature selection filter methods is conducted. Structure-activity relationship models are constructed using three widely used classifiers and a diverse collection of datasets. The comparative study utilizes robust statistical tests to compare the algorithms. According to the experimental results, there are substantial differences in performance among the evaluated feature selection methods. The methods that exhibit the best performance are correlation-based feature selection, fast clustering-based feature selection and the set cover method.
Affiliation(s)
- G Cerruela García
  - Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
- J Pérez-Parras Toledano
  - Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
- A de Haro García
  - Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
- N García-Pedrajas
  - Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
7
Cerruela García G, García-Pedrajas N. Boosted feature selectors: a case study on prediction P-gp inhibitors and substrates. J Comput Aided Mol Des 2018; 32:1273-1294. [PMID: 30367310 DOI: 10.1007/s10822-018-0171-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/24/2018] [Accepted: 10/18/2018] [Indexed: 01/11/2023]
Abstract
Feature selection is commonly used as a preprocessing step to machine learning for improving learning performance, lowering computational complexity and facilitating model interpretation. This paper proposes the application of boosting feature selection to improve the classification performance of standard feature selection algorithms evaluated for the prediction of P-gp inhibitors and substrates. Two well-known classification algorithms, decision trees and support vector machines, were used to classify the chemical compounds. The experimental results showed better performance for boosting feature selection with respect to the standard feature selection algorithms while maintaining the capability for feature reduction.
Affiliation(s)
- Gonzalo Cerruela García
  - Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, 14071, Córdoba, Spain
- Nicolás García-Pedrajas
  - Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, 14071, Córdoba, Spain
8
Wolfson J, Venkatasubramaniam A. Branching Out: Use of Decision Trees in Epidemiology. Curr Epidemiol Rep 2018. [DOI: 10.1007/s40471-018-0163-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022]
9
Yang M, Chen J, Xu L, Shi X, Zhou X, Xi Z, An R, Wang X. A novel adaptive ensemble classification framework for ADME prediction. RSC Adv 2018; 8:11661-11683. [PMID: 35542768 PMCID: PMC9079056 DOI: 10.1039/c8ra01206g] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/07/2018] [Accepted: 03/20/2018] [Indexed: 12/20/2022]
Abstract
AECF is a GA based ensemble method. It includes four components which are (1) data balancing, (2) generating individual models, (3) combining individual models, and (4) optimizing the ensemble.
Affiliation(s)
- Ming Yang
  - Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
  - Department of Chemistry
- Jialei Chen
  - Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
- Liwen Xu
  - Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
- Xiufeng Shi
  - Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
- Xin Zhou
  - Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
- Zhijun Xi
  - Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
- Rui An
  - Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, People's Republic of China
- Xinhong Wang
  - Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, People's Republic of China
10
Yang H, Li X, Cai Y, Wang Q, Li W, Liu G, Tang Y. In silico prediction of chemical subcellular localization via multi-classification methods. MedChemComm 2017; 8:1225-1234. [PMID: 30108833 PMCID: PMC6072212 DOI: 10.1039/c7md00074j] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/14/2017] [Accepted: 03/22/2017] [Indexed: 12/16/2022]
Abstract
Chemical subcellular localization is closely related to drug distribution in the body and hence important in drug discovery and design. Although many in vivo and in vitro methods have been developed, in silico methods play key roles in the prediction of chemical subcellular localization due to their low costs and high performance. For that purpose, machine learning-based methods were developed here. At first, 614 unique compounds localized in the lysosome, mitochondria, nucleus and plasma membrane were collected from the literature. 80% of the compounds were used to build the models and the rest as the external validation set. Both fingerprints and molecular descriptors were used to describe the molecules, and six machine learning methods were applied to build the multi-classification models. The performance of the models was measured by 5-fold cross-validation and external validation. We further detected key substructures for each localization and analyzed potential structure-localization relationships, which could be very helpful for molecular design and modification. The key substructures can also be used as features complementary to fingerprints to improve the performance of the models.
Affiliation(s)
- Hongbin Yang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Xiao Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yingchun Cai
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Qin Wang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
11
Abstract
1. Biliary excretion of compounds is dependent on several transporter proteins for the active uptake of compounds from the blood into the hepatocytes. Organic anion-transporting polypeptides (OATPs) are some of the most abundant transporter proteins in the sinusoidal membrane and have been shown to have substrate specificity similar to the structural characteristics of cholephilic compounds. 2. In this study, we sought to use measures of OATP binding as predictors of biliary excretion in conjunction with molecular descriptors in a quantitative structure-activity relationship (QSAR) study. Percentage inhibitions of three subtypes of OATPs were used as surrogate indicators of OATP substrates. Several statistical modelling techniques were incorporated including classification and regression trees, boosted trees, random forest and multivariate adaptive regression splines (MARS) in order to first develop QSARs for the prediction of OATP inhibition of compounds. The predicted OATP percentage inhibition using selected models were then used as features of the QSAR models for the prediction of biliary excretion of compounds in rat. 3. The results indicated that incorporation of predicted OATP inhibition improves accuracy of biliary excretion models. The best result was obtained from a simple regression tree that used predicted OATP1B1 percentage inhibition at the root node of the tree.
Affiliation(s)
- Mohsen Sharifi
  - Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent, UK
  - Division of Systems Biology, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Taravat Ghafourian
  - Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent, UK
  - School of Life Sciences, University of Sussex, Falmer, Brighton, UK
12
Chen YS. A comprehensive identification-evidence based alternative for HIV/AIDS treatment with HAART in the healthcare industries. Computer Methods and Programs in Biomedicine 2016; 131:111-126. [PMID: 27265053 DOI: 10.1016/j.cmpb.2016.04.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/29/2015] [Revised: 03/03/2016] [Accepted: 04/01/2016] [Indexed: 06/05/2023]
Abstract
BACKGROUND AND OBJECTIVE: The HIV/AIDS-related issue has given rise to a priority concern in which potential new therapies are increasingly highlighted to lessen the negative impact of highly active anti-retroviral therapy (HAART) in the healthcare industry. With the motivation of "medical applications," this study focuses on the main advanced feature selection techniques and classification approaches that reflect a new architecture, and a trial to build a hybrid model for interested parties. METHODS: This study first uses an integrated linear-nonlinear feature selection technique to identify the determinants influencing HAART medication and utilizes organizations of different condition-attributes to generate a hybrid model based on a rough set classifier to study evolving HIV/AIDS research in order to improve classification performance. RESULTS: The proposed model makes use of a real data set from Taiwan's specialist medical center. The experimental results show that the proposed model yields a satisfactory result that is superior to the listed methods, and the core condition-attributes PVL, CD4, Code, Age, Year, PLT, and Sex were identified in the HIV/AIDS data set. In addition, the decision rule set created can be referenced as a knowledge-based healthcare service system as the best of evidence-based practices in the workflow of current clinical diagnosis. CONCLUSIONS: This study highlights the importance of these key factors and provides the rationale that the proposed model is an effective alternative to analyzing sustained HAART medication in follow-up studies of HIV/AIDS treatment in practice.
Affiliation(s)
- You-Shyang Chen
  - Department of Information Management, Hwa Hsia University of Technology, 111, Gongzhuan Rd., Zhonghe Dist., New Taipei City 235, Taiwan
13
Antanasijević D, Antanasijević J, Pocajt V, Ušćumlić G. A GMDH-type neural network with multi-filter feature selection for the prediction of transition temperatures of bent-core liquid crystals. RSC Adv 2016. [DOI: 10.1039/c6ra15056j] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
The QSPR study on transition temperatures of five-ring bent-core LCs was performed using GMDH-type neural networks. A novel multi-filter approach, which combines chi square ranking, v-WSH and GMDH algorithm was used for the selection of descriptors.
Affiliation(s)
- Davor Antanasijević
  - Innovation Center of the Faculty of Technology and Metallurgy, 11120 Belgrade, Serbia
- Viktor Pocajt
  - University of Belgrade, Faculty of Technology and Metallurgy, 11120 Belgrade, Serbia
- Gordana Ušćumlić
  - University of Belgrade, Faculty of Technology and Metallurgy, 11120 Belgrade, Serbia
14
Abstract
In recent decades, in silico absorption, distribution, metabolism, excretion (ADME), and toxicity (T) modelling as a tool for rational drug design has received considerable attention from pharmaceutical scientists, and various ADME/T-related prediction models have been reported. The high-throughput and low-cost nature of these models permits a more streamlined drug development process in which the identification of hits or their structural optimization can be guided based on a parallel investigation of bioavailability and safety, along with activity. However, the effectiveness of these tools is highly dependent on their capacity to cope with needs at different stages, e.g. their use in candidate selection has been limited due to their lack of the required predictability. For some events or endpoints involving more complex mechanisms, the current in silico approaches still need further improvement. In this review, we will briefly introduce the development of in silico models for some physicochemical parameters, ADME properties and toxicity evaluation, with an emphasis on the modelling approaches thereof, their application in drug discovery, and the potential merits or deficiencies of these models. Finally, the outlook for future ADME/T modelling based on big data analysis and systems sciences will be discussed.
15
Freitas AA, Limbu K, Ghafourian T. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients. J Cheminform 2015; 7:6. [PMID: 25767566 PMCID: PMC4356883 DOI: 10.1186/s13321-015-0054-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 09/27/2014] [Accepted: 01/27/2015] [Indexed: 01/11/2023]
Abstract
BACKGROUND: Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p), often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. RESULTS: Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. CONCLUSIONS: Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical Abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
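The abstract reports model quality as a mean fold error but does not spell out the formula. One common definition in pharmacokinetic modelling, assumed here rather than taken from the paper, is the antilog of the mean absolute log10 ratio of predicted to observed values, so that 1.0 means perfect agreement and 2.0 means predictions are off by a factor of two on average. A minimal sketch with hypothetical values:

```python
import math


def mean_fold_error(predicted, observed):
    """Mean fold error, assumed as 10 ** mean(|log10(predicted / observed)|).

    Both sequences must be non-empty, of equal length, and strictly positive.
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty sequences of equal length")
    abs_logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(abs_logs) / len(abs_logs))


if __name__ == "__main__":
    # Hypothetical Vss values in L/kg: one over-prediction and one
    # under-prediction, each by a factor of two.
    print(mean_fold_error([2.0, 0.5], [1.0, 1.0]))
```

Because the error is taken on a log scale, a two-fold over-prediction and a two-fold under-prediction are penalised equally.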
Affiliation(s)
- Alex A Freitas
  - School of Computing, University of Kent, Canterbury, CT2 7NF UK
- Kriti Limbu
  - Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent, ME4 4TB UK
- Taravat Ghafourian
  - Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent, ME4 4TB UK
  - Drug Applied Research Centre and Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran
16
Newby D, Freitas AA, Ghafourian T. Decision trees to characterise the roles of permeability and solubility on the prediction of oral absorption. Eur J Med Chem 2014; 90:751-65. [PMID: 25528330 DOI: 10.1016/j.ejmech.2014.12.006] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 04/24/2014] [Revised: 12/02/2014] [Accepted: 12/03/2014] [Indexed: 01/11/2023]
Abstract
Oral absorption of compounds depends on many physiological, physicochemical and formulation factors. Two important properties that govern oral absorption are in vitro permeability and solubility, which are commonly used as indicators of human intestinal absorption. Despite this, the nature and exact characteristics of the relationship between these parameters are not well understood. In this study a large dataset of human intestinal absorption was collated along with in vitro permeability, aqueous solubility, melting point, and maximum dose for the same compounds. The dataset allowed a permeability threshold to be established objectively to predict high or low intestinal absorption. Using this permeability threshold, classification decision trees incorporating a solubility-related parameter such as experimental or predicted solubility, or the melting point based absorption potential (MPbAP), along with structural molecular descriptors were developed and validated to predict oral absorption class. The decision trees were able to determine the individual roles of permeability and solubility in the oral absorption process. Poorly permeable compounds with high solubility show low intestinal absorption, whereas poorly water soluble compounds with high or low permeability may have high intestinal absorption provided that they have certain molecular characteristics such as a small polar surface or specific topology.
Affiliation(s)
- Danielle Newby
  - Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent ME4 4TB, UK
- Alex A Freitas
  - School of Computing, University of Kent, Canterbury, Kent CT2 7NF, UK
- Taravat Ghafourian
  - Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent ME4 4TB, UK
  - Drug Applied Research Centre and Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran
17
Newby D, Freitas AA, Ghafourian T. Comparing multilabel classification methods for provisional biopharmaceutics class prediction. Mol Pharm 2014; 12:87-102. [PMID: 25397721 DOI: 10.1021/mp500457t] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/26/2022]
Abstract
The biopharmaceutical classification system (BCS) is now well established and utilized for the development and biowaivers of immediate oral dosage forms. The prediction of BCS class can be carried out using multilabel classification. Unlike single label classification, multilabel classification methods predict more than one class label at the same time. This paper compares two multilabel methods, binary relevance and classifier chain, for provisional BCS class prediction. Large data sets of permeability and solubility of drug and drug-like compounds were obtained from the literature and were used to build models using decision trees. The separate permeability and solubility models were validated, and a BCS validation set of 127 compounds where both permeability and solubility were known was used to compare the two aforementioned multilabel classification methods for provisional BCS class prediction. Overall, the results indicate that the classifier chain method, which takes into account label interactions, performed better compared to the binary relevance method. This work offers a comparison of multilabel methods and shows the potential of the classifier chain multilabel method for improved biological property predictions for use in drug discovery and development.
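The difference between the two multilabel strategies compared above can be sketched in a few lines. In binary relevance, the permeability and solubility labels are predicted independently; in a classifier chain, the first predicted label is appended to the features seen by the second model. The code below is only an illustration: the model arguments are hypothetical callables, the chain order (permeability first) is an assumption, and only the mapping from the two labels to BCS classes I to IV follows the standard BCS definition.

```python
from typing import Callable, Sequence

# A model here is any callable mapping a feature vector to True (high)
# or False (low). These stand in for trained decision trees.
Model = Callable[[Sequence[float]], bool]


def bcs_class(high_solubility: bool, high_permeability: bool) -> int:
    """Map the two binary labels to BCS class 1-4 (I-IV)."""
    if high_solubility and high_permeability:
        return 1  # Class I: high solubility, high permeability
    if high_permeability:
        return 2  # Class II: low solubility, high permeability
    if high_solubility:
        return 3  # Class III: high solubility, low permeability
    return 4      # Class IV: low solubility, low permeability


def binary_relevance(x: Sequence[float], perm_model: Model, sol_model: Model) -> int:
    """Predict each label independently from the same features."""
    return bcs_class(sol_model(x), perm_model(x))


def classifier_chain(x: Sequence[float], perm_model: Model, chained_sol_model: Model) -> int:
    """Predict permeability first, then pass it to the solubility model."""
    perm = perm_model(x)
    # The predicted label becomes one extra input feature (1.0 or 0.0).
    sol = chained_sol_model(list(x) + [float(perm)])
    return bcs_class(sol, perm)
```

The extra input feature is what lets a chained model capture label interactions, which the abstract credits for the better performance of the classifier chain method.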
Affiliation(s)
- Danielle Newby
  - Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent, ME4 4TB, U.K.
18
Geraets L, Bessems JG, Zeilmaker MJ, Bos PM. Human risk assessment of dermal and inhalation exposures to chemicals assessed by route-to-route extrapolation: The necessity of kinetic data. Regul Toxicol Pharmacol 2014; 70:54-64. [DOI: 10.1016/j.yrtph.2014.05.024] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 03/25/2014] [Accepted: 05/30/2014] [Indexed: 10/25/2022]