1
Bunally SB, Luscombe CN, Young RJ. Using Physicochemical Measurements to Influence Better Compound Design. SLAS DISCOVERY 2019; 24:791-801. [PMID: 31429385 DOI: 10.1177/2472555219859845] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/13/2023]
Abstract
During the past decade, the physicochemical quality of molecules under investigation at all stages of the drug discovery process has come under particular scrutiny. The issues associated with excessive lipophilicity and poor solubility in particular are many and varied, ranging from poor outcomes in screening campaigns to promiscuity, limited and/or poorly predictable pharmacokinetic exposure, and, ultimately, greater chances of clinical failure. In this review, contemporary methods to secure key measurements are described along with their relevance to understanding the behavior of molecules in environments pertinent to pharmacological activity. Together, the various measurements contribute to predictive models of both the physicochemical properties themselves and the outcomes they influence.
Affiliation(s)
- Robert J Young
- GlaxoSmithKline Medicines Research Centre, Stevenage, UK
2
Lambrinidis G, Tsantili-Kakoulidou A. Challenges with multi-objective QSAR in drug discovery. Expert Opin Drug Discov 2018; 13:851-859. [DOI: 10.1080/17460441.2018.1496079] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 10/28/2022]
Affiliation(s)
- George Lambrinidis
- Division of Pharmaceutical Chemistry, Department of Pharmacy, National and Kapodistrian University of Athens, Zografou, Athens, Greece
- Anna Tsantili-Kakoulidou
- Division of Pharmaceutical Chemistry, Department of Pharmacy, National and Kapodistrian University of Athens, Zografou, Athens, Greece
3
Önlü S, Türker Saçan M. Impact of geometry optimization methods on QSAR modelling: A case study for predicting human serum albumin binding affinity. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2017; 28:491-509. [PMID: 28705017 DOI: 10.1080/1062936x.2017.1343253] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/20/2017] [Accepted: 06/09/2017] [Indexed: 06/07/2023]
Abstract
Quantitative structure-activity relationship (QSAR) modelling is a major tool employed in the prediction of various endpoints. However, current QSAR literature is missing a full understanding of the impact of quantum chemical calculation methods on the estimation of molecular descriptors and model performance. Here, we provide a comprehensive analysis of the quantitative effects of different geometry optimization methods (semi-empirical, ab initio Hartree-Fock and density functional theory) on the molecular descriptors. Using experimental binding affinity to human serum albumin (HSA) data, we comparatively investigated the influence of employing descriptors derived from three calculation methods on the QSAR models. We propose a 4-descriptor QSAR model in line with the OECD validation principles for the prediction of drug binding affinity to HSA (log KHSA) as a potential tool for drug development. We also confirm the prediction capability of the proposed model on a heterogeneous external set of chemicals. Furthermore, we recommend an activity-independent rational approach for the selection of geometry optimization method for an improved QSAR model development.
Affiliation(s)
- S Önlü
- Boğaziçi University, Institute of Environmental Sciences, Hisar Campus, Istanbul, Turkey
- M Türker Saçan
- Boğaziçi University, Institute of Environmental Sciences, Hisar Campus, Istanbul, Turkey
4
Abstract
Drug discovery utilizes chemical biology and computational drug design approaches for the efficient identification and optimization of lead compounds. Chemical biology is mostly involved in the elucidation of the biological function of a target and the mechanism of action of a chemical modulator. On the other hand, computer-aided drug design makes use of the structural knowledge of either the target (structure-based) or known ligands with bioactivity (ligand-based) to facilitate the determination of promising candidate drugs. Various virtual screening techniques are now being used by both pharmaceutical companies and academic research groups to reduce the cost and time required for the discovery of a potent drug. Despite the rapid advances in these methods, continuous improvements are critical for future drug discovery tools. Advantages presented by structure-based and ligand-based drug design suggest that their complementary use, as well as their integration with experimental routines, has a powerful impact on rational drug design. In this article, we give an overview of the current computational drug design and their application in integrated rational drug development to aid in the progress of drug discovery research.
Affiliation(s)
- Stephani Joy Y Macalino
- National Leading Research Laboratory of Molecular Modeling and Drug Design, College of Pharmacy and Graduate School of Pharmaceutical Sciences, and Global Top 5 Research Program, Ewha Womans University, Seoul, 120-750, Korea
- Vijayakumar Gosu
- National Leading Research Laboratory of Molecular Modeling and Drug Design, College of Pharmacy and Graduate School of Pharmaceutical Sciences, and Global Top 5 Research Program, Ewha Womans University, Seoul, 120-750, Korea
- Sunhye Hong
- National Leading Research Laboratory of Molecular Modeling and Drug Design, College of Pharmacy and Graduate School of Pharmaceutical Sciences, and Global Top 5 Research Program, Ewha Womans University, Seoul, 120-750, Korea
- Sun Choi
- National Leading Research Laboratory of Molecular Modeling and Drug Design, College of Pharmacy and Graduate School of Pharmaceutical Sciences, and Global Top 5 Research Program, Ewha Womans University, Seoul, 120-750, Korea.
5
Lambrinidis G, Vallianatou T, Tsantili-Kakoulidou A. In vitro, in silico and integrated strategies for the estimation of plasma protein binding. A review. Adv Drug Deliv Rev 2015; 86:27-45. [PMID: 25819487 DOI: 10.1016/j.addr.2015.03.011] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Received: 08/14/2014] [Revised: 02/11/2015] [Accepted: 03/20/2015] [Indexed: 12/28/2022]
Abstract
Plasma protein binding (PPB) strongly affects drug distribution and pharmacokinetic behavior with consequences in overall pharmacological action. Extended plasma protein binding may be associated with drug safety issues and several adverse effects, like low clearance, low brain penetration, drug-drug interactions, loss of efficacy, while influencing the fate of enantiomers and diastereoisomers by stereoselective binding within the body. Therefore in holistic drug design approaches, where ADME(T) properties are considered in parallel with target affinity, considerable efforts are focused in early estimation of PPB mainly in regard to human serum albumin (HSA), which is the most abundant and most important plasma protein. The second critical serum protein α1-acid glycoprotein (AGP), although often underscored, plays also an important and complicated role in clinical therapy and thus the last years it has been studied thoroughly too. In the present review, after an overview of the principles of HSA and AGP binding as well as the structure topology of the proteins, the current trends and perspectives in the field of PPB predictions are presented and discussed considering both HSA and AGP binding. Since however for the latter protein systematic studies have started only the last years, the review focuses mainly to HSA. One part of the review highlights the challenge to develop rapid techniques for HSA and AGP binding simulation and their performance in assessment of PPB. The second part focuses on in silico approaches to predict HSA and AGP binding, analyzing and evaluating structure-based and ligand-based methods, as well as combination of both methods in the aim to exploit the different information and overcome the limitations of each individual approach. 
Ligand-based methods use the Quantitative Structure-Activity Relationships (QSAR) methodology to establish quantitative models for the prediction of binding constants from molecular descriptors, while they provide only indirect information on binding mechanism. Efforts for the establishment of global models, automated workflows and web-based platforms for PPB predictions are presented and discussed. Structure-based methods relying on the crystal structures of drug-protein complexes provide detailed information on the underlying mechanism but are usually restricted to specific compounds. They are useful to identify the specific binding site while they may be important in investigating drug-drug interactions, related to PPB. Moreover, chemometrics or structure-based modeling may be supported by experimental data, a promising integrated alternative strategy for ADME(T) properties optimization. In the case of PPB the use of molecular modeling combined with bioanalytical techniques is frequently used for the investigation of AGP binding.
6
Nantasenamat C, Worachartcheewan A, Jamsak S, Preeyanon L, Shoombuatong W, Simeon S, Mandi P, Isarankura-Na-Ayudhya C, Prachayasittikul V. AutoWeka: toward an automated data mining software for QSAR and QSPR studies. Methods Mol Biol 2015; 1260:119-47. [PMID: 25502379 DOI: 10.1007/978-1-4939-2239-0_8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 01/04/2023]
Abstract
In biology and chemistry, a key goal is to discover novel compounds affording potent biological activity or chemical properties. This could be achieved through a chemical intuition-driven trial-and-error process or via data-driven predictive modeling. The latter is based on the concept of quantitative structure-activity/property relationship (QSAR/QSPR) when applied in modeling the biological activity and chemical properties, respectively, of compounds. Data mining is a powerful technology underlying QSAR/QSPR as it harnesses knowledge from large volumes of high-dimensional data via multivariate analysis. Although extremely useful, the technicalities of data mining may overwhelm potential users, especially those in the life sciences. Herein, we aim to lower the barriers to access and utilization of data mining software for QSAR/QSPR studies. AutoWeka is an automated data mining software tool that is powered by the widely used machine learning package Weka. The software provides a user-friendly graphical interface along with an automated parameter search capability. It employs two robust and popular machine learning methods: artificial neural networks and support vector machines. This chapter describes the practical usage of AutoWeka and relevant tools in the development of predictive QSAR/QSPR models. Availability: The software is freely available at http://www.mt.mahidol.ac.th/autoweka.
Affiliation(s)
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
7
Lewis RA, Wood D. Modern 2D QSAR for drug discovery. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2014. [DOI: 10.1002/wcms.1187] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 01/26/2023]
Affiliation(s)
- Richard A. Lewis
- Novartis Institutes for BioMedical Research; Novartis Pharma AG; Basel Switzerland
- David Wood
- Novartis Institutes for BioMedical Research; Novartis Horsham Research Centre; Horsham UK
8
Leahy DE, Sykora V. Automation of decision making in drug design. DRUG DISCOVERY TODAY. TECHNOLOGIES 2014; 10:e437-41. [PMID: 24179997 DOI: 10.1016/j.ddtec.2013.02.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
9
Palczewska A, Fu X, Trundle P, Yang L, Neagu D, Ridley M, Travis K. Towards model governance in predictive toxicology. INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT 2013. [DOI: 10.1016/j.ijinfomgt.2013.02.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
10
Cox R, Green DVS, Luscombe CN, Malcolm N, Pickett SD. QSAR workbench: automating QSAR modeling to drive compound design. J Comput Aided Mol Des 2013; 27:321-36. [PMID: 23615761 PMCID: PMC3657086 DOI: 10.1007/s10822-013-9648-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 02/04/2013] [Accepted: 04/15/2013] [Indexed: 12/02/2022]
Abstract
We describe the QSAR Workbench, a system for the building and analysis of QSAR models. The system is built around the Pipeline Pilot workflow tool and provides access to a variety of model building algorithms for both continuous and categorical data. Traditionally models are built on a one by one basis and fully exploring the model space of algorithms and descriptor subsets is a time consuming basis. The QSAR Workbench provides a framework to allow for multiple models to be built over a number of modeling algorithms, descriptor combinations and data splits (training and test sets). Methods to analyze and compare models are provided, enabling the user to select the most appropriate model. The Workbench provides a consistent set of routines for data preparation and chemistry normalization that are also applied for predictions. The Workbench provides a large degree of automation with the ability to publish preconfigured model building workflows for a variety of problem domains, whilst providing experienced users full access to the underlying parameterization if required. Methods are provided to allow for publication of selected models as web services, thus providing integration with the chemistry desktop. We describe the design and implementation of the QSAR Workbench and demonstrate its utility through application to two public domain datasets.
Affiliation(s)
- Richard Cox
- Accelrys Ltd., 334 Cambridge Science Park, Cambridge, CB4 0WN, UK
11
Palczewska A, Neagu D, Ridley M. Using Pareto points for model identification in predictive toxicology. J Cheminform 2013; 5:16. [PMID: 23517649 PMCID: PMC3693991 DOI: 10.1186/1758-2946-5-16] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/13/2012] [Accepted: 02/27/2013] [Indexed: 11/22/2022]
Abstract
Predictive toxicology is concerned with the development of models that are able to predict the toxicity of chemicals. A reliable prediction of toxic effects of chemicals in living systems is highly desirable in cosmetics, drug design or food protection to speed up the process of chemical compound discovery while reducing the need for lab tests. There is an extensive literature associated with the best practice of model generation and data integration but management and automated identification of relevant models from available collections of models is still an open problem. Currently, the decision on which model should be used for a new chemical compound is left to users. This paper intends to initiate the discussion on automated model identification. We present an algorithm, based on Pareto optimality, which mines model collections and identifies a model that offers a reliable prediction for a new chemical compound. The performance of this new approach is verified for two endpoints: IGC50 and LogP. The results show a great potential for automated model identification methods in predictive toxicology.
Affiliation(s)
- Anna Palczewska
- Department of Computing, University of Bradford, Richmond Road, Bradford, BD7 1DP, UK
- Daniel Neagu
- Department of Computing, University of Bradford, Richmond Road, Bradford, BD7 1DP, UK
- Mick Ridley
- Department of Computing, University of Bradford, Richmond Road, Bradford, BD7 1DP, UK
12
Vallianatou T, Lambrinidis G, Tsantili-Kakoulidou A. In silico prediction of human serum albumin binding for drug leads. Expert Opin Drug Discov 2013; 8:583-95. [DOI: 10.1517/17460441.2013.777424] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 11/05/2022]
13
Davis AM, Wood DJ. Quantitative Structure–Activity Relationship Models That Stand the Test of Time. Mol Pharm 2013; 10:1183-90. [DOI: 10.1021/mp300466n] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 02/01/2023]
Affiliation(s)
- Andrew M. Davis
- AstraZeneca R&D Mölndal, Pepparedsleden 1, Mölndal, 431 83 Sweden
- David J. Wood
- AstraZeneca R&D Alderley Park, Alderley Edge, Cheshire, United Kingdom
14
Hiden H, Woodman S, Watson P, Cala J. Developing cloud applications using the e-Science Central platform. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2013; 371:20120085. [PMID: 23230161 PMCID: PMC3538293 DOI: 10.1098/rsta.2012.0085] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 06/01/2023]
Abstract
This paper describes the e-Science Central (e-SC) cloud data processing system and its application to a number of e-Science projects. e-SC provides both software as a service (SaaS) and platform as a service for scientific data management, analysis and collaboration. It is a portable system and can be deployed on both private (e.g. Eucalyptus) and public clouds (Amazon AWS and Microsoft Windows Azure). The SaaS application allows scientists to upload data, edit and run workflows and share results in the cloud, using only a Web browser. It is underpinned by a scalable cloud platform consisting of a set of components designed to support the needs of scientists. The platform is exposed to developers so that they can easily upload their own analysis services into the system and make these available to other users. A representational state transfer-based application programming interface (API) is also provided so that external applications can leverage the platform's functionality, making it easier to build scalable, secure cloud-based applications. This paper describes the design of e-SC, its API and its use in three different case studies: spectral data visualization, medical data capture and analysis, and chemical property prediction.
15
Wood DJ, Buttar D, Cumming JG, Davis AM, Norinder U, Rodgers SL. Automated QSAR with a Hierarchy of Global and Local Models. Mol Inform 2011; 30:960-72. [DOI: 10.1002/minf.201100107] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 07/12/2011] [Accepted: 10/13/2011] [Indexed: 11/06/2022]
16
Marchant CA. Computational toxicology: a tool for all industries. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2011. [DOI: 10.1002/wcms.100] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Affiliation(s)
- Carol A. Marchant
- Lhasa Limited, 22‐23 Blenheim Terrace, Woodhouse Lane, Leeds LS2 9HD, UK
17
Shim J, MacKerell AD. Computational ligand-based rational design: Role of conformational sampling and force fields in model development. MEDCHEMCOMM 2011; 2:356-370. [PMID: 21716805 PMCID: PMC3123535 DOI: 10.1039/c1md00044f] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 01/12/2023]
Abstract
A significant number of drug discovery efforts are based on natural products or high throughput screens from which compounds showing potential therapeutic effects are identified without knowledge of the target molecule or its 3D structure. In such cases computational ligand-based drug design (LBDD) can accelerate the drug discovery processes. LBDD is a general approach to elucidate the relationship of a compound's structure and physicochemical attributes to its biological activity. The resulting structure-activity relationship (SAR) may then act as the basis for the prediction of compounds with improved biological attributes. LBDD methods range from pharmacophore models identifying essential features of ligands responsible for their activity, quantitative structure-activity relationships (QSAR) yielding quantitative estimates of activities based on physiochemical properties, and to similarity searching, which explores compounds with similar properties as well as various combinations of the above. A number of recent LBDD approaches involve the use of multiple conformations of the ligands being studied. One of the basic components to generate multiple conformations in LBDD is molecular mechanics (MM), which apply an empirical energy function to relate conformation to energies and forces. The collection of conformations for ligands is then combined with functional data using methods ranging from regression analysis to neural networks, from which the SAR is determined. Accordingly, for effective application of LBDD for SAR determinations it is important that the compounds be accurately modelled such that the appropriate range of conformations accessible to the ligands is identified. Such accurate modelling is largely based on use of the appropriate empirical force field for the molecules being investigated and the approaches used to generate the conformations. 
The present chapter includes a brief overview of currently used SAR methods in LBDD followed by a more detailed presentation of issues and limitations associated with empirical energy functions and conformational sampling methods.
18
Rodgers SL, Davis AM, Tomkinson NP, van de Waterbeemd H. Predictivity of Simulated ADME AutoQSAR Models over Time. Mol Inform 2011; 30:256-66. [DOI: 10.1002/minf.201000160] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 11/07/2010] [Accepted: 01/24/2011] [Indexed: 11/08/2022]
19
Filipovic N, Ivanovic M, Krstajic D, Kojic M. Hemodynamic Flow Modeling Through an Abdominal Aorta Aneurysm Using Data Mining Tools. IEEE Trans Inf Technol Biomed 2011; 15:189-94. [DOI: 10.1109/titb.2010.2096541] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
20
Targeting drug transporters - combining in silico and in vitro approaches to predict in vivo. Methods Mol Biol 2010; 637:65-103. [PMID: 20419430 DOI: 10.1007/978-1-60761-700-6_4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/13/2022]
Abstract
Transporter proteins are expressed throughout the human body in different vital organs. They play an important role to various extents in determining absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties of therapeutic molecules. Over the past decade, numerous drug transporters have been cloned and considerable progress has been made toward understanding the molecular characteristics of individual transporters. In this chapter several in vitro and in silico techniques are described with applications to understand transporter behavior. These include employing new techniques to rapidly identify novel ligands for transporters. Ultimately these methods should lead to a greater overall appreciation of the role of transporters in vivo.
21
Sakiyama Y. The use of machine learning and nonlinear statistical tools for ADME prediction. Expert Opin Drug Metab Toxicol 2010; 5:149-69. [PMID: 19239395 DOI: 10.1517/17425250902753261] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 01/31/2023]
Abstract
Absorption, distribution, metabolism and excretion (ADME)-related failure of drug candidates is a major issue for the pharmaceutical industry today. Prediction of ADME by in silico tools has now become an inevitable paradigm to reduce cost and enhance efficiency in pharmaceutical research. Recently, machine learning as well as nonlinear statistical tools has been widely applied to predict routine ADME end points. To achieve accurate and reliable predictions, it would be a prerequisite to understand the concepts, mechanisms and limitations of these tools. Here, we have devised a small synthetic nonlinear data set to help understand the mechanism of machine learning by 2D-visualisation. We applied six new machine learning methods to four different data sets. The methods include Naive Bayes classifier, classification and regression tree, random forest, Gaussian process, support vector machine and k nearest neighbour. The results demonstrated that ensemble learning and kernel machine displayed greater accuracy of prediction than classical methods irrespective of the data set size. The importance of interaction with the engineering field is also addressed. The results described here provide insights into the mechanism of machine learning, which will enable appropriate usage in the future.
Affiliation(s)
- Yojiro Sakiyama
- Pharmacokinetics Dynamics Metabolism, Pfizer Global Research and Development, Sandwich Laboratories, Kent, UK.
22
Demel MA, Kraemer O, Ettmayer P, Haaksma E, Ecker GF. Ensemble Rule-Based Classification of Substrates of the Human ABC-Transporter ABCB1 Using Simple Physicochemical Descriptors. Mol Inform 2010; 29:233-42. [DOI: 10.1002/minf.200900079] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 12/04/2009] [Accepted: 02/12/2010] [Indexed: 11/09/2022]
23
Zientek M, Stoner C, Ayscue R, Klug-McLeod J, Jiang Y, West M, Collins C, Ekins S. Integrated in Silico−in Vitro Strategy for Addressing Cytochrome P450 3A4 Time-Dependent Inhibition. Chem Res Toxicol 2010; 23:664-76. [DOI: 10.1021/tx900417f] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Indexed: 01/30/2023]
Affiliation(s)
- Michael Zientek
- Dynamics & Drug Metabolism, Pharmacokinetics, Pfizer Global Research & Development, San Diego California, Groton, Connecticut, and Sandwich, United Kingdom, Computational Center of Emphasis, Pfizer, Groton, Connecticut, Arnold Consultancy and Technology LLC, 5 Penn Plaza, 19th Floor, New York, New York 10119, Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey,
| | - Chad Stoner
- Dynamics & Drug Metabolism, Pharmacokinetics, Pfizer Global Research & Development, San Diego California, Groton, Connecticut, and Sandwich, United Kingdom, Computational Center of Emphasis, Pfizer, Groton, Connecticut, Arnold Consultancy and Technology LLC, 5 Penn Plaza, 19th Floor, New York, New York 10119, Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey,
| | - Robyn Ayscue
- Dynamics & Drug Metabolism, Pharmacokinetics, Pfizer Global Research & Development, San Diego California, Groton, Connecticut, and Sandwich, United Kingdom, Computational Center of Emphasis, Pfizer, Groton, Connecticut, Arnold Consultancy and Technology LLC, 5 Penn Plaza, 19th Floor, New York, New York 10119, Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey,
| | - Jacquelyn Klug-McLeod
- Dynamics & Drug Metabolism, Pharmacokinetics, Pfizer Global Research & Development, San Diego California, Groton, Connecticut, and Sandwich, United Kingdom, Computational Center of Emphasis, Pfizer, Groton, Connecticut, Arnold Consultancy and Technology LLC, 5 Penn Plaza, 19th Floor, New York, New York 10119, Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey,
| | - Ying Jiang
- Dynamics & Drug Metabolism, Pharmacokinetics, Pfizer Global Research & Development, San Diego California, Groton, Connecticut, and Sandwich, United Kingdom, Computational Center of Emphasis, Pfizer, Groton, Connecticut, Arnold Consultancy and Technology LLC, 5 Penn Plaza, 19th Floor, New York, New York 10119, Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey,
| | - Michael West
- Dynamics & Drug Metabolism, Pharmacokinetics, Pfizer Global Research & Development, San Diego California, Groton, Connecticut, and Sandwich, United Kingdom, Computational Center of Emphasis, Pfizer, Groton, Connecticut, Arnold Consultancy and Technology LLC, 5 Penn Plaza, 19th Floor, New York, New York 10119, Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey,
| | - Claire Collins
- Dynamics & Drug Metabolism, Pharmacokinetics, Pfizer Global Research & Development, San Diego California, Groton, Connecticut, and Sandwich, United Kingdom, Computational Center of Emphasis, Pfizer, Groton, Connecticut, Arnold Consultancy and Technology LLC, 5 Penn Plaza, 19th Floor, New York, New York 10119, Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey,
| | - Sean Ekins
- Dynamics & Drug Metabolism, Pharmacokinetics, Pfizer Global Research & Development, San Diego California, Groton, Connecticut, and Sandwich, United Kingdom, Computational Center of Emphasis, Pfizer, Groton, Connecticut, Arnold Consultancy and Technology LLC, 5 Penn Plaza, 19th Floor, New York, New York 10119, Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, and Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey,
|
24
|
Ekins S, Williams AJ. Precompetitive preclinical ADME/Tox data: set it free on the web to facilitate computational model building and assist drug development. LAB ON A CHIP 2010; 10:13-22. [PMID: 20024044 DOI: 10.1039/b917760b] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Web-based technologies coupled with a drive for improved communication between scientists have resulted in the proliferation of scientific opinion, data and knowledge at an ever-increasing rate. The increasing array of chemistry-related computer-based resources now available provides chemists with a direct path to the discovery of information, once previously accessed via library services and limited to commercial and costly resources. We propose that preclinical absorption, distribution, metabolism, excretion and toxicity data as well as pharmacokinetic properties from studies published in the literature (which use animal or human tissues in vitro or from in vivo studies) are precompetitive in nature and should be freely available on the web. This could be made possible by curating the literature and patents, data donations from pharmaceutical companies and by expanding the currently freely available ChemSpider database of over 21 million molecules with physicochemical properties. This will require linkage to PubMed, PubChem and Wikipedia as well as other frequently used public databases that are currently used, mining the full text publications to extract the pertinent experimental data. These data will need to be extracted using automated and manual methods, cleaned and then published to the ChemSpider or other database such that it will be freely available to the biomedical research and clinical communities. The value of the data being accessible will improve development of drug molecules with good ADME/Tox properties, facilitate computational model building for these properties and enable researchers to not repeat the failures of past drug discovery studies.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, Jenkintown, PA 19046, USA.
|
25
|
Gedeck P, Kramer C, Ertl P. Computational analysis of structure-activity relationships. PROGRESS IN MEDICINAL CHEMISTRY 2010; 49:113-60. [PMID: 20855040 DOI: 10.1016/s0079-6468(10)49004-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Affiliation(s)
- Peter Gedeck
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Forum 1, Novartis Campus, CH-4056 Basel, Switzerland
|
26
|
|
27
|
van de Waterbeemd H. Improving Compound Quality through in vitro and in silico Physicochemical Profiling. Chem Biodivers 2009; 6:1760-6. [DOI: 10.1002/cbdv.200900056] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
28
|
Eklund M, Spjuth O, Wikberg JE. The C1C2: a framework for simultaneous model selection and assessment. BMC Bioinformatics 2008; 9:360. [PMID: 18761753 PMCID: PMC2556350 DOI: 10.1186/1471-2105-9-360] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2008] [Accepted: 09/02/2008] [Indexed: 11/12/2022] Open
Abstract
Background: There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework relies on a partitioning of the data in order to separate model choice from model assessment in terms of used data. Since the number of conceivable models in general is vast, it was also of interest to investigate the employment of two automatic search methods, a genetic algorithm and a brute-force method, for model choice. As a demonstration, the C1C2 was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables, thus reducing the model choice problem to a matter of variable selection and choice of penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C1C2 were compared to those obtained by employing repeated K-fold cross-validation for choosing and assessing a model. Results: The C1C2 framework performed well at finding the true model in terms of choosing the correct variable subset and producing reasonable choices for the penalizing parameter, even in situations when the independent variables were highly correlated and when the number of observations was less than the number of variables. The C1C2 framework was also found to give accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of generalization error estimates. Using the genetic algorithm worsened the model choice but not the generalization error estimates, compared to using the brute-force method. The results obtained with repeated K-fold cross-validation were similar to those produced by the C1C2 in terms of model choice, however a lower accuracy of the generalization error estimates was observed. Conclusion: The C1C2 framework was demonstrated to work well for finding the true model within a penalized linear model class and accurately assess its generalization error, even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and model assumption deviations. A complete separation of the model choice and the model assessment in terms of data used for each task improves the estimates of the generalization error.
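The principle stated in this abstract, that the data used to choose a model should be kept apart from the data used to assess it, can be illustrated with a small sketch. This is not the C1C2 procedure itself, whose details are not given here. It is a plain three-way split applied to a one-variable penalized linear model, and the function names, the split proportions, and the toy data are all assumptions made for illustration.

```python
import random

def fit_ridge_slope(xs, ys, penalty):
    # One-variable penalized least squares through the origin.
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def mse(slope, xs, ys):
    # Mean squared error of the fitted line on the given points.
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select_then_assess(xs, ys, penalties, seed=0):
    # Hypothetical helper: shuffle, then split into three disjoint parts.
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    third = len(idx) // 3
    train, choose, assess = idx[:third], idx[third:2 * third], idx[2 * third:]

    def pick(ids, values):
        return [values[i] for i in ids]

    # Model choice: the penalty is picked using only the train and choose parts.
    best = min(
        penalties,
        key=lambda p: mse(
            fit_ridge_slope(pick(train, xs), pick(train, ys), p),
            pick(choose, xs),
            pick(choose, ys),
        ),
    )
    # Model assessment: refit with the chosen penalty, then score on data
    # that played no role in the choice.
    fit_ids = train + choose
    slope = fit_ridge_slope(pick(fit_ids, xs), pick(fit_ids, ys), best)
    return best, slope, mse(slope, pick(assess, xs), pick(assess, ys))

# Toy, noise-free data (an assumption for illustration): y = 2x.
xs = [i / 10 for i in range(1, 31)]
ys = [2 * x for x in xs]
best_penalty, slope, assessment_error = select_then_assess(xs, ys, [0.0, 1.0, 10.0])
```

Because the assessment part never influences the choice of penalty, its error is not biased by the selection step.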
Affiliation(s)
- Martin Eklund
- Department of Pharmaceutical Pharmacology, Uppsala University, Box 591, BMC, SE-751 24 Uppsala, Sweden.
|
29
|
|
30
|
Automatic QSAR modeling of ADME properties: blood-brain barrier penetration and aqueous solubility. J Comput Aided Mol Des 2008; 22:431-40. [PMID: 18273554 DOI: 10.1007/s10822-008-9193-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2007] [Accepted: 01/30/2008] [Indexed: 10/22/2022]
Abstract
In this article, we present an automatic model generation process for building QSAR models using Gaussian Processes, a powerful machine learning modeling method. We describe the stages of the process that ensure models are built and validated within a rigorous framework: descriptor calculation, splitting data into training, validation and test sets, descriptor filtering, application of modeling techniques and selection of the best model. We apply this automatic process to data sets of blood-brain barrier penetration and aqueous solubility and compare the resulting automatically generated models with 'manually' built models using external test sets. The results demonstrate the effectiveness of the automatic model generation process for two types of data sets commonly encountered in building ADME QSAR models, a small set of in vivo data and a large set of physico-chemical data.
|
31
|
|
32
|
Rodgers SL, Davis AM, Tomkinson NP, van de Waterbeemd H. QSAR Modeling Using Automatically Updating Correction Libraries: Application to a Human Plasma Protein Binding Model. J Chem Inf Model 2007; 47:2401-7. [PMID: 17887744 DOI: 10.1021/ci700197x] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
It is assumed that compounds occupying the same region of model space will be subject to similar errors in prediction, and hence, where these errors are known, they can be applied to predictions. Thus, any available measured data can be used to refine predictions of query compounds. This study describes the application of a correction library to a human plasma protein binding model. Compounds that have been measured since the model was built are entered into the library to improve predictions of current compounds. Time-series simulations were conducted to measure the time dependence of the correction library. This study demonstrates significant improvements in predictions where a library is applied, compared with both a static model and an updating model that includes recently measured data.
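The idea of a correction library can be sketched as follows. The abstract does not say how neighbouring compounds are found or how their errors are combined, so this example assumes Euclidean distance in descriptor space and a simple average over the k nearest library entries. All names and the toy data are hypothetical.

```python
import math

def nearest_neighbor_correction(query, library, k=3):
    # Average the known errors (measured - predicted) of the k library
    # compounds closest to the query in descriptor space.
    if not library:
        return 0.0
    ranked = sorted(library, key=lambda entry: math.dist(query, entry["descriptors"]))
    nearest = ranked[:k]
    return sum(e["measured"] - e["predicted"] for e in nearest) / len(nearest)

def corrected_prediction(model_prediction, query, library, k=3):
    # Shift the static model's output by the local error estimate.
    return model_prediction + nearest_neighbor_correction(query, library, k)

# Toy library (assumed values): compounds measured after the model was built.
library = [
    {"descriptors": (0.0, 0.0), "predicted": 1.0, "measured": 1.5},
    {"descriptors": (0.1, 0.0), "predicted": 2.0, "measured": 2.5},
    {"descriptors": (5.0, 5.0), "predicted": 3.0, "measured": 1.0},
]
result = corrected_prediction(2.0, (0.05, 0.0), library, k=2)
unchanged = corrected_prediction(2.0, (0.05, 0.0), [], k=2)
```

New measurements only need to be appended to the library, so in this sketch predictions can be refined without retraining the underlying model.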
Affiliation(s)
- Sarah L Rodgers
- AstraZeneca R&D Charnwood, Bakewell Road, Loughborough, Leicestershire, United Kingdom.
|
33
|
Obrezanova O, Csanyi G, Gola JMR, Segall MD. Gaussian Processes: A Method for Automatic QSAR Modeling of ADME Properties. J Chem Inf Model 2007; 47:1847-57. [PMID: 17602549 DOI: 10.1021/ci7000633] [Citation(s) in RCA: 134] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
In this article, we discuss the application of the Gaussian Process method for the prediction of absorption, distribution, metabolism, and excretion (ADME) properties. On the basis of a Bayesian probabilistic approach, the method is widely used in the field of machine learning but has rarely been applied in quantitative structure-activity relationship and ADME modeling. The method is suitable for modeling nonlinear relationships, does not require subjective determination of the model parameters, works for a large number of descriptors, and is inherently resistant to overtraining. The performance of Gaussian Processes compares well with and often exceeds that of artificial neural networks. Due to these features, the Gaussian Processes technique is eminently suitable for automatic model generation, one of the demands of modern drug discovery. Here, we describe the basic concept of the method in the context of regression problems and illustrate its application to the modeling of several ADME properties: blood-brain barrier, hERG inhibition, and aqueous solubility at pH 7.4. We also compare Gaussian Processes with other modeling techniques.
Affiliation(s)
- Olga Obrezanova
- BioFocus DPI, 127 Cambridge Science Park, Milton Road, Cambridge, CB4 0GD, United Kingdom.
|
34
|
Rodgers S, Davis A, van de Waterbeemd H. Time-Series QSAR Analysis of Human Plasma Protein Binding Data. QSAR Comb Sci 2007. [DOI: 10.1002/qsar.200630114] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
35
|
Gola J, Obrezanova O, Champness E, Segall M. ADMET Property Prediction: The State of the Art and Current Challenges. QSAR Comb Sci 2006. [DOI: 10.1002/qsar.200610093] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
36
|
Glen R, Adams S. Similarity Metrics and Descriptor Spaces – Which Combinations to Choose? QSAR Comb Sci 2006. [DOI: 10.1002/qsar.200610097] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|