1
Mao J, Akhtar J, Zhang X, Sun L, Guan S, Li X, Chen G, Liu J, Jeon HN, Kim MS, No KT, Wang G. Comprehensive strategies of machine-learning-based quantitative structure-activity relationship models. iScience 2021; 24:103052. [PMID: 34553136 PMCID: PMC8441174 DOI: 10.1016/j.isci.2021.103052] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Indexed: 11/21/2022]
Abstract
Early quantitative structure-activity relationship (QSAR) technologies have unsatisfactory versatility and accuracy in fields such as drug discovery because they are based on traditional machine learning and interpretive expert features. The development of Big Data and deep learning technologies has significantly improved the processing of unstructured data and unleashed the great potential of QSAR. Here we discuss the integration of wet experiments (which provide experimental data and reliable verification), molecular dynamics simulation (which provides mechanistic interpretation at the atomic/molecular levels), and machine learning (including deep learning) techniques to improve QSAR models. We first review the history of traditional QSAR and point out its problems. We then propose a better QSAR model characterized by a new iterative framework to integrate machine learning with disparate data input. Finally, we discuss the application of QSAR and machine learning to many practical research fields, including drug development and clinical trials.
Affiliation(s)
- Jiashun Mao
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China
  - Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055, China
- Javed Akhtar
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China
  - Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China
- Xiao Zhang
  - Shanghai Rural Commercial Bank Co., Ltd, Shanghai 200002, China
- Liang Sun
  - Department of Physics, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
- Shenghui Guan
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China
  - Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055, China
- Xinyu Li
  - School of Life and Health Sciences and Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Guangming Chen
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China
  - Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China
- Jiaxin Liu
  - Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
- Hyeon-Nae Jeon
  - Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
- Min Sung Kim
  - Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
- Kyoung Tai No
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Guanyu Wang
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China
  - Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055, China
  - Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China
2
Hughes TB, Dang NL, Kumar A, Flynn NR, Swamidass SJ. Metabolic Forest: Predicting the Diverse Structures of Drug Metabolites. J Chem Inf Model 2020; 60:4702-4716. [PMID: 32881497 PMCID: PMC8716321 DOI: 10.1021/acs.jcim.0c00360] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]
Abstract
Adverse drug metabolism often severely impacts patient morbidity and mortality. Unfortunately, drug metabolism experimental assays are costly, inefficient, and slow. Instead, computational modeling could rapidly flag potentially toxic molecules across thousands of candidates in the early stages of drug development. Most metabolism models focus on predicting sites of metabolism (SOMs): the specific substrate atoms targeted by metabolic enzymes. However, SOMs are merely a proxy for metabolic structures: knowledge of an SOM does not explicitly provide the actual metabolite structure. Without an explicit metabolite structure, computational systems cannot evaluate the new molecule's properties. For example, the metabolite's reactivity cannot be automatically predicted, a crucial limitation because reactive drug metabolites are a key driver of adverse drug reactions (ADRs). Additionally, further metabolic events cannot be forecast, even though the metabolic path of the majority of substrates includes two or more sequential steps. To overcome the myopia of the SOM paradigm, this study constructs a well-defined system, termed the metabolic forest, for generating exact metabolite structures. We validate the metabolic forest with the substrate and product structures from a large, chemically diverse, literature-derived dataset of 20 736 records. The metabolic forest finds a pathway linking each substrate and product for 79.42% of these records. By performing a breadth-first search of depth two or three, we improve performance to 88.43 and 88.77%, respectively. The metabolic forest includes a specialized algorithm for producing accurate quinone structures, the most common type of reactive metabolite. To our knowledge, this quinone structure algorithm is the first of its kind, as the diverse mechanisms of quinone formation are difficult to systematically reproduce. We validate the metabolic forest on a previously published dataset of 576 quinone reactions, predicting their structures with a depth three performance of 91.84%. The metabolic forest accurately enumerates metabolite structures, enabling promising new directions such as joint metabolism and reactivity modeling.
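The depth-limited breadth-first search mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `TOY_RULES` table, the string labels standing in for molecular structures, and the `find_pathway` function are all hypothetical and exist only to show how a search of depth two can link a substrate to a product that a single step misses.

```python
from collections import deque

# Hypothetical toy "reaction rules": each substrate label maps to product labels.
# A real system would apply transformation rules to molecular graphs; plain
# strings are used here only for illustration.
TOY_RULES = {
    "phenol": ["hydroquinone", "catechol"],
    "hydroquinone": ["benzoquinone"],
    "catechol": ["o-benzoquinone"],
}


def find_pathway(substrate, product, max_depth, rules=TOY_RULES):
    """Return a list of labels linking substrate to product using at most
    max_depth rule applications, or None if no such pathway exists."""
    queue = deque([(substrate, [substrate])])
    seen = {substrate}
    while queue:
        current, path = queue.popleft()
        if current == product:
            return path
        # Number of steps taken so far is len(path) - 1; stop expanding at the limit.
        if len(path) - 1 >= max_depth:
            continue
        for nxt in rules.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None
```

With these toy rules, `find_pathway("phenol", "benzoquinone", 1)` finds nothing, while a depth of two recovers the two-step route through hydroquinone.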
Affiliation(s)
- Tyler B Hughes
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
- Na Le Dang
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
- Ayush Kumar
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
- Noah R Flynn
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
- S Joshua Swamidass
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
3
Hwang S, Shin HK, Shin SE, Seo M, Jeon HN, Yim DE, Kim DH, No KT. PreMetabo: An in silico phase I and II drug metabolism prediction platform. Drug Metab Pharmacokinet 2020; 35:361-367. [PMID: 32616370 DOI: 10.1016/j.dmpk.2020.05.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/11/2019] [Revised: 03/23/2020] [Accepted: 05/18/2020] [Indexed: 10/24/2022]
Abstract
This study aimed to develop a drug metabolism prediction platform using knowledge-based prediction models. Site of Metabolism (SOM) prediction models for four cytochrome P450 (CYP) subtypes were developed along with uridine 5'-diphosphoglucuronosyltransferase (UGT) and sulfotransferase (SULT) substrate classification models. The SOM substrate for a certain CYP was determined using the sum of the activation energy required for the reaction at the reaction site of the substrate and the binding energy of the substrate to the CYP enzyme. Activation energy was calculated using the EaMEAD model and binding energy was calculated by docking simulation. Phase II prediction models were developed to predict whether a molecule is the substrate of a certain phase II conjugate protein, i.e., UGT or SULT. Using SOM prediction models, the predictability of the major metabolite in the top-3 was obtained as 72.5-84.5% for four CYPs, respectively. For internal validation, the accuracy of the UGT and SULT substrate classification model was obtained as 93.94% and 80.68%, respectively. Additionally, for external validation, the accuracy of the UGT substrate classification model was obtained as 81% in the case of 11 FDA-approved drugs. PreMetabo is implemented in a web environment and is available at https://premetabo.bmdrc.kr/.
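The scoring idea described in this abstract, ranking candidate sites by the sum of an activation energy and a binding energy and then checking the top three, might be sketched as follows. This is an illustration under stated assumptions, not the PreMetabo code: the function name `rank_sites`, the atom labels, and all energy values are hypothetical, and the sketch assumes that a lower summed energy marks a more likely site of metabolism.

```python
def rank_sites(sites, top_n=3):
    """Rank candidate sites by activation energy + binding energy.

    sites: dict mapping an atom label to (activation_energy, binding_energy),
           both in the same units. Lower sums are assumed to be more favorable.
    Returns the top_n atom labels, most favorable first.
    """
    scores = {atom: e_act + e_bind for atom, (e_act, e_bind) in sites.items()}
    return sorted(scores, key=scores.get)[:top_n]


# Hypothetical energies for four candidate atoms of one substrate.
example_sites = {
    "C1": (60.0, -30.0),  # sum 30.0
    "C4": (55.0, -20.0),  # sum 35.0
    "N7": (70.0, -45.0),  # sum 25.0
    "C9": (80.0, -10.0),  # sum 70.0
}
top3 = rank_sites(example_sites)
```

A prediction would count as a top-3 hit when the experimentally observed major site appears in the returned list.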
Affiliation(s)
- Sungbo Hwang
  - Department of Biotechnology, Yonsei University, Seoul 120-479, Republic of Korea
- Hyun Kil Shin
  - Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon 34114, Republic of Korea
- Seong Eun Shin
  - Bioinformatics & Molecular Design Research Center, Seoul 120-749, Republic of Korea
- Myungwon Seo
  - Department of Biotechnology, Yonsei University, Seoul 120-479, Republic of Korea
- Hyeon-Nae Jeon
  - Department of Biotechnology, Yonsei University, Seoul 120-479, Republic of Korea
- Da-Eun Yim
  - Department of Pharmacology and Pharmacogenomics Research Center, Inje University, College of Medicine, Busan, Republic of Korea
- Dong-Hyun Kim
  - Department of Pharmacology and Pharmacogenomics Research Center, Inje University, College of Medicine, Busan, Republic of Korea
- Kyoung Tai No
  - Department of Biotechnology, Yonsei University, Seoul 120-479, Republic of Korea
  - Bioinformatics & Molecular Design Research Center, Seoul 120-749, Republic of Korea
4
Luirink RA, Verkade-Vreeker MCA, Commandeur JNM, Geerke DP. A Modified Arrhenius Approach to Thermodynamically Study Regioselectivity in Cytochrome P450-Catalyzed Substrate Conversion. Chembiochem 2020; 21:1461-1472. [PMID: 31919943 PMCID: PMC7318578 DOI: 10.1002/cbic.201900751] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/11/2019] [Indexed: 12/21/2022]
Abstract
The regio- (and stereo-)selectivity and specific activity of cytochrome P450s are determined by the accessibility of potential sites of metabolism (SOMs) of the bound substrate relative to the heme, and the activation barrier of the regioselective oxidation reaction(s). The accessibility of potential SOMs depends on the relative binding free energy (ΔΔGbind ) of the catalytically active substrate-binding poses, and the probability of the substrate to adopt a transition-state geometry. An established experimental method to measure activation energies of enzymatic reactions is the analysis of reaction rate constants at different temperatures and the construction of Arrhenius plots. This is a challenge for multistep P450-catalyzed processes that involve redox partners. We introduce a modified Arrhenius approach to overcome the limitations in studying P450 selectivity, which can be applied in multiproduct enzyme catalysis. Our approach gives combined information on relative activation energies, ΔΔGbind values, and collision entropies, yielding direct insight into the basis of selectivity in substrate conversion.
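For orientation, the conventional Arrhenius analysis that this abstract takes as its starting point fits ln k against 1/T, using ln k = ln A - Ea/(RT). The sketch below shows only that standard fit, not the modified approach the authors introduce; the function name `arrhenius_fit` and the synthetic rate constants are assumptions made for illustration.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol*K)


def arrhenius_fit(temps_k, rate_constants):
    """Least-squares fit of ln k versus 1/T.

    Returns (activation_energy_J_per_mol, pre_exponential_factor).
    """
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in rate_constants]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # slope = -Ea / R and intercept = ln A
    return -slope * R, math.exp(intercept)


# Synthetic data generated from a hypothetical Ea of 50 kJ/mol and A of 1e10.
temps = [290.0, 300.0, 310.0, 320.0]
ks = [1e10 * math.exp(-50000.0 / (R * t)) for t in temps]
ea, a = arrhenius_fit(temps, ks)
```

On noise-free synthetic data the fit recovers the activation energy and pre-exponential factor used to generate it.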
Affiliation(s)
- Rosa A. Luirink
  - AIMMS Division of Molecular Toxicology, Vrije Universiteit, De Boelelaan 1108, 1081 HZ Amsterdam, The Netherlands
- Jan N. M. Commandeur
  - AIMMS Division of Molecular Toxicology, Vrije Universiteit, De Boelelaan 1108, 1081 HZ Amsterdam, The Netherlands
- Daan P. Geerke
  - AIMMS Division of Molecular Toxicology, Vrije Universiteit, De Boelelaan 1108, 1081 HZ Amsterdam, The Netherlands
5
Hughes TB, Swamidass SJ. Deep Learning to Predict the Formation of Quinone Species in Drug Metabolism. Chem Res Toxicol 2017; 30:642-656. [PMID: 28099803 DOI: 10.1021/acs.chemrestox.6b00385] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Indexed: 01/13/2023]
Abstract
Many adverse drug reactions are thought to be caused by electrophilically reactive drug metabolites that conjugate to nucleophilic sites within DNA and proteins, causing cancer or toxic immune responses. Quinone species, including quinone-imines, quinone-methides, and imine-methides, are electrophilic Michael acceptors that are often highly reactive and comprise over 40% of all known reactive metabolites. Quinone metabolites are created by cytochromes P450 and peroxidases. For example, cytochromes P450 oxidize acetaminophen to N-acetyl-p-benzoquinone imine, which is electrophilically reactive and covalently binds to nucleophilic sites within proteins. This reactive quinone metabolite elicits a toxic immune response when acetaminophen exceeds a safe dose. Using a deep learning approach, this study reports the first published method for predicting quinone formation: the formation of a quinone species by metabolic oxidation. We model both one- and two-step quinone formation, enabling accurate quinone formation predictions in nonobvious cases. We predict atom pairs that form quinones with an AUC accuracy of 97.6%, and we identify molecules that form quinones with 88.2% AUC. By modeling the formation of quinones, one of the most common types of reactive metabolites, our method provides a rapid screening tool for a key drug toxicity risk. The XenoSite quinone formation model is available at http://swami.wustl.edu/xenosite/p/quinone .
Affiliation(s)
- Tyler B Hughes
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 S. Euclid Avenue, St. Louis, Missouri 63110, United States
- S Joshua Swamidass
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 S. Euclid Avenue, St. Louis, Missouri 63110, United States
6
Hughes T, Dang NL, Miller GP, Swamidass SJ. Modeling Reactivity to Biological Macromolecules with a Deep Multitask Network. ACS Cent Sci 2016; 2:529-37. [PMID: 27610414 PMCID: PMC4999971 DOI: 10.1021/acscentsci.6b00162] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 06/03/2016] [Indexed: 05/14/2023]
Abstract
Most small-molecule drug candidates fail before entering the market, frequently because of unexpected toxicity. Often, toxicity is detected only late in drug development, because many types of toxicities, especially idiosyncratic adverse drug reactions (IADRs), are particularly hard to predict and detect. Moreover, drug-induced liver injury (DILI) is the most frequent reason drugs are withdrawn from the market and causes 50% of acute liver failure cases in the United States. A common mechanism often underlies many types of drug toxicities, including both DILI and IADRs. Drugs are bioactivated by drug-metabolizing enzymes into reactive metabolites, which then conjugate to sites in proteins or DNA to form adducts. DNA adducts are often mutagenic and may alter the reading and copying of genes and their regulatory elements, causing gene dysregulation and even triggering cancer. Similarly, protein adducts can disrupt their normal biological functions and induce harmful immune responses. Unfortunately, reactive metabolites are not reliably detected by experiments, and it is also expensive to test drug candidates for potential to form DNA or protein adducts during the early stages of drug development. In contrast, computational methods have the potential to quickly screen for covalent binding potential, thereby flagging problematic molecules and reducing the total number of necessary experiments. Here, we train a deep convolution neural network, the XenoSite reactivity model, using literature data to accurately predict both sites and probability of reactivity for molecules with glutathione, cyanide, protein, and DNA. On the site level, cross-validated predictions had area under the curve (AUC) performances of 89.8% for DNA and 94.4% for protein. Furthermore, the model separated molecules electrophilically reactive with DNA and protein from nonreactive molecules with cross-validated AUC performances of 78.7% and 79.8%, respectively. At both the site and molecule levels, the model significantly outperformed reactivity indices derived from quantum simulations that are reported in the literature. Moreover, we developed and applied a selectivity score to assess preferential reactions with the macromolecules as opposed to the common screening traps. For the entire data set of 2803 molecules, this approach yielded totals of 257 (9.2%) and 227 (8.1%) molecules predicted to be reactive only with DNA and protein, respectively, and hence those that would be missed by standard reactivity screening experiments. Site of reactivity data is an underutilized resource that can be used to not only predict if molecules are reactive, but also show where they might be modified to reduce toxicity while retaining efficacy. The XenoSite reactivity model is available at http://swami.wustl.edu/xenosite/p/reactivity.
Affiliation(s)
- Tyler B. Hughes
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
- Na Le Dang
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
- Grover P. Miller
  - Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205, United States
- S. Joshua Swamidass
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
7
Hughes TB, Miller GP, Swamidass SJ. Modeling Epoxidation of Drug-like Molecules with a Deep Machine Learning Network. ACS Cent Sci 2015; 1:168-80. [PMID: 27162970 PMCID: PMC4827534 DOI: 10.1021/acscentsci.5b00131] [Citation(s) in RCA: 109] [Impact Index Per Article: 12.1] [Received: 04/06/2015] [Indexed: 05/02/2023]
Abstract
Drug toxicity is frequently caused by electrophilic reactive metabolites that covalently bind to proteins. Epoxides comprise a large class of three-membered cyclic ethers. These molecules are electrophilic and typically highly reactive due to ring tension and polarized carbon-oxygen bonds. Epoxides are metabolites often formed by cytochromes P450 acting on aromatic or double bonds. The specific location on a molecule that undergoes epoxidation is its site of epoxidation (SOE). Identifying a molecule's SOE can aid in interpreting adverse events related to reactive metabolites and direct modification to prevent epoxidation for safer drugs. This study utilized a database of 702 epoxidation reactions to build a model that accurately predicted sites of epoxidation. The foundation for this model was an algorithm originally designed to model sites of cytochromes P450 metabolism (called XenoSite) that was recently applied to model the intrinsic reactivity of diverse molecules with glutathione. This modeling algorithm systematically and quantitatively summarizes the knowledge from hundreds of epoxidation reactions with a deep convolution network. This network makes predictions at both an atom and molecule level. The final epoxidation model constructed with this approach identified SOEs with 94.9% area under the curve (AUC) performance and separated epoxidized and non-epoxidized molecules with 79.3% AUC. Moreover, within epoxidized molecules, the model separated aromatic or double bond SOEs from all other aromatic or double bonds with AUCs of 92.5% and 95.1%, respectively. Finally, the model separated SOEs from sites of sp² hydroxylation with 83.2% AUC. Our model is the first of its kind and may be useful for the development of safer drugs. The epoxidation model is available at http://swami.wustl.edu/xenosite.
Affiliation(s)
- Tyler B. Hughes
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
- Grover P. Miller
  - Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205, United States
- S. Joshua Swamidass
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States
8
Hughes TB, Miller GP, Swamidass SJ. Site of reactivity models predict molecular reactivity of diverse chemicals with glutathione. Chem Res Toxicol 2015; 28:797-809. [PMID: 25742281 DOI: 10.1021/acs.chemrestox.5b00017] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 01/31/2023]
Abstract
Drug toxicity is often caused by electrophilic reactive metabolites that covalently bind to proteins. Consequently, the quantitative strength of a molecule's reactivity with glutathione (GSH) is a frequently used indicator of its toxicity. Through cysteine, GSH (and proteins) scavenges reactive molecules to form conjugates in the body. GSH conjugates to specific atoms in reactive molecules: their sites of reactivity. The value of knowing a molecule's sites of reactivity is unexplored in the literature. This study tests the value of site of reactivity data that identifies the atoms within 1213 reactive molecules that conjugate to GSH and builds models to predict molecular reactivity with glutathione. An algorithm originally written to model sites of cytochrome P450 metabolism (called XenoSite) finds clear patterns in molecular structure that identify sites of reactivity within reactive molecules with 90.8% accuracy and separate reactive and unreactive molecules with 80.6% accuracy. Furthermore, the model output strongly correlates with quantitative GSH reactivity data in chemically diverse, external data sets. Site of reactivity data is nearly unstudied in the literature prior to our efforts, yet it contains a strong signal for reactivity that can be utilized to more accurately predict molecule reactivity and, eventually, toxicity.
Affiliation(s)
- Tyler B Hughes
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 S. Euclid Ave., St. Louis, Missouri 63110, United States
- Grover P Miller
  - Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205, United States
- S Joshua Swamidass
  - Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 S. Euclid Ave., St. Louis, Missouri 63110, United States
9
Rydberg P. Reactivity-Based Approaches and Machine Learning Methods for Predicting the Sites of Cytochrome P450-Mediated Metabolism. 2014. [DOI: 10.1002/9783527673261.ch11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
10
Kirchmair J, Williamson MJ, Afzal AM, Tyzack JD, Choy APK, Howlett A, Rydberg P, Glen RC. FAst MEtabolizer (FAME): A rapid and accurate predictor of sites of metabolism in multiple species by endogenous enzymes. J Chem Inf Model 2013; 53:2896-907. [PMID: 24219364 DOI: 10.1021/ci400503s] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Indexed: 01/17/2023]
Abstract
FAst MEtabolizer (FAME) is a fast and accurate predictor of sites of metabolism (SoMs). It is based on a collection of random forest models trained on diverse chemical data sets of more than 20 000 molecules annotated with their experimentally determined SoMs. Using a comprehensive set of available data, FAME aims to assess metabolic processes from a holistic point of view. It is not limited to a specific enzyme family or species. Besides a global model, dedicated models are available for human, rat, and dog metabolism; specific prediction of phase I and II metabolism is also supported. FAME is able to identify at least one known SoM among the top-1, top-2, and top-3 highest ranked atom positions in up to 71%, 81%, and 87% of all cases tested, respectively. These prediction rates are comparable to or better than SoM predictors focused on specific enzyme families (such as cytochrome P450s), despite the fact that FAME uses only seven chemical descriptors. FAME covers a very broad chemical space, which together with its inter- and extrapolation power makes it applicable to a wide range of chemicals. Predictions take less than 2.5 s per molecule in batch mode on an Ultrabook. Results are visualized using Jmol, with the most likely SoMs highlighted.
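The top-1, top-2, and top-3 figures quoted in this abstract follow a common evaluation pattern: a molecule counts as a success when at least one known SoM appears among its k highest-ranked atoms. A minimal sketch of that metric is given below; the function name `top_k_success` and the toy rankings are hypothetical and are not taken from FAME.

```python
def top_k_success(ranked_atoms, known_soms, k):
    """Fraction of molecules with at least one known SoM in the top k ranks.

    ranked_atoms: list of per-molecule atom indices, best-ranked first.
    known_soms:   list of per-molecule sets of experimentally known SoM indices.
    """
    hits = sum(
        1
        for ranking, truth in zip(ranked_atoms, known_soms)
        if set(ranking[:k]) & set(truth)
    )
    return hits / len(ranked_atoms)


# Hypothetical predictions for three molecules.
rankings = [[3, 1, 2], [5, 4, 6], [2, 7, 1]]
truths = [{1}, {6}, {9}]
rates = [top_k_success(rankings, truths, k) for k in (1, 2, 3)]
```

Because the top-k sets are nested, the success rate can only stay the same or rise as k grows, which matches the increasing sequence reported in the abstract.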
Affiliation(s)
- Johannes Kirchmair
  - Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW, Cambridge, United Kingdom
11
Tyzack JD, Williamson MJ, Torella R, Glen RC. Prediction of cytochrome P450 xenobiotic metabolism: tethered docking and reactivity derived from ligand molecular orbital analysis. J Chem Inf Model 2013; 53:1294-305. [PMID: 23701380 DOI: 10.1021/ci400058s] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 01/15/2023]
Abstract
Metabolism of xenobiotic and endogenous compounds is frequently complex, not completely elucidated, and therefore often ambiguous. The prediction of sites of metabolism (SoM) can be particularly helpful as a first step toward the identification of metabolites, a process especially relevant to drug discovery. This paper describes a reactivity approach for predicting SoM whereby reactivity is derived directly from the ground state ligand molecular orbital analysis, calculated using Density Functional Theory, using a novel implementation of the average local ionization energy. Thus each potential SoM is sampled in the context of the whole ligand, in contrast to other popular approaches where activation energies are calculated for a predefined database of molecular fragments and assigned to matching moieties in a query ligand. In addition, one of the first descriptions of molecular dynamics of cytochrome P450 (CYP) isoforms 3A4, 2D6, and 2C9 in their Compound I state is reported, and, from the representative protein structures obtained, an analysis and evaluation of various docking approaches using GOLD is performed. In particular, a covalent docking approach is described coupled with the modeling of important electrostatic interactions between CYP and ligand using spherical constraints. Combining the docking and reactivity results, obtained using standard functionality from common docking and quantum chemical applications, enables a SoM to be identified in the top 2 predictions for 75%, 80%, and 78% of the data sets for 3A4, 2D6, and 2C9, respectively, results that are accessible and competitive with other recently published prediction tools.
Affiliation(s)
- Jonathan D Tyzack
  - Unilever Centre for Molecular Science Informatics, Department of Chemistry, Lensfield Road, Cambridge, CB2 1EW, United Kingdom
12
Zaretzki J, Rydberg P, Bergeron C, Bennett KP, Olsen L, Breneman CM. RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. J Chem Inf Model 2012; 52:1637-59. [PMID: 22524152 DOI: 10.1021/ci300009z] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Indexed: 02/06/2023]
Abstract
RS-Predictor is a tool for creating pathway-independent, isozyme-specific, site of metabolism (SOM) prediction models using any set of known cytochrome P450 (CYP) substrates and metabolites. Until now, the RS-Predictor method was only trained and validated on CYP 3A4 data, but in the present study, we report on the versatility of the RS-Predictor modeling paradigm by creating and testing regioselectivity models for substrates of the nine most important CYP isozymes. Through curation of source literature, we have assembled 680 substrates distributed among CYPs 1A2, 2A6, 2B6, 2C19, 2C8, 2C9, 2D6, 2E1, and 3A4, the largest publicly accessible collection of P450 ligands and metabolites released to date. A comprehensive investigation into the importance of different descriptor classes for identifying the regioselectivity mediated by each isozyme is made through the generation of multiple independent RS-Predictor models for each set of isozyme substrates. Two of these models include a density functional theory (DFT) reactivity descriptor derived from SMARTCyp. Optimal combinations of RS-Predictor and SMARTCyp are shown to have stronger performance than either method alone, while also exceeding the accuracy of the commercial regioselectivity prediction methods distributed by Optibrium and Schrödinger, correctly identifying a large proportion of the metabolites in each substrate set within the top two rank-positions: 1A2 (83.0%), 2A6 (85.7%), 2B6 (82.1%), 2C19 (86.2%), 2C8 (83.8%), 2C9 (84.5%), 2D6 (85.9%), 2E1 (82.8%), 3A4 (82.3%), and merged (86.0%). Comprehensive datamining of each substrate set and careful statistical analyses of the predictions made by the different models revealed new insights into molecular features that control metabolic regioselectivity and enable accurate prospective prediction of likely SOMs.
Affiliation(s)
- Jed Zaretzki
  - Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, New York 12180, USA
13
Quantitative Property-Property Relationship for Screening-Level Prediction of Intrinsic Clearance of Volatile Organic Chemicals in Rats and Its Integration within PBPK Models to Predict Inhalation Pharmacokinetics in Humans. J Toxicol 2012; 2012:286079. [PMID: 22685458 PMCID: PMC3364689 DOI: 10.1155/2012/286079] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/31/2011] [Revised: 01/13/2012] [Accepted: 01/13/2012] [Indexed: 01/28/2023]
Abstract
The objectives of this study were (i) to develop a screening-level quantitative property-property relationship (QPPR) for intrinsic clearance (CLint) obtained from in vivo animal studies and (ii) to incorporate it with human physiology in a PBPK model for predicting the inhalation pharmacokinetics of VOCs. CLint, calculated as the ratio of the in vivo Vmax (μmol/h/kg bw rat) to the Km (μM), was obtained for 26 VOCs from the literature. The QPPR model resulting from stepwise linear regression analysis passed the validation step (R² = 0.8; leave-one-out cross-validation Q² = 0.75) for CLint normalized to the phospholipid (PL) affinity of the VOCs. The QPPR facilitated the calculation of CLint (L PL/h/kg bw rat) from the input data on log Pow, log blood:water PC, and ionization potential. The predictions of the QPPR as lower and upper bounds of the 95% mean confidence intervals (LMCI and UMCI, respectively) were then integrated within a human PBPK model. The ratio of the maximum (using LMCI for CLint) to minimum (using UMCI for CLint) AUC predicted by the QPPR-PBPK model was 1.36 ± 0.4 and ranged from 1.06 (1,1-dichloroethylene) to 2.8 (isoprene). Overall, the integrated QPPR-PBPK modeling method developed in this study is a pragmatic way of characterizing the impact of the lack of knowledge of CLint in predicting human pharmacokinetics of VOCs, as well as the impact of prediction uncertainty of CLint on human pharmacokinetics of VOCs.
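The two quantities named in this abstract, CLint as the ratio Vmax/Km and the leave-one-out Q², can be illustrated with a minimal Python sketch. The function names are hypothetical, the regression is plain ordinary least squares rather than the stepwise procedure used in the study, and Q² is taken here as 1 − PRESS/SS_tot, which is one common definition and an assumption about what the authors computed.

```python
import numpy as np

def intrinsic_clearance(vmax, km):
    """CLint as the ratio Vmax / Km (units follow the inputs)."""
    return vmax / km

def loo_q2(X, y):
    """Leave-one-out cross-validated Q2 for an ordinary least-squares fit.

    Assumed definition: Q2 = 1 - PRESS / SS_tot.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # hold out observation i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2     # squared held-out residual
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot
```

Because each held-out point is predicted by a model that never saw it, Q² is typically lower than the fitted R², which matches the pattern reported above (0.75 versus 0.8).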
|
14
|
Tie Y, McPhail B, Hong H, Pearce BA, Schnackenberg LK, Ge W, Buzatu DA, Wilkes JG, Fuscoe JC, Tong W, Fowler BA, Beger RD, Demchuk E. Modeling chemical interaction profiles: II. Molecular docking, spectral data-activity relationship, and structure-activity relationship models for potent and weak inhibitors of cytochrome P450 CYP3A4 isozyme. Molecules 2012; 17:3407-60. [PMID: 22421793 PMCID: PMC6268819 DOI: 10.3390/molecules17033407] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 01/20/2012] [Revised: 02/27/2012] [Accepted: 02/28/2012] [Indexed: 01/15/2023]
Abstract
Polypharmacy has increasingly become a topic of public health concern, particularly as the U.S. population ages. Drug labels often contain insufficient information to enable the clinician to safely use multiple drugs. Because many drugs are biotransformed by cytochrome P450 (CYP) enzymes, inhibition of CYP activity has long been associated with potentially adverse health effects. In an attempt to reduce the uncertainty pertaining to CYP-mediated drug-drug/chemical interactions, an interagency collaborative group developed a consensus approach to prioritizing information concerning CYP inhibition. The consensus involved computational molecular docking, spectral data-activity relationship (SDAR), and structure-activity relationship (SAR) models that addressed the clinical potency of CYP inhibition. The models were built upon chemicals that were categorized as either potent or weak inhibitors of the CYP3A4 isozyme. The categorization was carried out using information from clinical trials because currently available in vitro high-throughput screening data were not fully representative of the in vivo potency of inhibition. During categorization it was found that compounds that break the Lipinski rule of five by molecular weight were about twice as likely to be inhibitors of CYP3A4 as those that obey the rule. Similarly, among inhibitors that break the rule, potent inhibitors were 2–3 times more frequent. The molecular docking classification relied on logistic regression, by which the docking scores from different docking algorithms, CYP3A4 three-dimensional structures, and binding sites on them were combined in a unified probabilistic model. The SDAR models employed a multiple linear regression approach applied to binned 1D 13C-NMR and 1D 15N-NMR spectral descriptors. Structure-based and physical-chemical descriptors were used as the basis for developing SAR models by the decision forest method.
Thirty-three potent inhibitors and 88 weak inhibitors of CYP3A4 were used to train the models. Using these models, a synthetic majority rules consensus classifier was implemented, while the confidence of estimation was assigned following the percent agreement strategy. The classifier was applied to a testing set of 120 inhibitors not included in the development of the models. Five compounds of the test set, including known strong inhibitors dalfopristin and tioconazole, were classified as probable potent inhibitors of CYP3A4. Other known strong inhibitors, such as lopinavir, oltipraz, quercetin, raloxifene, and troglitazone, were among 18 compounds classified as plausible potent inhibitors of CYP3A4. The consensus estimation of inhibition potency is expected to aid in the nomination of pharmaceuticals, dietary supplements, environmental pollutants, and occupational and other chemicals for in-depth evaluation of the CYP3A4 inhibitory activity. It may serve also as an estimate of chemical interactions via CYP3A4 metabolic pharmacokinetic pathways occurring through polypharmacy and nutritional and environmental exposures to chemical mixtures.
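A majority-rules consensus with a percent-agreement confidence, as described in this abstract, can be sketched in a few lines of Python. This is an illustration only: the function name and the label strings are hypothetical, and the sketch does not reproduce the thresholds the authors used to separate "probable" from "plausible" potent inhibitors.

```python
from collections import Counter

def consensus_vote(labels):
    """Return (majority label, fraction of models that agree with it).

    `labels` holds one class label per model, e.g. one each from the
    docking, SDAR, and SAR classifiers (hypothetical label strings).
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

# Two of three models call the compound potent.
label, agreement = consensus_vote(["potent", "weak", "potent"])
```

With an odd number of binary classifiers a tie cannot occur; for an even number, a tie-breaking rule would have to be chosen explicitly.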
Affiliation(s)
- Yunfeng Tie
- Division of Toxicology and Environmental Medicine, Agency for Toxic Substances and Disease Registry, Atlanta, GA 30333, USA; (Y.T.); (B.M.); (B.A.F.)
- Brooks McPhail
- Division of Toxicology and Environmental Medicine, Agency for Toxic Substances and Disease Registry, Atlanta, GA 30333, USA; (Y.T.); (B.M.); (B.A.F.)
- Huixiao Hong
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (H.H.); (B.A.P.); (L.K.S.); (W.G.); (D.A.B.); (J.G.W.); (J.C.F.); (W.T.); (R.D.B.)
- Bruce A. Pearce
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (H.H.); (B.A.P.); (L.K.S.); (W.G.); (D.A.B.); (J.G.W.); (J.C.F.); (W.T.); (R.D.B.)
- Laura K. Schnackenberg
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (H.H.); (B.A.P.); (L.K.S.); (W.G.); (D.A.B.); (J.G.W.); (J.C.F.); (W.T.); (R.D.B.)
- Weigong Ge
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (H.H.); (B.A.P.); (L.K.S.); (W.G.); (D.A.B.); (J.G.W.); (J.C.F.); (W.T.); (R.D.B.)
- Dan A. Buzatu
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (H.H.); (B.A.P.); (L.K.S.); (W.G.); (D.A.B.); (J.G.W.); (J.C.F.); (W.T.); (R.D.B.)
- Jon G. Wilkes
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (H.H.); (B.A.P.); (L.K.S.); (W.G.); (D.A.B.); (J.G.W.); (J.C.F.); (W.T.); (R.D.B.)
- James C. Fuscoe
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (H.H.); (B.A.P.); (L.K.S.); (W.G.); (D.A.B.); (J.G.W.); (J.C.F.); (W.T.); (R.D.B.)
- Weida Tong
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (H.H.); (B.A.P.); (L.K.S.); (W.G.); (D.A.B.); (J.G.W.); (J.C.F.); (W.T.); (R.D.B.)
- Bruce A. Fowler
- Division of Toxicology and Environmental Medicine, Agency for Toxic Substances and Disease Registry, Atlanta, GA 30333, USA; (Y.T.); (B.M.); (B.A.F.)
- Richard D. Beger
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (H.H.); (B.A.P.); (L.K.S.); (W.G.); (D.A.B.); (J.G.W.); (J.C.F.); (W.T.); (R.D.B.)
- Eugene Demchuk
- Division of Toxicology and Environmental Medicine, Agency for Toxic Substances and Disease Registry, Atlanta, GA 30333, USA; (Y.T.); (B.M.); (B.A.F.)
- Department of Basic Pharmaceutical Sciences, West Virginia University, Morgantown, WV 26506-9530, USA
- Author to whom correspondence should be addressed; Tel.: +1-770-488-3327; Fax: +1-404-248-4142
|
15
|
Danielson ML, Desai PV, Mohutsky MA, Wrighton SA, Lill MA. Potentially increasing the metabolic stability of drug candidates via computational site of metabolism prediction by CYP2C9: The utility of incorporating protein flexibility via an ensemble of structures. Eur J Med Chem 2011; 46:3953-63. [PMID: 21703735 DOI: 10.1016/j.ejmech.2011.05.067] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 03/22/2011] [Revised: 05/24/2011] [Accepted: 05/26/2011] [Indexed: 10/18/2022]
Abstract
Cytochrome P450 enzymes are responsible for metabolizing many endogenous and xenobiotic molecules encountered by the human body. It has been estimated that 75% of all drugs are metabolized by cytochrome P450 enzymes. Thus, predicting a compound's potential sites of metabolism (SOM) is highly advantageous early in the drug development process. We have combined molecular dynamics, AutoDock Vina docking, the neighboring atom type (NAT) reactivity model, and a solvent-accessible surface-area term to form a reactivity-accessibility model capable of predicting SOM for cytochrome P450 2C9 substrates. To investigate the importance of protein flexibility during the ligand-binding process, the results of SOM prediction using a static protein structure for docking were compared to SOM prediction using multiple protein structures in ensemble docking. The results reported here indicate that ensemble docking increases the number of ligands that can be docked in a bioactive conformation (ensemble: 96%, static: 85%) but only leads to a slight improvement (49% vs. 44%) in predicting an experimentally known SOM in the top-1 position for a ligand library of 75 CYP2C9 substrates. Using ensemble docking, the reactivity-accessibility model accurately predicts SOM in the top-1 ranked position for 49% of the ligand library, and considering the top-3 predicted sites increases the prediction success rate to approximately 70% of the ligand library. Further classifying the substrate library according to Km values leads to an improvement in SOM prediction for substrates with low Km values (57% at top-1). While the current predictive power of the reactivity-accessibility model still leaves significant room for improvement, the results illustrate the usefulness of this method to identify key protein-ligand interactions and guide structural modifications of the ligand to increase its metabolic stability.
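The top-1 and top-3 figures quoted in this abstract are top-k success rates: the fraction of ligands for which at least one experimentally known SOM appears among the k highest-ranked predicted sites. A minimal Python sketch of that metric follows; the function name, the ligand identifiers, and the atom labels are all hypothetical and do not come from the study's data.

```python
def top_k_success(ranked_sites, known_soms, k):
    """Fraction of ligands with at least one known SOM in the top-k ranked sites.

    ranked_sites: dict mapping ligand id -> list of site labels, best first
    known_soms:   dict mapping ligand id -> set of experimentally known sites
    """
    hits = sum(
        1
        for ligand, ranking in ranked_sites.items()
        if set(ranking[:k]) & known_soms[ligand]
    )
    return hits / len(ranked_sites)

# Hypothetical two-ligand example.
ranked = {"lig1": ["C7", "C3", "N1"], "lig2": ["C2", "C5", "C9"]}
known = {"lig1": {"C7"}, "lig2": {"C9"}}
top1 = top_k_success(ranked, known, k=1)
top3 = top_k_success(ranked, known, k=3)
```

The metric can only stay the same or rise as k grows, which is why the top-3 rate is reported alongside the stricter top-1 rate.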
Affiliation(s)
- Matthew L Danielson
- Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, 575 Stadium Mall Drive, West Lafayette, IN 47907, USA
|
16
|
Zaretzki J, Bergeron C, Rydberg P, Huang TW, Bennett KP, Breneman CM. RS-predictor: a new tool for predicting sites of cytochrome P450-mediated metabolism applied to CYP 3A4. J Chem Inf Model 2011; 51:1667-89. [PMID: 21528931 DOI: 10.1021/ci2000488] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Indexed: 01/08/2023]
Abstract
This article describes RegioSelectivity-Predictor (RS-Predictor), a new in silico method for generating predictive models of P450-mediated metabolism for drug-like compounds. Within this method, potential sites of metabolism (SOMs) are represented as "metabolophores", a concept that describes the hierarchical combination of topological and quantum chemical descriptors needed to represent the reactivity of potential metabolic reaction sites. RS-Predictor modeling involves the use of metabolophore descriptors together with multiple-instance ranking (MIRank) to generate an optimized descriptor weight vector that encodes regioselectivity trends across all cases in a training set. The resulting pathway-independent (O-dealkylation vs N-oxidation vs Csp3 hydroxylation, etc.), isozyme-specific regioselectivity model may be used to predict potential metabolic liabilities. In the present work, cross-validated RS-Predictor models were generated for a set of 394 substrates of CYP 3A4 as a proof-of-principle for the method. Rank aggregation was then employed to merge independently generated predictions for each substrate into a single consensus prediction. The resulting consensus RS-Predictor models were shown to reliably identify at least one observed site of metabolism in the top two rank-positions on 78% of the substrates. Comparisons between RS-Predictor and previously described regioselectivity prediction methods reveal new insights into how in silico metabolite prediction methods should be compared.
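The abstract does not say which rank aggregation scheme was used to merge the independent predictions. As an illustration only, the Python sketch below uses a Borda count, a standard aggregation method chosen here as an assumption; the function name and atom labels are hypothetical.

```python
from collections import defaultdict

def borda_consensus(rankings):
    """Merge several rankings of the same sites into one consensus ranking.

    Each ranking lists site labels best first. A site in position p of a
    ranking of length n earns n - p points (Borda count, assumed scheme).
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, site in enumerate(ranking):
            scores[site] += n - position  # best rank earns the most points
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))

# Three hypothetical model rankings for one substrate.
consensus = borda_consensus([
    ["C7", "C3", "N1"],
    ["C3", "C7", "N1"],
    ["C7", "N1", "C3"],
])
```

Here C7 scores 8, C3 scores 6, and N1 scores 4, so the consensus order is C7, C3, N1, and the top two positions of that list are what a top-two evaluation would check against the observed sites.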
Affiliation(s)
- Jed Zaretzki
- Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, New York 12180, USA
|
17
|
Vaz RJ, Zamora I, Li Y, Reiling S, Shen J, Cruciani G. The challenges of in silico contributions to drug metabolism in lead optimization. Expert Opin Drug Metab Toxicol 2010; 6:851-61. [PMID: 20565339 DOI: 10.1517/17425255.2010.499123] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/05/2022]
Abstract
IMPORTANCE OF THE FIELD: Site of metabolism (SOM) predictions for CYP 3A4 are extremely important during the drug discovery process, especially during the lead discovery or library design phases. With the ability to rapidly characterize metabolites from these enzymes, the challenges facing in silico contributions change during the drug optimization phase. Some of these challenges are addressed in this article. Some aspects of the SOM prediction software and methodology are discussed in this opinion article, and examples of software utility in overcoming metabolic instability in drug optimization are shown. AREAS COVERED IN THIS REVIEW: SOM prediction by various approaches is discussed. Two ways of overcoming metabolic instability, blocking the metabolic soft spots and rational modification of the unstable molecule to avoid interaction with the CYP pocket, are discussed. The contribution plot in MetaSite and its use are discussed. WHAT THE READER WILL GAIN: The reader will gain an understanding of possible approaches to either blocking the metabolic soft spot or rationally modifying the molecule using MetaSite software or docking approaches. Blocking metabolism using fluorination has risks, especially when introducing multifluorinated benzene rings into the molecule. TAKE HOME MESSAGE: During the lead optimization phase of drug discovery, when metabolic instability is an issue in a series, in silico approaches can be used to modify the molecule in order to decrease clearance due to metabolism, even that due to CYP3A4.
Affiliation(s)
- Roy J Vaz
- Structure, Design, Informatics, Sanofi-Aventis US, 1041 Rt 202/206N, Bridgewater, NJ 08807, USA.
|
18
|
Tarcsay Á, Keserű GM. In silico site of metabolism prediction of cytochrome P450-mediated biotransformations. Expert Opin Drug Metab Toxicol 2011; 7:299-312. [DOI: 10.1517/17425255.2011.553599] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 11/05/2022]
|
19
|
Mayeno AN, Robinson JL, Reisfeld B. Rapid estimation of activation enthalpies for cytochrome-P450-mediated hydroxylations. J Comput Chem 2010; 32:639-57. [DOI: 10.1002/jcc.21649] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/07/2010] [Revised: 06/25/2010] [Accepted: 07/11/2010] [Indexed: 11/08/2022]
|
20
|
Hasegawa K, Koyama M, Funatsu K. Quantitative Prediction of Regioselectivity Toward Cytochrome P450/3A4 Using Machine Learning Approaches. Mol Inform 2010; 29:243-9. [PMID: 27462767 DOI: 10.1002/minf.200900086] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 12/23/2009] [Accepted: 02/11/2010] [Indexed: 11/11/2022]
Abstract
In the drug discovery process, it is important to know the properties of both drug candidates and their metabolites. Fast and precise prediction of metabolites is essential. However, it has been difficult to predict metabolites because of the complexity of the mechanism of cytochrome P450/3A4 (CYP 3A4), which is the main drug-metabolizing enzyme. In this study, we focus on the regioselectivity of CYP 3A4, i.e., the selectivity of metabolic sites. We have developed a model to predict the regioselectivity of drug candidates by using machine learning (ML) approaches.
Affiliation(s)
- Kiyoshi Hasegawa
- Chugai Pharmaceutical Company, Kamakura Research Laboratories, Kajiwara 200, Kamakura, Kanagawa 247-8530, Japan
- Michio Koyama
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan; phone: (+81) 03-5841-7751; fax: (+81) 03-5841-7771
- Kimito Funatsu
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan; phone: (+81) 03-5841-7751; fax: (+81) 03-5841-7771
|