1
Zheng S, Xue C, Li S, Zao X, Li X, Liu Q, Cao X, Wang W, Qi W, Zhang P, Ye Y. Chinese medicine in the treatment of non-alcoholic fatty liver disease based on network pharmacology: a review. Front Pharmacol 2024; 15:1381712. [PMID: 38694920 PMCID: PMC11061375 DOI: 10.3389/fphar.2024.1381712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 03/29/2024] [Indexed: 05/04/2024]
Abstract
Non-alcoholic fatty liver disease (NAFLD) is a clinicopathological syndrome characterized by abnormal hepatic fat deposition, and its incidence has been rising steadily in recent years. It has become the most common chronic liver disease worldwide and an important cause of cirrhosis and even primary liver cancer. The pathogenesis of NAFLD has not yet been fully clarified. Modern medicine lacks targeted clinical treatment protocols for NAFLD, and most drugs have limited efficacy and substantial side effects. In contrast, Traditional Chinese Medicine (TCM) has significant advantages in the treatment and prevention of NAFLD, which have been widely recognized by scholars around the world. In recent years, by establishing a "medicine-disease-target-pathway" network, network pharmacology has made it possible to explore the molecular basis of the role of medicines in disease prevention and treatment from various perspectives and to predict the pharmacological mechanisms of the corresponding medicines. This approach is compatible with the holistic view of TCM and its treatment based on pattern differentiation, and it has been widely used in TCM research. In this paper, by searching relevant databases such as PubMed, Web of Science, and Embase, we reviewed and analyzed the relevant signaling pathways and specific mechanisms of action of single Chinese medicines, Chinese medicine combinations, and Chinese patent medicines for the treatment of NAFLD in recent years. These studies demonstrated the multi-component, multi-target, and multi-pathway therapeutic characteristics of TCM, which provides strong support for the efficacy of TCM observed in the clinic. In conclusion, we believe that network pharmacology is well aligned with the TCM approach to treating diseases, although it has some limitations. In the future, we should eliminate the potential risks of false positives and false negatives, clarify the interconnectivity between components, targets, and diseases, and conduct deeper clinical or experimental studies.
Affiliation(s)
- Shihao Zheng
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
  - Beijing University of Chinese Medicine, Beijing, China
- Chengyuan Xue
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
  - Beijing University of Chinese Medicine, Beijing, China
- Size Li
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
  - Beijing University of Chinese Medicine, Beijing, China
- Xiaobin Zao
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
  - Key Laboratory of Chinese Internal Medicine of Ministry of Education and Beijing, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Xiaoke Li
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
  - Liver Diseases Academy of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Qiyao Liu
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
  - Beijing University of Chinese Medicine, Beijing, China
- Xu Cao
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
  - Liver Diseases Academy of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Wei Wang
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
  - Beijing University of Chinese Medicine, Beijing, China
- Wenying Qi
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
  - Beijing University of Chinese Medicine, Beijing, China
- Peng Zhang
  - Dongfang Hospital, Beijing University of Chinese Medicine, Beijing, China
- Yongan Ye
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
  - Liver Diseases Academy of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
2
Chang LY, Lee MZ, Wu Y, Lee WK, Ma CL, Chang JM, Chen CW, Huang TC, Lee CH, Lee JC, Tseng YY, Lin CY. Gene set correlation enrichment analysis for interpreting and annotating gene expression profiles. Nucleic Acids Res 2024; 52:e17. [PMID: 38096046 PMCID: PMC10853793 DOI: 10.1093/nar/gkad1187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 11/17/2023] [Accepted: 11/29/2023] [Indexed: 02/10/2024]
Abstract
Pathway analysis, including nontopology-based (non-TB) and topology-based (TB) methods, is widely used to interpret the biological phenomena underlying differences in expression data between two phenotypes. By considering dependencies and interactions between genes, TB methods usually perform better than non-TB methods in identifying pathways that include closely relevant or directly causative genes for a given phenotype. However, most TB methods may be limited by incomplete pathway data used as the reference network or by difficulties in selecting appropriate reference networks for different research topics. Here, we propose a gene set correlation enrichment analysis method, Gscore, based on an expression dataset-derived coexpression network to examine whether a differentially expressed gene (DEG) list (or each of its DEGs) is associated with a known gene set. Gscore is better able to identify target pathways in 89 human disease expression datasets than eight other state-of-the-art methods and offers insight into how disease-wide and pathway-wide associations reflect clinical outcomes. When applied to RNA-seq data from COVID-19-related cells and patient samples, Gscore provided a means for studying how DEGs are implicated in COVID-19-related pathways. In summary, Gscore offers a powerful analytical approach for annotating individual DEGs, DEG lists, and genome-wide expression profiles based on existing biological knowledge.
Affiliation(s)
- Lan-Yun Chang
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Meng-Zhan Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Yujia Wu
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Wen-Kai Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Chia-Liang Ma
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Jun-Mao Chang
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Ciao-Wen Chen
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Tzu-Chun Huang
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Chia-Hwa Lee
  - School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Taipei Medical University, New Taipei City 235, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices (IDSB), National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei 110, Taiwan
  - Ph.D. Program in Medical Biotechnology, College of Medical Science and Technology, Taipei Medical University, New Taipei City 235, Taiwan
- Jih-Chin Lee
  - Department of Otolaryngology-Head and Neck Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 110, Taiwan
- Yu-Yao Tseng
  - Department of Food Science, Nutrition, and Nutraceutical Biotechnology, Shih Chien University, Taipei 104, Taiwan
- Chun-Yu Lin
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices (IDSB), National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - Cancer and Immunology Research Center, National Yang Ming Chiao Tung University, Taipei 112, Taiwan
  - Institute of Data Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - School of Dentistry, Kaohsiung Medical University, Kaohsiung 807, Taiwan
3
Hissong R, Evans KR, Evans CR. Compound Identification Strategies in Mass Spectrometry-Based Metabolomics and Pharmacometabolomics. Handb Exp Pharmacol 2023; 277:43-71. [PMID: 36409330 DOI: 10.1007/164_2022_617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
The metabolome is composed of a vast array of molecules, including endogenous metabolites and lipids, diet- and microbiome-derived substances, pharmaceuticals and supplements, and exposome chemicals. Correct identification of compounds from this diversity of classes is essential to derive biologically relevant insights from metabolomics data. In this chapter, we aim to provide a practical overview of compound identification strategies for mass spectrometry-based metabolomics, with a particular eye toward pharmacologically-relevant studies. First, we describe routine compound identification strategies applicable to targeted metabolomics. Next, we discuss both experimental (data acquisition-focused) and computational (software-focused) strategies used to identify unknown compounds in untargeted metabolomics data. We then discuss the importance of, and methods for, assessing and reporting the level of confidence of compound identifications. Throughout the chapter, we discuss how these steps can be implemented using today's technology, but also highlight research underway to further improve accuracy and certainty of compound identification. For readers interested in interpreting metabolomics data already collected, this chapter will supply important context regarding the origin of the metabolite names assigned to features in the data and help them assess the certainty of the identifications. For those planning new data acquisition, the chapter supplies guidance for designing experiments and selecting analysis methods to enable accurate compound identification, and it will point the reader toward best-practice data analysis and reporting strategies to allow sound biological and pharmacological interpretation.
4
Artificial intelligence and machine-learning approaches in structure and ligand-based discovery of drugs affecting central nervous system. Mol Divers 2022; 27:959-985. [PMID: 35819579 DOI: 10.1007/s11030-022-10489-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/27/2022] [Accepted: 06/21/2022] [Indexed: 12/11/2022]
Abstract
CNS disorders are indications with very high unmet medical need, a relatively small number of available drugs, and subpar satisfaction among patients and caregivers. Discovery of CNS drugs is an extremely expensive affair with its own unique challenges, leading to extremely high attrition rates and low efficiency. With the explosion of data in the information age, there is hardly any aspect of life that has not been touched by data-driven technologies such as artificial intelligence (AI) and machine learning (ML). Drug discovery is no exception: the emergence of big data via genomic, proteomic, biological, and chemical technologies has driven pharmaceutical giants to collaborate with AI-oriented companies to revolutionise drug discovery, with the goal of increasing the efficiency of the process. In recent years, many examples of innovative applications of AI and ML techniques in CNS drug discovery have been reported. Research on therapeutics for diseases such as schizophrenia, Alzheimer's and Parkinsonism has been given new direction and thrust by these developments. AI and ML have been applied to both ligand-based and structure-based drug discovery and design of CNS therapeutics. In this review, we summarise the general aspects of AI and ML from the perspective of drug discovery, followed by comprehensive coverage of recent developments in the applications of AI/ML techniques in CNS drug discovery.
5
Prediction of Anti-Glioblastoma Drug-Decorated Nanoparticle Delivery Systems Using Molecular Descriptors and Machine Learning. Int J Mol Sci 2021; 22:ijms222111519. [PMID: 34768951 PMCID: PMC8584266 DOI: 10.3390/ijms222111519] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/14/2021] [Revised: 10/08/2021] [Accepted: 10/22/2021] [Indexed: 12/22/2022]
Abstract
The theoretical prediction of drug-decorated nanoparticles (DDNPs) has become a very important task in medical applications. For the current paper, Perturbation Theory Machine Learning (PTML) models were built to predict the probability of different pairs of drugs and nanoparticles creating DDNP complexes with anti-glioblastoma activity. PTML models use the perturbations of molecular descriptors of drugs and nanoparticles as inputs in experimental conditions. The raw dataset was obtained by mixing the nanoparticle experimental data with drug assays from the ChEMBL database. Ten types of machine learning methods have been tested. Only 41 features have been selected for 855,129 drug-nanoparticle complexes. The best model was obtained with the Bagging classifier, an ensemble meta-estimator based on 20 decision trees, with an area under the receiver operating characteristic curve (AUROC) of 0.96, and an accuracy of 87% (test subset). This model could be useful for the virtual screening of nanoparticle-drug complexes in glioblastoma. All the calculations can be reproduced with the datasets and python scripts, which are freely available as a GitHub repository from authors.
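The AUROC quoted in the abstract above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition in plain Python. The function name `auroc` and the toy labels and scores are illustrative assumptions; they are not the authors' scripts or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = active complex, 0 = inactive complex.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
```

The quadratic loop keeps the definition visible. For a dataset of the size reported in the abstract, a rank-based computation would likely be more practical.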
6
Urista DV, Carrué DB, Otero I, Arrasate S, Quevedo-Tumailli VF, Gestal M, González-Díaz H, Munteanu CR. Prediction of Antimalarial Drug-Decorated Nanoparticle Delivery Systems with Random Forest Models. Biology 2020; 9:biology9080198. [PMID: 32751710 PMCID: PMC7465777 DOI: 10.3390/biology9080198] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/24/2020] [Revised: 07/22/2020] [Accepted: 07/27/2020] [Indexed: 12/13/2022]
Abstract
Drug-decorated nanoparticles (DDNPs) have important medical applications. The current work combined Perturbation Theory with Machine Learning and Information Fusion (PTMLIF). Thus, PTMLIF models were proposed to predict the probability of nanoparticle–compound/drug complexes having antimalarial activity (against Plasmodium). The aim is to save experimental resources and time by using a virtual screening for DDNPs. The raw data was obtained by the fusion of experimental data for nanoparticles with compound chemical assays from the ChEMBL database. The inputs for the eight Machine Learning classifiers were transformed features of drugs/compounds and nanoparticles as perturbations of molecular descriptors in specific experimental conditions (experiment-centered features). The resulting dataset contains 107 input features and 249,992 examples. The best classification model was provided by Random Forest, with 27 selected features of drugs/compounds and nanoparticles in all experimental conditions considered. The high performance of the model was demonstrated by the mean Area Under the Receiver Operating Characteristics (AUC) in a test subset with a value of 0.9921 ± 0.000244 (10-fold cross-validation). The results demonstrated the power of information fusion of the experimental-centered features of drugs/compounds and nanoparticles for the prediction of nanoparticle–compound antimalarial activity. The scripts and dataset for this project are available in the open GitHub repository.
Affiliation(s)
- Diana V. Urista
  - Department of Organic Chemistry II, University of Basque Country (UPV/EHU), Sarriena w/n, 48940 Leioa, Spain
- Diego B. Carrué
  - RNASA-IMEDIR, Computer Science Faculty, CITIC, University of A Coruna, Campus Elviña s/n, 15071 A Coruña, Spain
- Iago Otero
  - RNASA-IMEDIR, Computer Science Faculty, CITIC, University of A Coruna, Campus Elviña s/n, 15071 A Coruña, Spain
- Sonia Arrasate
  - Department of Organic Chemistry II, University of Basque Country (UPV/EHU), Sarriena w/n, 48940 Leioa, Spain
- Viviana F. Quevedo-Tumailli
  - RNASA-IMEDIR, Computer Science Faculty, CITIC, University of A Coruna, Campus Elviña s/n, 15071 A Coruña, Spain
  - Universidad Estatal Amazónica UEA, Km. 2 1/2 vía Puyo a Tena (paso lateral), Puyo 160150, Pastaza, Ecuador
- Marcos Gestal
  - RNASA-IMEDIR, Computer Science Faculty, CITIC, University of A Coruna, Campus Elviña s/n, 15071 A Coruña, Spain
  - Biomedical Research Institute of A Coruña (INIBIC), Hospital Teresa Herrera, Xubias de Arriba 84, 15006 A Coruña, Spain
- Humbert González-Díaz
  - Department of Organic Chemistry II, University of Basque Country (UPV/EHU), Sarriena w/n, 48940 Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, Alameda Urquijo 36, 48011 Bilbao, Spain
  - Basque Centre for Biophysics CSIC-UPVEHU, University of Basque Country UPV/EHU, Barrio Sarriena, 48940 Leioa, Spain
- Cristian R. Munteanu
  - RNASA-IMEDIR, Computer Science Faculty, CITIC, University of A Coruna, Campus Elviña s/n, 15071 A Coruña, Spain
  - Biomedical Research Institute of A Coruña (INIBIC), Hospital Teresa Herrera, Xubias de Arriba 84, 15006 A Coruña, Spain
7
Gong J, Chen Y, Pu F, Sun P, He F, Zhang L, Li Y, Ma Z, Wang H. Understanding Membrane Protein Drug Targets in Computational Perspective. Curr Drug Targets 2020; 20:551-564. [PMID: 30516106 DOI: 10.2174/1389450120666181204164721] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 07/31/2018] [Revised: 09/03/2018] [Accepted: 09/04/2018] [Indexed: 01/16/2023]
Abstract
Membrane proteins play crucial physiological roles in vivo and are the major category of drug targets for pharmaceuticals. Research on membrane proteins is a significant part of drug discovery. The biological process is a cyclic network, and the membrane protein is a vital hub in that network, since most drugs achieve their therapeutic effect by interacting with membrane proteins. In this review, typical membrane protein targets are described, including GPCRs, transporters and ion channels. We also summarize network servers and databases that provide drug and drug-target information and related data. Furthermore, we introduce the development and practice of modern medicines, with particular attention to a series of state-of-the-art computational models for the prediction of drug-target interactions, including network-based and machine-learning-based approaches, and present current achievements. Finally, we discuss prospective directions for drug repurposing and drug discovery and propose improved frameworks for bioactivity data, new or improved prediction approaches, and alternative ways of understanding drug bioactivity and the underlying biological processes.
Affiliation(s)
- Jianting Gong
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institution of Computational Biology, Northeast Normal University, Changchun, China
- Yongbing Chen
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institution of Computational Biology, Northeast Normal University, Changchun, China
- Feng Pu
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institution of Computational Biology, Northeast Normal University, Changchun, China
- Pingping Sun
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institution of Computational Biology, Northeast Normal University, Changchun, China
- Fei He
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institution of Computational Biology, Northeast Normal University, Changchun, China
- Li Zhang
  - School of Computer Science and Engineering, Changchun University of Technology, Changchun, China
- Yanwen Li
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institution of Computational Biology, Northeast Normal University, Changchun, China
- Zhiqiang Ma
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institution of Computational Biology, Northeast Normal University, Changchun, China
- Han Wang
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institution of Computational Biology, Northeast Normal University, Changchun, China
8
Zhou Z, Chen B, Chen S, Lin M, Chen Y, Jin S, Chen W, Zhang Y. Applications of Network Pharmacology in Traditional Chinese Medicine Research. Evidence-Based Complementary and Alternative Medicine: eCAM 2020; 2020:1646905. [PMID: 32148533 PMCID: PMC7042531 DOI: 10.1155/2020/1646905] [Citation(s) in RCA: 181] [Impact Index Per Article: 45.3] [Received: 11/15/2019] [Revised: 01/08/2020] [Accepted: 01/20/2020] [Indexed: 01/01/2023]
Abstract
Human diseases, especially infectious ones, have been evolving constantly. However, their treatment strategies are not developing quickly. Some diseases are caused by a variety of factors with very complex pathologies, and the use of a single drug cannot solve these problems. Traditional Chinese Medicine (TCM) medication is a unique treatment method in China. TCM formulae contain multiple herbs with multitarget, multichannel, and multilink characteristics. In recent years, with the flourishing development of network pharmacology, a new method for searching therapeutic drugs has emerged. The multitarget action in network pharmacology is consistent with the complex mechanisms of disease and drug action. Using network pharmacology to understand TCM is an emerging trend.
Affiliation(s)
- Zhuchen Zhou
  - School of Life Science, Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310053, China
- Bing Chen
  - School of Life Science, Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310053, China
- Simiao Chen
  - School of Life Science, Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310053, China
- Minqiu Lin
  - School of Life Science, Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310053, China
- Ying Chen
  - School of Life Science, Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310053, China
- Shan Jin
  - School of Life Science, Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310053, China
- Weiyan Chen
  - School of Basic Medicine, Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310053, China
- Yuyan Zhang
  - School of Life Science, Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310053, China
9
Diez-Alarcia R, Yáñez-Pérez V, Muneta-Arrate I, Arrasate S, Lete E, Meana JJ, González-Díaz H. Big Data Challenges Targeting Proteins in GPCR Signaling Pathways; Combining PTML-ChEMBL Models and [35S]GTPγS Binding Assays. ACS Chem Neurosci 2019; 10:4476-4491. [PMID: 31618004 DOI: 10.1021/acschemneuro.9b00302] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/18/2022]
Abstract
G-protein-coupled receptors (GPCRs), also known as 7-transmembrane receptors, are the single largest class of drug targets. Consequently, a large number of preclinical assays having GPCRs as molecular targets have been released to public sources like the Chemical European Molecular Biology Laboratory (ChEMBL) database. These data are also very complex, covering changes in drug chemical structure and assay conditions such as c0 = activity parameter (Ki, IC50, etc.), c1 = target protein, c2 = cell line, c3 = assay organism, etc. This complexity makes the analysis of these databases difficult and places them at the border of a Big Data challenge. One of the aims of this work is to develop a computational model able to predict new GPCR-targeting drugs while taking into consideration multiple assay conditions. Another objective is to perform new predictive and experimental studies of selective 5-HT2A receptor agonists, antagonists, or inverse agonists in humans, comparing the results with those from the literature. In this work, we combined Perturbation Theory (PT) and Machine Learning (ML) to seek a general PTML model for this data set. We analyzed 343 738 unique compounds with 812 072 end points (assay outcomes), with 185 different experimental parameters, 592 protein targets, 51 cell lines, and/or 55 organisms (species). The best PTML linear model found has only three input variables and correctly predicted 56 202/58 653 positive outcomes (sensitivity = 95.8%) and 470 230/550 401 control cases (specificity = 85.4%) in the training series. The model also correctly predicted 18 732/19 549 (95.8%) of positive outcomes and 156 739/183 469 (85.4%) of control cases in the external validation series. To illustrate its practical use, we used the model to predict the outcomes of six different 5-HT2A receptor drugs, namely, TCB-2, DOI, DOB, altanserin, pimavanserin, and nelotanserin, in a very large number of different pharmacological assays. 5-HT2A receptors are altered in schizophrenia and represent a drug target for antipsychotic therapeutic activity. The model correctly predicted 93.83% (76 of 81) of the experimental results for these compounds reported in ChEMBL. Moreover, [35S]GTPγS binding assays were performed experimentally with the same six drugs with the aim of determining their potency and efficacy in the modulation of G-proteins in human brain tissue. The antagonist ketanserin was included as an inactive drug with demonstrated affinity for 5-HT2A/C receptors. Our results demonstrate that some of these drugs, previously described as serotonin 5-HT2A receptor agonists, antagonists, or inverse agonists, are not so specific and show different intrinsic activity from that previously reported. Overall, this work opens a new gate for the prediction of GPCR-targeting compounds.
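The sensitivity and specificity figures quoted in the abstract above follow directly from the reported counts: sensitivity is the share of positive outcomes predicted correctly, and specificity is the share of control cases predicted correctly. The short sketch below redoes that arithmetic. The helper names `rate` and `as_percent` are illustrative assumptions, while the counts are copied from the abstract.

```python
def rate(correct, total):
    """Return the fraction of correctly classified cases."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def as_percent(counts):
    """Map each metric name to its percentage, rounded to one decimal."""
    return {name: round(100 * rate(c, t), 1) for name, (c, t) in counts.items()}


# Counts as quoted in the abstract: (correctly predicted, total).
training = {"sensitivity": (56_202, 58_653), "specificity": (470_230, 550_401)}
validation = {"sensitivity": (18_732, 19_549), "specificity": (156_739, 183_469)}

training_pct = as_percent(training)
validation_pct = as_percent(validation)
```

Both series round to the same percentages, which matches the figures stated in the abstract for the training and external validation sets.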
Affiliation(s)
- Rebeca Diez-Alarcia
  - Centro de Investigación Biomédica en Red en Salud Mental, 48940 Leioa, Spain
- J. Javier Meana
  - Centro de Investigación Biomédica en Red en Salud Mental, 48940 Leioa, Spain
- Humbert González-Díaz
  - Biophysics Institute, CSIC-UPV/EHU, University of the Basque Country UPV/EHU, Leioa, 48940, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
10
Öztürk H, Ozkirimli E, Özgür A. A novel methodology on distributed representations of proteins using their interacting ligands. Bioinformatics 2019; 34:i295-i303. [PMID: 29949957 PMCID: PMC6022674 DOI: 10.1093/bioinformatics/bty287] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 11/23/2022]
Abstract
Motivation: The effective representation of proteins is a crucial task that directly affects performance on many bioinformatics problems. Related proteins usually bind to similar ligands. Chemical characteristics of ligands are known to capture the functional and mechanistic properties of proteins, suggesting that a ligand-based approach can be utilized in protein representation. In this study, we propose SMILESVec, a Simplified Molecular Input Line Entry System (SMILES)-based method to represent ligands, and a novel method to compute the similarity of proteins by describing them based on their ligands. The proteins are defined utilizing the word embeddings of the SMILES strings of their ligands. The performance of the proposed protein description method is evaluated in a protein clustering task using the TransClust and MCL algorithms. Two other protein representation methods that utilize protein sequence, Basic Local Alignment Search Tool (BLAST) and ProtVec, and two compound fingerprint-based protein representation methods are compared.
Results: We showed that ligand-based protein representation, which uses only SMILES strings of the ligands that proteins bind to, performs as well as protein sequence-based representation methods in protein clustering. The results suggest that ligand-based protein description can be an alternative to the traditional sequence or structure-based representation of proteins, and that this novel approach can be applied to different bioinformatics problems such as prediction of new protein–ligand interactions and protein function annotation.
Availability and implementation: https://github.com/hkmztrk/SMILESVecProteinRepresentation
Supplementary information: Supplementary data are available at Bioinformatics online.
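To make the idea in the abstract above concrete, the following minimal Python sketch describes each protein by the SMILES strings of its ligands and compares proteins by cosine similarity. It is only an illustration under stated assumptions: SMILES strings are split into overlapping character k-mers, each k-mer gets a deterministic toy vector instead of a trained word embedding, and vectors are combined by simple averaging. The helper names, `K`, `DIM`, and the example ligand lists are hypothetical and are not taken from the SMILESVec repository.

```python
import hashlib
import math

K = 4     # k-mer length (assumption for illustration)
DIM = 16  # toy embedding dimension (assumption; must not exceed 32 here)


def kmers(smiles, k=K):
    """Split a SMILES string into overlapping character k-mers ("words")."""
    if len(smiles) <= k:
        return [smiles]
    return [smiles[i:i + k] for i in range(len(smiles) - k + 1)]


def toy_word_vector(word, dim=DIM):
    """Deterministic stand-in for a trained word embedding (NOT word2vec)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()  # 32 bytes
    return [digest[i] / 255.0 - 0.5 for i in range(dim)]


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def smiles_vector(smiles):
    """Represent one ligand as the average of its k-mer vectors (assumption)."""
    return mean_vector([toy_word_vector(w) for w in kmers(smiles)])


def protein_vector(ligand_smiles):
    """Represent a protein as the average of its ligands' vectors (assumption)."""
    return mean_vector([smiles_vector(s) for s in ligand_smiles])


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical ligand sets; the SMILES are aspirin, salicylic acid, nicotine.
protein_a = protein_vector(["CC(=O)Oc1ccccc1C(=O)O", "OC(=O)c1ccccc1O"])
protein_b = protein_vector(["CN1CCC[C@H]1c1cccnc1"])
similarity_ab = cosine(protein_a, protein_b)
```

Because the toy vectors are hash-derived rather than learned, the resulting similarity carries no chemical meaning. Swapping `toy_word_vector` for embeddings trained on a large SMILES corpus is what would make the comparison informative.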
Affiliation(s)
- Hakime Öztürk
  - Department of Computer Engineering, Bogazici University, Istanbul, Turkey
- Elif Ozkirimli
  - Department of Chemical Engineering, Bogazici University, Istanbul, Turkey
- Arzucan Özgür
  - Department of Computer Engineering, Bogazici University, Istanbul, Turkey
11
Dmitriev AV, Lagunin AA, Karasev DA, Rudik AV, Pogodin PV, Filimonov DA, Poroikov VV. Prediction of Drug-Drug Interactions Related to Inhibition or Induction of Drug-Metabolizing Enzymes. Curr Top Med Chem 2019; 19:319-336. [PMID: 30674264 DOI: 10.2174/1568026619666190123160406] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 12/18/2018] [Revised: 01/02/2019] [Accepted: 01/07/2019] [Indexed: 02/07/2023]
Abstract
Drug-drug interaction (DDI) is the phenomenon of alteration of the pharmacological activity of a drug(s) when another drug(s) is co-administered in cases of so-called polypharmacy. There are three types of DDIs: pharmacokinetic (PK), pharmacodynamic, and pharmaceutical. PK is the most frequent type of DDI, which often appears as a result of the inhibition or induction of drug-metabolising enzymes (DME). In this review, we summarise in silico methods that may be applied for the prediction of the inhibition or induction of DMEs and describe appropriate computational methods for DDI prediction, showing the current situation and perspectives of these approaches in medicinal and pharmaceutical chemistry. We review sources of information on DDI, which can be used in pharmaceutical investigations and medicinal practice and/or for the creation of computational models. The problem of the inaccuracy and redundancy of these data are discussed. We provide information on the state-of-the-art physiologically- based pharmacokinetic modelling (PBPK) approaches and DME-based in silico methods. In the section on ligand-based methods, we describe pharmacophore models, molecular field analysis, quantitative structure-activity relationships (QSAR), and similarity analysis applied to the prediction of DDI related to the inhibition or induction of DME. In conclusion, we discuss the problems of DDI severity assessment, mention factors that influence severity, and highlight the issues, perspectives and practical using of in silico methods.
Affiliation(s)
- Alexey A Lagunin
- Institute of Biomedical Chemistry, Moscow, Russian Federation; Pirogov Russian National Research Medical University, Moscow, Russian Federation
- Pavel V Pogodin
- Institute of Biomedical Chemistry, Moscow, Russian Federation
|
12
|
Jacob TF, Singh V, Dixit M, Ginsburg-Shmuel T, Fonseca B, Pintor J, Youdim MBH, Major DT, Weinreb O, Fischer B. A promising drug candidate for the treatment of glaucoma based on a P2Y6-receptor agonist. Purinergic Signal 2018; 14:271-284. [PMID: 30019187 DOI: 10.1007/s11302-018-9614-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/26/2018] [Accepted: 05/31/2018] [Indexed: 11/28/2022] Open
Abstract
Extracellular nucleotides can regulate the production/drainage of the aqueous humor via activation of P2 receptors, thus affecting the intraocular pressure (IOP). We evaluated 5-OMe-UDP(α-B), 1A, a potent P2Y6-receptor agonist, for reducing IOP and treating glaucoma. Cell viability in the presence of 1A was measured using [3-(4, 5-dimethyl-thiazol-2-yl) 2, 5-diphenyl-tetrazolium bromide] (MTT) assay in rabbit NPE ciliary non-pigmented and corneal epithelial cells, human retinoblastoma, and liver Huh7 cells. The effect of 1A on IOP was determined in acute glaucomatous rabbit hyaluronate model and phenol-induced chronic glaucomatous rabbit model. The origin of activity of 1A was investigated by generation of a homology model of hP2Y6-R and docking studies. 1A did not exert cytotoxic effects up to 100 mM vs. trusopt and timolol in MTT assay in ocular and liver cells. In normotensive rabbits, 100 μM 1A vs. xalatan, trusopt, and pilocarpine reduced IOP by 45 vs. 20-30%, respectively. In the phenol animal model, 1A (100 μM) showed reduction of IOP by 40 and 20%, following early and late administration, respectively. Docking results suggest that the high activity and selectivity of 1A is due to intramolecular interaction between Pα-BH3 and C5-OMe which positions 1A in a most favorable site inside the receptor. P2Y6-receptor agonist 1A effectively and safely reduces IOP in normotense, acute, and chronic glaucomatous rabbits, and hence may be suggested as a novel approach for the treatment of glaucoma.
Affiliation(s)
- Tali Fishman Jacob
- GlaucoPharm Ltd, P.O.Box 620, New Industrial Park, 20692, Yokneam, Israel
- Vijay Singh
- Department of Chemistry, Gonda-Goldschmied Medical Research Center, Bar-Ilan University, 52900, Ramat Gan, Israel
- Mudit Dixit
- Department of Chemistry, Gonda-Goldschmied Medical Research Center, Bar-Ilan University, 52900, Ramat Gan, Israel
- Tamar Ginsburg-Shmuel
- Department of Chemistry, Gonda-Goldschmied Medical Research Center, Bar-Ilan University, 52900, Ramat Gan, Israel
- Begoña Fonseca
- Escuela Universitaria De Optica, Universidad Complutense De Madrid, C/Arcos De Jalon 118, 28037, Madrid, Spain
- Jesus Pintor
- Escuela Universitaria De Optica, Universidad Complutense De Madrid, C/Arcos De Jalon 118, 28037, Madrid, Spain
- Moussa B H Youdim
- GlaucoPharm Ltd, P.O.Box 620, New Industrial Park, 20692, Yokneam, Israel
- Dan T Major
- Department of Chemistry, Gonda-Goldschmied Medical Research Center, Bar-Ilan University, 52900, Ramat Gan, Israel
- Orly Weinreb
- GlaucoPharm Ltd, P.O.Box 620, New Industrial Park, 20692, Yokneam, Israel
- Bilha Fischer
- Department of Chemistry, Gonda-Goldschmied Medical Research Center, Bar-Ilan University, 52900, Ramat Gan, Israel
|
13
|
Heikamp K, Zuccotto F, Kiczun M, Ray P, Gilbert IH. Exhaustive sampling of the fragment space associated to a molecule leading to the generation of conserved fragments. Chem Biol Drug Des 2018; 91:655-667. [PMID: 29063731 PMCID: PMC5836963 DOI: 10.1111/cbdd.13129] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/03/2017] [Revised: 10/09/2017] [Accepted: 10/14/2017] [Indexed: 11/28/2022]
Abstract
The first step in hit optimization is the identification of the pharmacophore, which is normally achieved by deconstruction of the hit molecule to generate "deletion analogues." In silico fragmentation approaches often focus on the generation of small fragments that do not properly describe the fragment space associated with the deletion analogues. We present significant modifications to the molecular fragmentation programme molBLOCKS, which allow the exhaustive sampling of the fragment space associated with a molecule to generate all possible molecular fragments. This generates larger fragments by combining the smallest fragments. Additionally, it has been modified to deal with the problem of changing pharmacophoric properties through fragmentation, by highlighting bond cuts. The modified molBLOCKS programme was used on a set of drug compounds, where it generated more unique fragments than standard fragmentation approaches by increasing the number of fragments derived per compound. This fragment set was found to be more diverse than those generated by standard fragmentation programmes and was relevant to drug discovery as it contains the key fragments representing the pharmacophoric elements associated with ligand recognition. The use of dummy atoms to highlight bond cuts further increases the information content of fragments by visualizing their previous bonding pattern.
Affiliation(s)
- Kathrin Heikamp
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee, Scotland, UK
- Fabio Zuccotto
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee, Scotland, UK
- Michael Kiczun
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee, Scotland, UK
- Peter Ray
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee, Scotland, UK
- Ian H. Gilbert
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee, Scotland, UK
|
14
|
Ekins S, Clark AM, Dole K, Gregory K, Mcnutt AM, Spektor AC, Weatherall C, Litterman NK, Bunin BA. Data Mining and Computational Modeling of High-Throughput Screening Datasets. Methods Mol Biol 2018; 1755:197-221. [PMID: 29671272 DOI: 10.1007/978-1-4939-7724-6_14] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022]
Abstract
We are now seeing the benefit of investments made over the last decade in high-throughput screening (HTS), which are resulting in large structure-activity datasets entering public and open databases such as ChEMBL and PubChem. The growth of academic HTS screening centers and the increasing move to academia for early stage drug discovery suggest a great need for informatics tools and methods to mine such data and learn from it. Collaborative Drug Discovery, Inc. (CDD) has developed a number of tools for storing, mining, securely and selectively sharing, as well as learning from such HTS data. We present a new web-based data mining and visualization module directly within the CDD Vault platform for high-throughput drug discovery data that makes use of a novel technology stack following modern reactive design principles. We also describe CDD Models within the CDD Vault platform, which enables researchers to share models, share predictions from models, and create models from distributed, heterogeneous data. Our system is built on top of the Collaborative Drug Discovery Vault Activity and Registration data repository ecosystem, which allows users to manipulate and visualize thousands of molecules in real time. This can be performed in any browser on any platform. In this chapter we present examples of its use with public datasets in CDD Vault. Such approaches can complement other cheminformatics tools, whether open source or commercial, in providing approaches for data mining and modeling of HTS data.
Affiliation(s)
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Alex M Clark
- Collaborative Drug Discovery, Inc., Burlingame, CA, USA
- Molecular Materials Informatics, Inc., Montreal, QC, Canada
- Krishna Dole
- Collaborative Drug Discovery, Inc., Burlingame, CA, USA
- Barry A Bunin
- Collaborative Drug Discovery, Inc., Burlingame, CA, USA
|
15
|
Yan B, Wang P, Wang J, Boheler KR. Discovery of Surface Target Proteins Linking Drugs, Molecular Markers, Gene Regulation, Protein Networks, and Disease by Using a Web-Based Platform Targets-search. Methods Mol Biol 2018; 1722:331-344. [PMID: 29264813 DOI: 10.1007/978-1-4939-7553-2_19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
Abstract
Integration and analysis of high-content omics data have been critical to the investigation of molecular interactions (e.g., DNA-protein, protein-protein, chemical-protein) in biological systems. Human proteomic strategies that provide enriched information on cell surface proteins can be utilized for repurposing of drug targets and discovery of disease biomarkers. Although several published resources have proved useful for the analysis of these interactions, our newly developed web-based platform Targets-search has the capability of integrating multiple types of omics data to unravel their association with diverse molecular interactions and disease. Here, we describe how to use Targets-search for the integrated and systemic exploitation of surface proteins to identify potential drug targets, which can further be used to analyze gene regulation, protein networks, and possible biomarkers for diseases and cancers. To illustrate this process, we have taken data from Ewing's sarcoma to identify surface proteins differentially expressed in Ewing's sarcoma cells. These surface proteins were then analyzed to determine which ones were known drug targets. The information suggested putative targets for drug repurposing, and subsequent analyses illustrated their regulation by the transcription factor EWSR1.
Affiliation(s)
- Bin Yan
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, SAR, China; Centre of Genomics Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Panwen Wang
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, USA; Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, USA
- Junwen Wang
- Centre of Genomics Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China; Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, USA; Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, USA
- Kenneth R Boheler
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, SAR, China; Stem Cell & Regenerative Medicine Consortium and School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, SAR, China
|
16
|
González-Medina M, Méndez-Lucio O, Medina-Franco JL. Activity Landscape Plotter: A Web-Based Application for the Analysis of Structure-Activity Relationships. J Chem Inf Model 2017; 57:397-402. [PMID: 28234475 DOI: 10.1021/acs.jcim.6b00776] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 11/29/2022]
Abstract
Activity landscape modeling is a powerful method for the quantitative analysis of structure-activity relationships. This cheminformatics area is in continuous growth, and several quantitative and visual approaches are constantly being developed. However, these approaches often fall into disuse due to their limited access. Herein, we present Activity Landscape Plotter as the first freely available web-based tool to automatically analyze structure-activity relationships of compound data sets. Based on the concept of activity landscape modeling, the online service performs pairwise structure and activity relationships from an input data set supplied by the user. For visual analysis, Activity Landscape Plotter generates Structure-Activity Similarity and Dual-Activity Difference maps. The user can interactively navigate through the maps and export all the pairwise structure-activity information as comma-delimited files. Activity Landscape Plotter is freely accessible at https://unam-shiny-difacquim.shinyapps.io/ActLSmaps/.
Affiliation(s)
- Mariana González-Medina
- School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- Oscar Méndez-Lucio
- School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- José L Medina-Franco
- School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
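The pairwise analysis described in the Activity Landscape Plotter abstract above can be illustrated with a minimal sketch. It is not the tool's implementation: the Tanimoto coefficient on fingerprint bit sets, the function names, the two cut-offs, and the quadrant labels (terms commonly used for Structure-Activity Similarity maps) are all assumptions made for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of bit ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sas_pairs(compounds, sim_cut=0.7, act_cut=1.0):
    """Compute one row per compound pair for a SAS-style map.

    compounds: list of (name, fingerprint_set, potency) tuples, potency on a
    log scale (e.g. pIC50). sim_cut and act_cut are hypothetical thresholds.
    """
    rows = []
    for (n1, fp1, p1), (n2, fp2, p2) in combinations(compounds, 2):
        sim = tanimoto(fp1, fp2)
        diff = abs(p1 - p2)
        if sim >= sim_cut:
            # Similar structures: large potency gaps are activity cliffs.
            region = "activity cliff" if diff >= act_cut else "smooth SAR"
        else:
            # Dissimilar structures with similar potency suggest scaffold hops.
            region = "nondescript" if diff >= act_cut else "scaffold hop"
        rows.append((n1, n2, sim, diff, region))
    return rows


if __name__ == "__main__":
    demo = [
        ("A", {1, 2, 3, 4}, 6.0),
        ("B", {1, 2, 3, 4, 5}, 8.5),
        ("C", {9, 10}, 6.2),
    ]
    for row in sas_pairs(demo):
        print(row)
```

Each row holds the pair, its structural similarity, its absolute potency difference, and the map region, which is the kind of pairwise table that could be exported as a comma-delimited file.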
|
17
|
Practical applications of matched series analysis: SAR transfer, binding mode suggestion and data point validation. Future Med Chem 2017; 9:153-168. [DOI: 10.4155/fmc-2016-0203] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/26/2023] Open
Abstract
Aim: The assumption in scaffold hopping is that changing the scaffold does not change the binding mode and the same structure–activity relationships (SARs) are seen for substituents decorating each scaffold. Results/methodology: We present the use of matched series analysis, an extension of matched molecular pair analysis, to automate the analysis of a project's data and detect the presence or absence of comparable SAR between chemical series. Conclusion: The presence of SAR transfer can confirm the perceived binding mode overlay of different chemotypes or suggest new arrangements between scaffolds that may have gone unnoticed. The absence of series correlation can highlight the presence of inconsistent data points where assay values should be reconfirmed, or provide challenge to any project dogma.
|
18
|
García-Jacas CR, Martinez-Mayorga K, Marrero-Ponce Y, Medina-Franco JL. Conformation-dependent QSAR approach for the prediction of inhibitory activity of bromodomain modulators. SAR and QSAR in Environmental Research 2017; 28:41-58. [PMID: 28161994 DOI: 10.1080/1062936x.2017.1278616] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/23/2016] [Accepted: 12/22/2016] [Indexed: 06/06/2023]
Abstract
Epigenetic drug discovery is a promising research field with growing interest in the scientific community, as evidenced by the number of publications and the large amount of structure-epigenetic activity information currently available in the public domain. Computational methods are valuable tools to analyse and understand the activity of large compound collections from their structural information. In this manuscript, QSAR models to predict the inhibitory activity of a diverse and heterogeneous set of 88 organic molecules against the bromodomains BRD2, BRD3 and BRD4 are presented. A conformation-dependent representation of the chemical structures was established using the RDKit software, and a training and test set division was performed. Several two-linear and three-linear QuBiLS-MIDAS molecular descriptors (www.tomocomd.com) were computed to extract the geometric structural features of the compounds studied. QuBiLS-MIDAS-based feature sets, to be used in the modelling, were selected using dimensionality reduction strategies. The multiple linear regression procedure coupled with a genetic algorithm was employed to build the predictive models. Regression models containing between 6 and 9 variables were developed and assessed according to several internal and external validation methods. Analyses of outlier compounds and the applicability domain for each model were performed. As a result, the models against BRD2 and BRD3 with 8 variables and the model with 9 variables against BRD4 were those with the best overall performance according to the criteria accounted for. The results obtained suggest that the models proposed will be a good tool for studying the inhibitory activities of drug candidates against the bromodomains considered during epigenetic drug discovery.
Affiliation(s)
- C R García-Jacas
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México
- Escuela de Sistemas y Computación, Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador
- Grupo de Investigación de Bioinformática, Universidad de las Ciencias Informáticas (UCI), La Habana, Cuba
- K Martinez-Mayorga
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México
- Y Marrero-Ponce
- Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito (USFQ), Quito, Ecuador
- Grupo de Investigación Ambiental (GIA), Fundación Universitaria Tecnológica de Comfenalco, Bolívar, Colombia
- J L Medina-Franco
- Departamento de Farmacia, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México
|
19
|
García-Sánchez MO, Cruz-Monteagudo M, Medina-Franco JL. Quantitative Structure-Epigenetic Activity Relationships. Challenges and Advances in Computational Chemistry and Physics 2017. [DOI: 10.1007/978-3-319-56850-8_8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/14/2022]
|
20
|
Alvarsson J, Lampa S, Schaal W, Andersson C, Wikberg JES, Spjuth O. Large-scale ligand-based predictive modelling using support vector machines. J Cheminform 2016; 8:39. [PMID: 27516811 PMCID: PMC4980776 DOI: 10.1186/s13321-016-0151-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 03/14/2016] [Accepted: 07/12/2016] [Indexed: 12/25/2022] Open
Abstract
The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.
Affiliation(s)
- Jonathan Alvarsson
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24 Uppsala, Sweden
- Samuel Lampa
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24 Uppsala, Sweden
- Wesley Schaal
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24 Uppsala, Sweden; Science for Life Laboratory, Uppsala University, 751 24 Uppsala, Sweden
- Claes Andersson
- Department of Medical Sciences, Uppsala University, 751 85 Uppsala, Sweden
- Jarl E S Wikberg
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24 Uppsala, Sweden
- Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24 Uppsala, Sweden; Science for Life Laboratory, Uppsala University, 751 24 Uppsala, Sweden
|
21
|
Andrews DM, Broad LM, Edwards PJ, Fox DNA, Gallagher T, Garland SL, Kidd R, Sweeney JB. The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot. Chem Sci 2016; 7:3869-3878. [PMID: 30155031 PMCID: PMC6013800 DOI: 10.1039/c6sc00264a] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/19/2016] [Accepted: 02/22/2016] [Indexed: 11/21/2022] Open
Abstract
We present a summary of the National Compound Collection (NCC) pilot, which harvested chemical structure data from 746 publicly available PhD theses to create an enhanced database of diverse and interesting (largely organic) molecular entities. The database comprised ∼75 000 structure entries, of which 70% were new to ChemSpider at the time of upload. The dataset was evaluated for structural uniqueness by twelve external drug discovery groups from the pharmaceutical, biotech, academic and not-for-profit sectors. These partners generated data reported here comparing the NCC pilot with their in-house compound collections. The proportion of NCC structures considered to be useful for drug discovery ranged from 5-80% depending on the strictness of the filters used; most interestingly from a drug discovery standpoint, ∼13k NCC compounds (18% of the NCC) passed the filters and were of good diversity. These compounds are quite different from those that are already present in the screening collections but not so different that they are no longer considered to be drug-like. In general, the drug discovery teams would consider these compounds to be high value molecules for inclusion in their screening collections. This pilot addressed the potential value of unpublished data and explored the practicalities of large-scale data extraction, to inform both retrospective and prospective extraction of chemical data from theses.
Affiliation(s)
- David M Andrews
- Royal Society of Chemistry, Thomas Graham House, Science Park, Milton Road, Cambridge, CB4 0WF, UK
- Laura M Broad
- School of Chemistry, University of Bristol, Bristol, BS8 1TS, UK
- Paul J Edwards
- Scicate Limited, Mendip Court, Bath Road, Wells, Somerset BA5 3DG, UK
- David N A Fox
- Royal Society of Chemistry, Thomas Graham House, Science Park, Milton Road, Cambridge, CB4 0WF, UK
- Timothy Gallagher
- School of Chemistry, University of Bristol, Bristol, BS8 1TS, UK
- Stephen L Garland
- NQuiX Ltd, Causeway House, Dane Street, Bishops Stortford, Hertfordshire CM23 3BT, UK
- Richard Kidd
- Royal Society of Chemistry, Thomas Graham House, Science Park, Milton Road, Cambridge, CB4 0WF, UK
- Joseph B Sweeney
- Department of Chemical Sciences, University of Huddersfield, Huddersfield HD1 3DH, UK
|
22
|
Zhang YQ, Mao X, Guo QY, Lin N, Li S. Network Pharmacology-based Approaches Capture Essence of Chinese Herbal Medicines. Chinese Herbal Medicines 2016. [DOI: 10.1016/s1674-6384(16)60018-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 10/21/2022] Open
|
23
|
Perryman AL, Stratton TP, Ekins S, Freundlich JS. Predicting Mouse Liver Microsomal Stability with "Pruned" Machine Learning Models and Public Data. Pharm Res 2016; 33:433-49. [PMID: 26415647 PMCID: PMC4712113 DOI: 10.1007/s11095-015-1800-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 06/30/2015] [Accepted: 09/22/2015] [Indexed: 02/07/2023]
Abstract
PURPOSE: Mouse efficacy studies are a critical hurdle to advance translational research of potential therapeutic compounds for many diseases. Although mouse liver microsomal (MLM) stability studies are not a perfect surrogate for in vivo studies of metabolic clearance, they are the initial model system used to assess metabolic stability. Consequently, we explored the development of machine learning models that can enhance the probability of identifying compounds possessing MLM stability. METHODS: Published assays on MLM half-life values were identified in PubChem, reformatted, and curated to create a training set with 894 unique small molecules. These data were used to construct machine learning models assessed with internal cross-validation, external tests with a published set of antitubercular compounds, and independent validation with an additional diverse set of 571 compounds (PubChem data on percent metabolism). RESULTS: "Pruning" out the moderately unstable/moderately stable compounds from the training set produced models with superior predictive power. Bayesian models displayed the best predictive power for identifying compounds with a half-life ≥1 h. CONCLUSIONS: Our results suggest the pruning strategy may be of general benefit to improve test set enrichment and provide machine learning models with enhanced predictive value for the MLM stability of small organic molecules. This study represents the most exhaustive study to date of using machine learning approaches with MLM data from public sources.
Affiliation(s)
- Alexander L Perryman
- Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey, 07103, USA
- Thomas P Stratton
- Department of Pharmacology & Physiology, Rutgers University-New Jersey Medical School, Medical Sciences Building, I-503, 185 South Orange Ave., Newark, New Jersey, 07103, USA
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA, 94010, USA
- Joel S Freundlich
- Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey, 07103, USA
- Department of Pharmacology & Physiology, Rutgers University-New Jersey Medical School, Medical Sciences Building, I-503, 185 South Orange Ave., Newark, New Jersey, 07103, USA
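The "pruning" step summarised in the Perryman et al. abstract above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name and the lower cut-off of 30 minutes are hypothetical, and only the 60-minute bound echoes the half-life ≥1 h criterion mentioned in the abstract.

```python
def prune_training_set(records, unstable_max=30.0, stable_min=60.0):
    """Drop the ambiguous middle of a half-life distribution before training.

    records: iterable of (compound_name, half_life_minutes).
    Returns (compound_name, label) pairs, where label 1 means clearly stable
    (half-life >= stable_min) and label 0 means clearly unstable
    (half-life <= unstable_max). Compounds in between are removed.
    unstable_max is a hypothetical cut-off chosen only for this sketch.
    """
    kept = []
    for name, half_life in records:
        if half_life >= stable_min:
            kept.append((name, 1))
        elif half_life <= unstable_max:
            kept.append((name, 0))
        # Otherwise the compound is moderately stable/unstable and is pruned.
    return kept


if __name__ == "__main__":
    data = [("c1", 5.0), ("c2", 45.0), ("c3", 120.0), ("c4", 60.0), ("c5", 30.0)]
    print(prune_training_set(data))
```

The labelled subset would then be passed to whichever classifier is being trained; the pruned compounds can still be kept aside for testing.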
|
24
|
Clark AM, Dole K, Ekins S. Open Source Bayesian Models. 3. Composite Models for Prediction of Binned Responses. J Chem Inf Model 2016; 56:275-85. [PMID: 26750305 PMCID: PMC4764945 DOI: 10.1021/acs.jcim.5b00555] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 02/07/2023]
Abstract
Bayesian models constructed from structure-derived fingerprints have been a popular and useful method for drug discovery research when applied to bioactivity measurements that can be effectively classified as active or inactive. The results can be used to rank candidate structures according to their probability of activity, and this ranking benefits from the high degree of interpretability when structure-based fingerprints are used, making the results chemically intuitive. Besides selecting an activity threshold, building a Bayesian model is fast and requires few or no parameters or user intervention. The method also does not suffer from such acute overtraining problems as quantitative structure–activity relationships or quantitative structure–property relationships (QSAR/QSPR). This makes it an approach highly suitable for automated workflows that are independent of user expertise or prior knowledge of the training data. We now describe a new method for creating a composite group of Bayesian models to extend the method to work with multiple states, rather than just binary. Incoming activities are divided into bins, each covering a mutually exclusive range of activities. For each of these bins, a Bayesian model is created to model whether or not the compound belongs in the bin. Analyzing putative molecules using the composite model involves making a prediction for each bin and examining the relative likelihood for each assignment, for example, highest value wins. The method has been evaluated on a collection of hundreds of data sets extracted from ChEMBL v20 and validated data sets for ADME/Tox and bioactivity.
Affiliation(s)
- Alex M Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques #302, Montreal H3J 2S1, Quebec, Canada
- Krishna Dole
- Collaborative Drug Discovery, Inc., 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- Sean Ekins
- Collaborative Drug Discovery, Inc., 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States; Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
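The composite scheme outlined in the Clark et al. abstract above (one model per activity bin, highest score wins) can be sketched as below. This is an illustrative toy and not the authors' open source code: the function names, the bin edges, and the Laplacian-corrected per-feature estimator are assumptions, and fingerprints are represented simply as sets of hashable feature ids.

```python
import math
from collections import Counter


def assign_bin(value, edges):
    """Index of the half-open bin containing value; edges must be ascending."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def train_bin_model(fingerprints, is_member):
    """One-vs-rest model: a log contribution per feature (assumed estimator).

    Uses log((hits + 1) / (total * base_rate + 1)), a Laplacian-corrected
    ratio often paired with fingerprint Bayesian models. Requires a
    non-empty training set.
    """
    total, hits = Counter(), Counter()
    for fp, member in zip(fingerprints, is_member):
        for bit in fp:
            total[bit] += 1
            if member:
                hits[bit] += 1
    base_rate = sum(is_member) / len(is_member)
    return {
        bit: math.log((hits[bit] + 1.0) / (total[bit] * base_rate + 1.0))
        for bit in total
    }


def train_composite(fingerprints, activities, edges):
    """Build one membership model for each mutually exclusive activity bin."""
    bins = [assign_bin(a, edges) for a in activities]
    return [
        train_bin_model(fingerprints, [b == k for b in bins])
        for k in range(len(edges) + 1)
    ]


def predict_bin(models, fingerprint):
    """Score the fingerprint against every bin model; highest score wins."""
    scores = [sum(m.get(bit, 0.0) for bit in fingerprint) for m in models]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    fps = [{"a", "x"}, {"a"}, {"b", "x"}, {"b"}, {"c", "x"}, {"c"}]
    acts = [4.0, 4.5, 6.0, 6.2, 8.0, 8.5]
    models = train_composite(fps, acts, edges=[5.0, 7.0])
    print(predict_bin(models, {"c", "x"}))
```

Features never seen in training contribute zero to every bin, so a molecule made only of unseen features yields tied scores; a real workflow would need a policy for that case.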
|
25
|
Prieto-Martínez FD, Gortari EFD, Méndez-Lucio O, Medina-Franco JL. A chemical space odyssey of inhibitors of histone deacetylases and bromodomains. RSC Adv 2016. [DOI: 10.1039/c6ra07224k] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 12/20/2022] Open
Abstract
The interest in epigenetic drug and probe discovery is growing as reflected in the large amount of structure-epigenetic activity information available.
Affiliation(s)
- Eli Fernández-de Gortari
- Facultad de Química, Departamento de Farmacia, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Oscar Méndez-Lucio
- Facultad de Química, Departamento de Farmacia, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- José L. Medina-Franco
- Facultad de Química, Departamento de Farmacia, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
|
26
|
Andrews DM, Degorce SL, Drake DJ, Gustafsson M, Higgins KM, Winter JJ. Compound Passport Service: supporting corporate collection owners in open innovation. Drug Discov Today 2015; 20:1250-5. [PMID: 26136162 DOI: 10.1016/j.drudis.2015.06.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/17/2015] [Revised: 05/28/2015] [Accepted: 06/22/2015] [Indexed: 10/23/2022]
|
27
|
Ekins S, Litterman NK, Lipinski CA, Bunin BA. Thermodynamic Proxies to Compensate for Biases in Drug Discovery Methods. Pharm Res 2015; 33:194-205. [DOI: 10.1007/s11095-015-1779-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/10/2015] [Accepted: 08/13/2015] [Indexed: 11/24/2022]
|
28
|
Schirle M, Jenkins JL. Identifying compound efficacy targets in phenotypic drug discovery. Drug Discov Today 2015; 21:82-89. [PMID: 26272035 DOI: 10.1016/j.drudis.2015.08.001] [Citation(s) in RCA: 102] [Impact Index Per Article: 11.3] [Received: 04/24/2015] [Revised: 07/10/2015] [Accepted: 08/03/2015] [Indexed: 12/30/2022]
Abstract
The identification of the efficacy target(s) for hits from phenotypic compound screens remains a key step to progress compounds into drug development. In addition to efficacy targets, the characterization of epistatic proteins influencing compound activity often facilitates the elucidation of the underlying mechanism of action; and, further, early determination of off-targets that cause potentially unwanted secondary phenotypes helps in assessing potential liabilities. This short review discusses the most important technologies currently available for characterizing the direct and indirect target space of bioactive compounds following phenotypic screening. We present a comprehensive strategy employing complementary approaches to balance individual technology strengths and weaknesses.
Affiliation(s)
- Markus Schirle
- Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA 02139, USA.
- Jeremy L Jenkins
- Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA 02139, USA.
|
29
|
Abstract
The emergence of a number of publicly available bioactivity databases, such as ChEMBL, PubChem BioAssay and BindingDB, has raised awareness about the topics of data curation, quality and integrity. Here we provide an overview and discussion of the current and future approaches to activity, assay and target data curation of the ChEMBL database. This curation process involves several manual and automated steps and aims to: (1) maximise data accessibility and comparability; (2) improve data integrity and flag outliers, ambiguities and potential errors; and (3) add further curated annotations and mappings thus increasing the usefulness and accuracy of the ChEMBL data for all users and modellers in particular. Issues related to activity, assay and target data curation and integrity along with their potential impact for users of the data are discussed, alongside robust selection and filter strategies in order to avoid or minimise these, depending on the desired application.
|
30
|
Ekins S, Clark AM, Wright SH. Making Transporter Models for Drug-Drug Interaction Prediction Mobile. Drug Metab Dispos 2015. [PMID: 26199424 DOI: 10.1124/dmd.115.064956] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
The past decade has seen increased numbers of studies publishing ligand-based computational models for drug transporters. Although they generally use small experimental data sets, these models can provide insights into structure-activity relationships for the transporter. In addition, such models have helped to identify new compounds as substrates or inhibitors of transporters of interest. We recently proposed that many transporters are promiscuous and may require profiling of new chemical entities against multiple substrates for a specific transporter. Furthermore, it should be noted that virtually all of the published ligand-based transporter models are only accessible to those involved in creating them and, consequently, are rarely shared effectively. One way to surmount this is to make models shareable or more accessible. The development of mobile apps that can access such models is highlighted here. These apps can be used to predict ligand interactions with transporters using Bayesian algorithms. We used recently published transporter data sets (MATE1, MATE2K, OCT2, OCTN2, ASBT, and NTCP) to build preliminary models in a commercial tool and in open software that can deliver the model in a mobile app. In addition, several transporter data sets extracted from the ChEMBL database were used to illustrate how such public data and models can be shared. Predicting drug-drug interactions for various transporters using computational models is potentially within reach of anyone with an iPhone or iPad. Such tools could help prioritize which substrates should be used for in vivo drug-drug interaction testing and enable open sharing of models.
Affiliation(s)
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., and Collaborations in Chemistry, Fuquay-Varina, North Carolina; Collaborative Drug Discovery, Burlingame, California
- Alex M Clark
- Molecular Materials Informatics, Inc., Montreal, Quebec, Canada
- Stephen H Wright
- Department of Physiology, University of Arizona, Tucson, Arizona
|
31
|
Tarasova OA, Urusova AF, Filimonov DA, Nicklaus MC, Zakharov AV, Poroikov VV. QSAR Modeling Using Large-Scale Databases: Case Study for HIV-1 Reverse Transcriptase Inhibitors. J Chem Inf Model 2015; 55:1388-99. [PMID: 26046311 DOI: 10.1021/acs.jcim.5b00019] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022]
Abstract
Large-scale databases are important sources of training sets for various QSAR modeling approaches. Generally, these databases contain information extracted from different sources. This variety of sources can produce inconsistency in the data, defined as sometimes widely diverging activity results for the same compound against the same target. Because such inconsistency can reduce the accuracy of predictive models built from these data, we are addressing the question of how best to use data from publicly and commercially accessible databases to create accurate and predictive QSAR models. We investigate the suitability of commercially and publicly available databases to QSAR modeling of antiviral activity (HIV-1 reverse transcriptase (RT) inhibition). We present several methods for the creation of modeling (i.e., training and test) sets from two, either commercially or freely available, databases: Thomson Reuters Integrity and ChEMBL. We found that the typical predictivities of QSAR models obtained using these different modeling set compilation methods differ significantly from each other. The best results were obtained using training sets compiled for compounds tested using only one method and material (i.e., a specific type of biological assay). Compound sets aggregated by target only typically yielded poorly predictive models. We discuss the possibility of "mix-and-matching" assay data across aggregating databases such as ChEMBL and Integrity and their current severe limitations for this purpose. One of them is the general lack of complete and semantic/computer-parsable descriptions of assay methodology carried by these databases that would allow one to determine mix-and-matchability of result sets at the assay level.
Affiliation(s)
- Olga A Tarasova
- Institute of Biochemical Chemistry, 10-8, Pogodinskaya St., 119121, Moscow, Russia
- Aleksandra F Urusova
- Institute of Biochemical Chemistry, 10-8, Pogodinskaya St., 119121, Moscow, Russia
- Dmitry A Filimonov
- Institute of Biochemical Chemistry, 10-8, Pogodinskaya St., 119121, Moscow, Russia
- Marc C Nicklaus
- CADD Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles St., Frederick, Maryland 21702, United States
- Alexey V Zakharov
- CADD Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles St., Frederick, Maryland 21702, United States
- Vladimir V Poroikov
- Institute of Biochemical Chemistry, 10-8, Pogodinskaya St., 119121, Moscow, Russia
|
32
|
Clark AM, Dole K, Coulon-Spektor A, McNutt A, Grass G, Freundlich JS, Reynolds RC, Ekins S. Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets. J Chem Inf Model 2015; 55:1231-45. [PMID: 25994950 PMCID: PMC4478615 DOI: 10.1021/acs.jcim.5b00143] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Indexed: 02/07/2023]
Abstract
On the order of hundreds of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) models have been described in the literature in the past decade which are more often than not inaccessible to anyone but their authors. Public accessibility is also an issue with computational models for bioactivity, and the ability to share such models still remains a major challenge limiting drug discovery. We describe the creation of a reference implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chemistry Development Kit (CDK) project, as well as implemented in the CDD Vault and in several mobile apps. We use this implementation to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochemical properties. We show that these models possess cross-validation receiver operator curve values comparable to those generated previously in prior publications using alternative tools. We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid production of robust machine learning models from public data or the user’s own datasets. The current study sets the stage for generating models in proprietary software (such as CDD) and exporting these models in a format that could be run in open source software using CDK components. This work also demonstrates that we can enable biocomputation across distributed private or public datasets to enhance drug discovery.
Affiliation(s)
- Alex M Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques No. 302, Montreal H3J 2S1, Quebec, Canada
- Krishna Dole
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- Anna Coulon-Spektor
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- Andrew McNutt
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- George Grass
- G2 Research, Inc., P.O. Box 1242, Tahoe City, California 96145, United States
- Robert C Reynolds
- Department of Chemistry, College of Arts and Sciences, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, Alabama 35294-1240, United States
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States; Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
|
33
|
Clark AM, Ekins S. Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL. J Chem Inf Model 2015; 55:1246-60. [PMID: 25995041 DOI: 10.1021/acs.jcim.5b00144] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Indexed: 02/05/2023]
Abstract
In an associated paper, we have described a reference implementation of Laplacian-corrected naïve Bayesian model building using extended connectivity (ECFP)- and molecular function class fingerprints of maximum diameter 6 (FCFP)-type fingerprints. As a follow-up, we have now undertaken a large-scale validation study in order to ensure that the technique generalizes to a broad variety of drug discovery datasets. To achieve this, we have used the ChEMBL (version 20) database and split it into more than 2000 separate datasets, each of which consists of compounds and measurements with the same target and activity measurement. In order to test these datasets with the two-state Bayesian classification, we developed an automated algorithm for detecting a suitable threshold for active/inactive designation, which we applied to all collections. With these datasets, we were able to establish that our Bayesian model implementation is effective for the large majority of cases, and we were able to quantify the impact of fingerprint folding on the receiver operator curve cross-validation metrics. We were also able to study the impact that the choice of training/testing set partitioning has on the resulting recall rates. The datasets have been made publicly available to be downloaded, along with the corresponding model data files, which can be used in conjunction with the CDK and several mobile apps. We have also explored some novel visualization methods which leverage the structural origins of the ECFP/FCFP fingerprints to attribute regions of a molecule responsible for positive and negative contributions to activity. The ability to score molecules across thousands of relevant datasets across organisms also may help to access desirable and undesirable off-target effects as well as suggest potential targets for compounds derived from phenotypic screens.
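As an illustration of the technique this abstract names, the following is a minimal sketch of Laplacian-corrected naïve Bayesian scoring. It is not the authors' reference implementation or the CDK API. It assumes the commonly cited form of the estimator, in which each feature contributes log((A_F + 1) / (T_F · P + 1)), where T_F is the number of training compounds containing the feature, A_F the number of those that are active, and P the overall active fraction. The function names are hypothetical, and plain integer sets stand in for hashed ECFP/FCFP fingerprint features.

```python
import math
from collections import Counter


def train_laplacian_nb(samples):
    """Build per-feature log contributions from (feature_set, is_active) pairs.

    Assumed estimator: log((A_F + 1) / (T_F * P + 1)), with P the base rate
    of actives in the training set. Feature IDs are hypothetical stand-ins
    for hashed fingerprint bits.
    """
    if not samples:
        raise ValueError("training set is empty")
    total = len(samples)
    actives = sum(1 for _, active in samples if active)
    base_rate = actives / total

    feat_total = Counter()   # T_F: compounds containing the feature
    feat_active = Counter()  # A_F: active compounds containing the feature
    for features, active in samples:
        for f in set(features):
            feat_total[f] += 1
            if active:
                feat_active[f] += 1

    return {
        f: math.log((feat_active[f] + 1.0) / (feat_total[f] * base_rate + 1.0))
        for f in feat_total
    }


def score(model, features):
    """Sum the contributions of the features present; unseen features add 0."""
    return sum(model.get(f, 0.0) for f in set(features))


# Toy usage: features 1 and 3 occur only in actives, 4 and 5 only in inactives.
toy = [({1, 2}, True), ({1, 3}, True), ({2, 4}, False), ({4, 5}, False)]
toy_model = train_laplacian_nb(toy)
print(score(toy_model, {1, 3}), score(toy_model, {4, 5}))
```

The score is an unnormalized sum of log ratios rather than a probability, which is why a separate threshold for the active/inactive call is needed.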
Affiliation(s)
- Alex M Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques No. 302, Montreal H3J 2S1, Quebec, Canada
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States; Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States; Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
|
34
|
Chen YC, Totrov M, Abagyan R. Docking to multiple pockets or ligand fields for screening, activity prediction and scaffold hopping. Future Med Chem 2014; 6:1741-55. [PMID: 25407367 PMCID: PMC4285145 DOI: 10.4155/fmc.14.113] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Two recent technological advances dramatically reducing the rate of false-negatives in activity prediction by docking flexible 3D models of compounds include multi-conformational docking (mPockDock) and the docking of candidates to atomic property fields derived by co-crystallized ligands (mApfDock). RESULTS: The mApfDock and mPockDock provide the AUC of 90.4 and 83.8%, respectively. The mApfDock gave better performance when compounds required large induced-fit pocket changes unseen in crystallography, whereas the mPockDock is superior when the co-crystallized ligands do not represent sufficient chemical and binding location diversity. CONCLUSION: Both approaches proved to be efficient for scaffold hopping; they are complementary when the coverage of the co-crystallized complexes is poor but become convergent when the complexes are diverse enough.
Affiliation(s)
- Yu-Chen Chen
- Bioinformatics, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Max Totrov
- Molsoft LLC, 11199 Sorrento Valley Road, S209, San Diego, CA 92121, USA
- Ruben Abagyan
- Skaggs School of Pharmacy & Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, MC 0747, La Jolla, CA 92093-0747, USA
|