1
Doğan T, Akhan Güzelcan E, Baumann M, Koyas A, Atas H, Baxendale IR, Martin M, Cetin-Atalay R. Protein domain-based prediction of drug/compound-target interactions and experimental validation on LIM kinases. PLoS Comput Biol 2021; 17:e1009171. [PMID: 34843456 PMCID: PMC8659301 DOI: 10.1371/journal.pcbi.1009171] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/06/2021] [Revised: 12/09/2021] [Accepted: 11/09/2021] [Indexed: 12/23/2022] [Open Access]
Abstract
Predictive approaches such as virtual screening have been used in drug discovery with the objective of reducing developmental time and costs. Current machine learning and network-based approaches have issues related to generalization, usability, or model interpretability, especially due to the complexity of target proteins' structure/function, and bias in system training datasets. Here, we propose a new method "DRUIDom" (DRUg Interacting Domain prediction) to identify bio-interactions between drug candidate compounds and targets by utilizing the domain modularity of proteins, to overcome problems associated with current approaches. DRUIDom is composed of two methodological steps. First, ligands/compounds are statistically mapped to structural domains of their target proteins, with the aim of identifying their interactions. As such, other proteins containing the same mapped domain or domain pair become new candidate targets for the corresponding compounds. Next, a million-scale dataset of small molecule compounds, including those mapped to domains in the previous step, is clustered based on molecular similarity, and the domain associations are propagated to other compounds within the same clusters. Experimentally verified bioactivity data points, obtained from public databases, are meticulously filtered to construct datasets of active/interacting and inactive/non-interacting drug/compound-target pairs (~2.9M data points), and used as training data for calculating parameters of compound-domain mappings, which led to 27,032 high-confidence associations between 250 domains and 8,165 compounds, and a finalized output of ~5 million new compound-protein interactions. DRUIDom is experimentally validated by syntheses and bioactivity analyses of compounds predicted to target LIM-kinase proteins, which play critical roles in the regulation of cell motility, cell cycle progression, and differentiation through actin filament dynamics.
We showed that LIMK-inhibitor-2 and its derivatives significantly block cancer cell migration through inhibition of LIMK phosphorylation and the downstream protein cofilin. One of the derivative compounds (LIMKi-2d) was identified as a promising candidate due to its action on resistant Mahlavu liver cancer cells. The results demonstrated that DRUIDom can be exploited to identify drug candidate compounds for intended targets and to predict new target proteins based on the defined compound-domain relationships. Datasets, results, and the source code of DRUIDom are fully available at: https://github.com/cansyl/DRUIDom.
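The two steps described in the abstract above can be illustrated with a minimal Python sketch. It is not the DRUIDom implementation (see the linked repository for that): the function names, the toy identifiers, and the scoring rule are all assumptions made for illustration. In particular, the abstract only says that compounds are "statistically mapped" to domains, so the simple support and precision thresholds below stand in for whatever statistics the authors actually use, and the compound clusters are assumed to be precomputed.

```python
from collections import defaultdict


def map_compounds_to_domains(bioactivities, protein_domains,
                             min_support=2, min_precision=0.8):
    """Step 1 (assumed scoring rule): associate a compound with a domain when
    enough of its tested targets containing that domain are active."""
    active = defaultdict(int)
    total = defaultdict(int)
    for compound, protein, is_active in bioactivities:
        for domain in protein_domains.get(protein, ()):
            total[(compound, domain)] += 1
            if is_active:
                active[(compound, domain)] += 1
    mappings = defaultdict(set)
    for (compound, domain), n_tested in total.items():
        n_active = active[(compound, domain)]
        if n_active >= min_support and n_active / n_tested >= min_precision:
            mappings[compound].add(domain)
    return mappings


def propagate_within_clusters(mappings, clusters):
    """Step 2: every compound in a cluster inherits the domain associations
    of the mapped compounds in that cluster (clusters are given as input)."""
    propagated = defaultdict(set)
    for compound, domains in mappings.items():
        propagated[compound] |= domains
    for cluster in clusters:
        shared = set()
        for compound in cluster:
            shared |= mappings.get(compound, set())
        if shared:
            for compound in cluster:
                propagated[compound] |= shared
    return propagated


def predict_targets(mappings, protein_domains, known_pairs):
    """Proteins carrying a mapped domain become candidate targets,
    excluding compound-protein pairs that were already measured."""
    predictions = set()
    for compound, domains in mappings.items():
        for protein, protein_doms in protein_domains.items():
            if domains & set(protein_doms) and (compound, protein) not in known_pairs:
                predictions.add((compound, protein))
    return predictions


# Toy data; all identifiers are hypothetical placeholders.
protein_domains = {
    "P1": {"domA"},
    "P2": {"domA"},
    "P3": {"domA", "domB"},
    "P4": {"domC"},
}
bioactivities = [
    ("c1", "P1", True),
    ("c1", "P2", True),
    ("c1", "P4", False),
]
clusters = [{"c1", "c2"}]  # assumed output of a similarity-based clustering

mappings = map_compounds_to_domains(bioactivities, protein_domains)
propagated = propagate_within_clusters(mappings, clusters)
known_pairs = {(c, p) for c, p, _ in bioactivities}
predictions = predict_targets(propagated, protein_domains, known_pairs)
print(sorted(predictions))
```

In this toy run, `c1` is mapped to `domA` because both tested `domA` proteins are active, `c2` inherits that mapping through its cluster, and the untested proteins containing `domA` become candidate targets for both compounds.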
Affiliation(s)
- Tunca Doğan
  - Department of Computer Engineering, Hacettepe University, Ankara, Turkey
  - Institute of Informatics, Hacettepe University, Ankara, Turkey
  - CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Ece Akhan Güzelcan
  - CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
  - Center for Genomics and Rare Diseases & Biobank for Rare Diseases, Hacettepe University, Ankara, Turkey
- Marcus Baumann
  - School of Chemistry, University College Dublin, Dublin, Ireland
- Altay Koyas
  - CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Heval Atas
  - CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Ian R. Baxendale
  - Department of Chemistry, University of Durham, Durham, United Kingdom
- Maria Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Rengul Cetin-Atalay
  - CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
  - Section of Pulmonary and Critical Care Medicine, University of Chicago, Chicago, Illinois, United States of America
2
Kruger FA, Gaulton A, Nowotka M, Overington JP. PPDMs-a resource for mapping small molecule bioactivities from ChEMBL to Pfam-A protein domains. Bioinformatics 2014; 31:776-8. [PMID: 25348214 PMCID: PMC4341065 DOI: 10.1093/bioinformatics/btu711] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 01/06/2023] [Open Access]
Abstract
Summary: PPDMs is a resource that maps small molecule bioactivities to protein domains from the Pfam-A collection of protein families. Small molecule bioactivities mapped to protein domains add important precision to approaches that use protein sequence searches and alignments to assist applications in computational drug discovery and systems and chemical biology. We have previously proposed a heuristic that maps a subset of the bioactivities stored in ChEMBL to the Pfam-A domain most likely to mediate small molecule binding. We have since refined this mapping using a manual procedure. Here, we present a resource that provides up-to-date mappings and the possibility to review assigned mappings as well as to participate in their assignment and curation. We also describe how mappings provided through the PPDMs resource are made accessible through the main schema of the ChEMBL database. Availability and implementation: The PPDMs resource and curation interface are available at https://www.ebi.ac.uk/chembl/research/ppdms/pfam_maps. The source code for PPDMs is available under the Apache license at https://github.com/chembl/pfam_maps. Source code is available at https://github.com/chembl/pfam_map_loader to demonstrate the integration process with the main schema of ChEMBL. Contact: jpo@ebi.ac.uk
Affiliation(s)
- Felix A Kruger
  - ChEMBL group, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, UK
- Anna Gaulton
  - ChEMBL group, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, UK
- Michal Nowotka
  - ChEMBL group, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, UK
3
Analysis of the protein domain and domain architecture content in fungi and its application in the search of new antifungal targets. PLoS Comput Biol 2014; 10:e1003733. [PMID: 25033262 PMCID: PMC4102429 DOI: 10.1371/journal.pcbi.1003733] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 12/11/2013] [Accepted: 06/04/2014] [Indexed: 01/25/2023] [Open Access]
Abstract
Over the past several years fungal infections have shown an increasing incidence in the susceptible population, and caused high mortality rates. In parallel, multi-resistant fungi are emerging in human infections. Therefore, the identification of new potential antifungal targets is a priority. The first task of this study was to analyse the protein domain and domain architecture content of the 137 fungal proteomes (corresponding to 111 species) available in UniProtKB (UniProt KnowledgeBase) by January 2013. The resulting list of core and exclusive domain and domain architectures is provided in this paper. It delineates the different levels of fungal taxonomic classification: phylum, subphylum, order, genus and species. The analysis highlighted Aspergillus as the most diverse genus in terms of exclusive domain content. In addition, we also investigated which domains could be considered promiscuous in the different organisms. As an application of this analysis, we explored three different ways to detect potential targets for antifungal drugs. First, we compared the domain and domain architecture content of the human and fungal proteomes, and identified those domains and domain architectures only present in fungi. Secondly, we looked for information regarding fungal pathways in public repositories, where proteins containing promiscuous domains could be involved. Three pathways were identified as a result: lovastatin biosynthesis, xylan degradation and biosynthesis of siroheme. Finally, we classified a subset of the studied fungi in five groups depending on their occurrence in clinical samples. We then looked for exclusive domains in the groups that were more relevant clinically and determined which of them had the potential to bind small molecules. Overall, this study provides a comprehensive analysis of the available fungal proteomes and shows three approaches that can be used as a first step in the detection of new antifungal targets. 
Some fungi have become pathogenic to plants and, to a lesser extent, to animals. Under certain conditions their presence in the human body can pose a threat to human health, especially for immunocompromised patients. Yet, some fungi can also infect healthy individuals. The low sensitivity of the available antifungal drugs, together with the clinically observed resistance of some fungi, raises the demand for new alternative treatments. Proteins are biological molecules that perform essential functions within living organisms. Many of those functions are attributed to the varying folded structure of each protein. These configurations are composed of functional units, also called domains, each one independently responsible for a fraction of the overall biological function. Understanding how the different block combinations are distributed across members of the same or similar families of organisms is important. For instance, exclusive domain combinations can hold particular acquired functions. Blocks displaying a high mobility can play major roles in the organism's survival. The biological goal of this study was to analyse the functional implications of protein domains and domain combinations in the available fungal proteomes. This information can be used to highlight proteins and pathways that could potentially be used as drug targets.
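The comparison of domain content between fungal and human proteomes described above can be sketched with plain set operations. This is only an illustration under stated assumptions: the reading of "core" as domains present in every proteome of a group, and of "exclusive" as domains present in a group but absent from a background set, is an interpretation of the abstract, and all proteome and domain labels are hypothetical placeholders.

```python
def core_domains(proteomes):
    """Domains found in every proteome of the group (assumed definition)."""
    domain_sets = list(proteomes.values())
    return set.intersection(*domain_sets) if domain_sets else set()


def exclusive_domains(group, background):
    """Domains found in at least one proteome of the group and in none of
    the background proteomes (assumed definition)."""
    in_group = set().union(*group.values())
    in_background = set().union(*background.values())
    return in_group - in_background


# Toy domain content per proteome; labels are placeholders, not real Pfam entries.
fungi = {
    "fungus_A": {"D1", "D2", "D3"},
    "fungus_B": {"D1", "D3", "D4"},
}
human = {"human": {"D1", "D5"}}

fungal_core = core_domains(fungi)
fungal_only = exclusive_domains(fungi, human)
core_and_fungal_only = fungal_core & fungal_only
print(sorted(fungal_core), sorted(fungal_only), sorted(core_and_fungal_only))
```

The same functions apply unchanged to domain architectures if each architecture is encoded as a hashable value, for example a tuple of domain labels in sequence order.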
4
Wang L, Li Z, Shao Q, Li X, Ai N, Zhao X, Fan X. Dissecting active ingredients of Chinese medicine by content-weighted ingredient–target network. Mol Biosyst 2014; 10:1905-11. [DOI: 10.1039/c3mb70581a] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 12/25/2022]
Abstract
A novel approach integrating network pharmacology analysis with ingredient content and ingredient–target relationships to identify active ingredients of Chinese medicine.
Affiliation(s)
- Linli Wang
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Zheng Li
  - State Key Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
- Qing Shao
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiang Li
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ni Ai
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiaoping Zhao
  - College of Preclinical Medicine, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Xiaohui Fan
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China