1
Liu C, Xiao K, Yu C, Lei Y, Lyu K, Tian T, Zhao D, Zhou F, Tang H, Zeng J. A probabilistic knowledge graph for target identification. PLoS Comput Biol 2024; 20:e1011945. [PMID: 38578805 PMCID: PMC11034645 DOI: 10.1371/journal.pcbi.1011945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 04/22/2024] [Accepted: 02/24/2024] [Indexed: 04/07/2024] Open
Abstract
Early identification of safe and efficacious disease targets is crucial to alleviating the tremendous cost of drug discovery projects. However, existing experimental methods for identifying new targets are generally labor-intensive and failure-prone. On the other hand, computational approaches, especially machine learning-based frameworks, have shown remarkable application potential in drug discovery. In this work, we propose Progeni, a novel machine learning-based framework for target identification. In addition to fully exploiting the known heterogeneous biological networks from various sources, Progeni integrates literature evidence about the relations between biological entities to construct a probabilistic knowledge graph. Graph neural networks are then employed in Progeni to learn the feature embeddings of biological entities to facilitate the identification of biologically relevant target candidates. A comprehensive evaluation of Progeni demonstrated its superior predictive power over the baseline methods on the target identification task. In addition, our extensive tests showed that Progeni exhibited high robustness to the negative effect of exposure bias, a common phenomenon in recommendation systems, and effectively identified new targets that can be strongly supported by the literature. Moreover, our wet lab experiments successfully validated the biological significance of the top target candidates predicted by Progeni for melanoma and colorectal cancer. All these results suggested that Progeni can identify biologically effective targets and thus provide a powerful and useful tool for advancing the drug discovery process.
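As a rough illustration of how edge probabilities can enter graph-based feature learning, the sketch below runs one round of probability-weighted neighbor averaging over a toy graph. It is not Progeni's architecture: the function name `aggregate`, the equal mixing of self and neighbor features, and the toy entities and probabilities are all assumptions made for this example.

```python
from collections import defaultdict


def aggregate(features, edges):
    """One round of probability-weighted neighbor averaging (illustrative only).

    features: dict mapping node name -> list of floats (all the same length)
    edges: list of (node_u, node_v, probability) triples, treated as undirected
    """
    dim = len(next(iter(features.values())))
    sums = {node: [0.0] * dim for node in features}
    weights = defaultdict(float)
    for u, v, p in edges:
        # Each edge contributes in both directions, scaled by its probability.
        for a, b in ((u, v), (v, u)):
            for k in range(dim):
                sums[a][k] += p * features[b][k]
            weights[a] += p
    updated = {}
    for node, vec in features.items():
        w = weights[node]
        neighbor_mean = [s / w for s in sums[node]] if w > 0 else [0.0] * dim
        # Assumed update rule: equal mix of the node's own features and its neighbors'.
        updated[node] = [0.5 * x + 0.5 * y for x, y in zip(vec, neighbor_mean)]
    return updated


# Toy graph (hypothetical): two genes linked to one disease with different confidence.
features = {"geneA": [1.0, 0.0], "geneB": [0.0, 1.0], "disease": [0.0, 0.0]}
edges = [("geneA", "disease", 0.9), ("geneB", "disease", 0.1)]
updated = aggregate(features, edges)
print(updated["disease"])
```

In this toy graph the disease embedding moves much closer to geneA than to geneB, because the geneA edge carries the higher probability.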
Affiliation(s)
- Chang Liu
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Kaimin Xiao
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
  - Joint Graduate Program of Peking-Tsinghua-NIBS, School of Life Sciences, Tsinghua University, Beijing, China
- Cuinan Yu
  - Machine Learning Department, Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
- Yipin Lei
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Kangbo Lyu
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Tingzhong Tian
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Fengfeng Zhou
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, Jilin Province, China
- Haidong Tang
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
- Jianyang Zeng
  - School of Engineering, Westlake University, Hangzhou, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
  - Research Center for Industries of the Future and School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China
2
Yang J, Zhang D, Cai Y, Yu K, Li M, Liu L, Chen X. Computational Prediction of Drug Phenotypic Effects Based on Substructure-Phenotype Associations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:256-265. [PMID: 35239490 DOI: 10.1109/tcbb.2022.3155453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Identifying drug phenotypic effects, including therapeutic effects and adverse drug reactions (ADRs), is an integral part of evaluating the potential of new drug candidates (NDCs). However, current computational methods for predicting the phenotypic effects of NDCs are mainly based on the overall structure of an NDC or a related target. These approaches often lead to inconsistencies between structure and function and limit the prediction space of NDCs. In this study, we first constructed quantitative substructure-domain, domain-ADR, and domain-ATC (Anatomical Therapeutic Chemical Classification System code) associations using L1LOG and L1SVM machine learning models. These associations represent relationships between phenotypes (ADRs and ATCs) and the local structures of drugs and proteins. Based on these associations, we then constructed substructure-phenotype relationships, which were used to quantify drug-phenotype relationships. This approach enables high-throughput, effective evaluation of the druggability of NDCs from the established substructure-phenotype relationships and the structural information of the NDCs, without additional prior knowledge. Using this computational pipeline, 83,205 drug-ATC relationships (covering 1,479 drugs and 178 ATCs) and 306,421 drug-ADR relationships (covering 1,752 drugs and 454 ADRs) were predicted in total. The predictions were validated at four levels: five-fold cross-validation, public databases, literature, and molecular docking. Furthermore, three case studies demonstrated the feasibility of our method. For Maraviroc, an approved drug, 79 ATCs and 269 ADRs were predicted, including the antiviral effect already in clinical use. We also found risk substructures for severe ADRs; for example, SUB215 (>= 1, saturated or only aromatic carbon ring size 7) can result in shock. In addition, we analyzed the mechanism of action (MOA) of drugs of interest based on the established drug-substructure-domain-protein associations. In summary, by establishing drug-substructure-phenotype relationships, this approach can quantitatively predict phenotypes for a given NDC or drug without any prior knowledge other than its structural information. It yields the relationships between the substructures and phenotypes of a compound directly, which makes it easier to analyze the phenotypic mechanisms of drugs and can accelerate rational drug design.
3
Lang X, Liu J, Zhang G, Feng X, Dan W. Knowledge Mapping of Drug Repositioning's Theme and Development. Drug Des Devel Ther 2023; 17:1157-1174. [PMID: 37096060 PMCID: PMC10122475 DOI: 10.2147/dddt.s405906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 04/11/2023] [Indexed: 04/26/2023] Open
Abstract
Background: In recent years, the emergence of new diseases and resistance to known diseases have led to increasing demand for new drugs. Using bibliometric analysis, this paper studied the relevant articles on drug repositioning in recent years and analyzed the current research foci and trends. Methodology: The Web of Science database was searched to collect all relevant literature on drug repositioning from 2001 to 2022. These data were imported into CiteSpace and online bibliometric analysis platforms for analysis. The processed data and visualized images were used to predict development trends in the research field. Results: The quality and quantity of articles published after 2011 have improved significantly, with 45 of them cited more than 100 times. Articles posted by journals from different countries have high citation values. Authors from other institutions have also collaborated to analyze drug rediscovery. Keywords found in the literature include molecular docking (N=223), virtual screening (N=170), drug discovery (N=126), machine learning (N=125), and drug-target interaction (N=68); these words represent the core content of drug repositioning. Conclusion: The key focus of drug research and development is related to the discovery of new indications for drugs. Researchers are starting to retarget drugs after analyzing online databases and clinical trials. More and more drugs are being redirected to other diseases to treat more patients while saving money and time. It is worth noting that researchers need more financial and technical support to complete drug development.
Affiliation(s)
- Xiaona Lang
  - Pharmacy Department, Tianjin Hospital, Tianjin, People’s Republic of China
- Jinlei Liu
  - Cardiology Department, Guang’anmen Hospital, Chinese Academy of Traditional Chinese Medicine, Beijing, People’s Republic of China
- Guangzhong Zhang
  - Dermatological Department, Beijing Hospital of Traditional Chinese Medicine, Capital Medical University, Beijing, People’s Republic of China
- Xin Feng
  - Pharmacy Department, Tianjin Hospital, Tianjin, People’s Republic of China
- Wenchao Dan
  - Dermatological Department, Beijing Hospital of Traditional Chinese Medicine, Capital Medical University, Beijing, People’s Republic of China
  - Correspondence: Wenchao Dan, Dermatological Department, Beijing Hospital of Traditional Chinese Medicine, Capital Medical University, Beijing, People’s Republic of China, Tel +86 13652001152, Email
4
Kim HA, Kim JE. Development of Nafamostat Mesylate Immediate-Release Tablet by Drug Repositioning Using Quality-by-Design Approach. Pharmaceutics 2022; 14:1219. [PMID: 35745792 PMCID: PMC9228348 DOI: 10.3390/pharmaceutics14061219] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/04/2022] [Revised: 05/27/2022] [Accepted: 06/06/2022] [Indexed: 12/01/2022] Open
Abstract
We aimed to develop nafamostat mesylate immediate-release tablets for the treatment of COVID-19 through drug repositioning studies of nafamostat mesylate injection. Nafamostat mesylate is a serine protease inhibitor known to inhibit the activity of transmembrane protease, serine 2, an enzyme that affects the penetration of the COVID-19 virus, thereby preventing binding between the angiotensin-converting enzyme 2 receptor and the spike protein of the COVID-19 virus in vivo. To develop a stable nafamostat mesylate immediate-release tablet, the formulation was selected through a stability study after manufacturing by a wet granulation process and a direct tableting process. Formulation issues for the selected processes were addressed using the design of experiments and quality-by-design approaches. The dissolution rate of the developed tablet was confirmed to be >90% within 30 min in the four major dissolution media, except in the pH 6.8 dissolution medium. Additionally, an in vivo pharmacokinetic study was performed in monkeys, and the pharmacokinetic profiles of nafamostat injections, oral solutions, and tablets were compared. The half-life during oral administration was confirmed to be significantly longer than the reported literature value of 8 min, and the bioavailability of the tablet was approximately 25% higher than that of the oral solution.
Affiliation(s)
- Joo-Eun Kim
  - Department of Pharmaceutical Engineering, Catholic University of Daegu, Hayang-Ro 13-13, Gyeongsan City 38430, Korea
5
Yu L, Su Y, Liu Y, Zeng X. Review of unsupervised pretraining strategies for molecules representation. Brief Funct Genomics 2021; 20:323-332. [PMID: 34342611 DOI: 10.1093/bfgp/elab036] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/15/2021] [Revised: 07/07/2021] [Accepted: 07/08/2021] [Indexed: 11/14/2022] Open
Abstract
In recent years, computer-assisted techniques have made great progress in the field of drug discovery. Yet the problem of limited labeled data remains challenging and restricts the performance of these techniques in specific tasks, such as molecular property prediction, compound-protein interaction prediction and de novo molecular generation. One effective solution is to use the experience and knowledge gained from other tasks to cope with related ones. Unsupervised pretraining is promising because it can leverage a vast number of unlabeled molecules and acquire a more informative molecular representation for downstream tasks. In particular, models trained on large-scale unlabeled molecules can capture generalizable features, and this ability can be used to improve performance on specific downstream tasks. Many relevant pretraining works have recently been proposed. Here, we provide an overview of molecular unsupervised pretraining and related applications in drug discovery. Challenges and possible solutions are also summarized.
6
An Alternative Pipeline for Glioblastoma Therapeutics: A Systematic Review of Drug Repurposing in Glioblastoma. Cancers (Basel) 2021; 13:1953. [PMID: 33919596 PMCID: PMC8073966 DOI: 10.3390/cancers13081953] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/08/2021] [Revised: 04/13/2021] [Accepted: 04/16/2021] [Indexed: 12/12/2022] Open
Abstract
Simple Summary: Glioblastoma is a devastating malignancy that has continued to prove resistant to a variety of therapeutics. No new systemic therapy has been approved for use against glioblastoma in almost two decades. This observation is particularly disturbing given the amount of money invested in identifying novel therapies for this disease. A relatively rapid and economical pipeline for identification of novel agents is drug repurposing. Here, a comprehensive review detailing the state of drug repurposing in glioblastoma is provided. We reveal details on studies that have examined agents in vitro, in animal models and in patients. While most agents have not progressed beyond the initial stages, several drugs, from a variety of classes, have demonstrated promising results in early phase clinical trials.
Abstract: The treatment of glioblastoma (GBM) remains a significant challenge, with outcome for most patients remaining poor. Although novel therapies have been developed, several obstacles restrict the incentive of drug developers to continue these efforts including the exorbitant cost, high failure rate and relatively small patient population. Repositioning drugs that have well-characterized mechanistic and safety profiles is an attractive alternative for drug development in GBM. In addition, the relative ease with which repurposed agents can be transitioned to the clinic further supports their potential for examination in patients. Here, a systematic analysis of the literature and clinical trials provides a comprehensive review of primary articles and unpublished trials that use repurposed drugs for the treatment of GBM. The findings demonstrate that numerous drug classes that have a range of initial indications have efficacy against preclinical GBM models and that certain agents have shown significant potential for clinical benefit. With examination in randomized, placebo-controlled trials and the targeting of particular GBM subgroups, it is possible that repurposing can be a cost-effective approach to identify agents for use in multimodal anti-GBM strategies.
7
Chen H, Zhang Z, Zhang J. In silico drug repositioning based on the integration of chemical, genomic and pharmacological spaces. BMC Bioinformatics 2021; 22:52. [PMID: 33557749 PMCID: PMC7868667 DOI: 10.1186/s12859-021-03988-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/10/2020] [Accepted: 01/25/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Drug repositioning refers to the identification of new indications for existing drugs. Drug-based inference methods for drug repositioning apply some unique features of drugs for new indication prediction. Complementary information is provided by these different features. It is therefore necessary to integrate these features for more accurate in silico drug repositioning. RESULTS: In this study, we collect 3 different types of drug features (i.e., chemical, genomic and pharmacological spaces) from public databases. Similarities between drugs are separately calculated based on each of the features. We further develop a fusion method to combine the 3 similarity measurements. We test the inference abilities of the 4 similarity datasets in drug repositioning under the guilt-by-association principle. Leave-one-out cross-validations show the integrated similarity measurement IntegratedSim achieves the best prediction performance, with the highest AUC value of 0.8451 and the highest AUPR value of 0.2201. Case studies demonstrate IntegratedSim produces the largest numbers of confirmed predictions in most cases. Moreover, we compare our integration method with 3 other similarity-fusion methods using the datasets in our study. Cross-validation results suggest our method improves the prediction accuracy in terms of AUC and AUPR values. CONCLUSIONS: Our study suggests that the 3 drug features used in our manuscript are valuable information for drug repositioning. The comparative results indicate that integration of the 3 drug features would improve drug-disease association prediction. Our study provides a strategy for the fusion of different drug features for in silico drug repositioning.
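The guilt-by-association idea described in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: the abstract does not specify the authors' fusion method, so `fuse` uses a plain element-wise mean as a stand-in, and the function names, toy similarity values and toy drug-disease matrix are all hypothetical.

```python
def fuse(similarities):
    """Element-wise mean of equally sized drug-drug similarity matrices.

    This simple average is an assumed stand-in, not the paper's fusion method.
    """
    n = len(similarities[0])
    return [
        [sum(s[i][j] for s in similarities) / len(similarities) for j in range(n)]
        for i in range(n)
    ]


def gba_scores(sim, known, drug):
    """Guilt-by-association: score each disease for `drug` by the highest
    similarity between `drug` and any other drug already linked to that disease."""
    n_diseases = len(known[0])
    scores = []
    for d in range(n_diseases):
        best = 0.0
        for other in range(len(sim)):
            if other != drug and known[other][d]:
                best = max(best, sim[drug][other])
        scores.append(best)
    return scores


# Toy data (hypothetical): 3 drugs, 2 similarity views, 2 diseases.
chemical = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.4], [0.2, 0.4, 1.0]]
pharmacological = [[1.0, 0.6, 0.4], [0.6, 1.0, 0.2], [0.4, 0.2, 1.0]]
known = [[0, 0], [1, 0], [0, 1]]  # rows: drugs, columns: diseases

fused = fuse([chemical, pharmacological])
scores = gba_scores(fused, known, drug=0)
print(scores)
```

Here drug 0 has no known indication; it inherits a higher score for the first disease because it is more similar to drug 1, which treats that disease, than to drug 2.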
Affiliation(s)
- Hailin Chen
  - School of Software, East China Jiaotong University, Nanchang, 330013 China
- Zuping Zhang
  - School of Computer Science and Engineering, Central South University, Changsha, 410083 China
- Jingpu Zhang
  - School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan, 467000 China
8
Veatch OJ, Butler MG, Elsea SH, Malow BA, Sutcliffe JS, Moore JH. An Automated Functional Annotation Pipeline That Rapidly Prioritizes Clinically Relevant Genes for Autism Spectrum Disorder. Int J Mol Sci 2020; 21:9029. [PMID: 33261099 PMCID: PMC7734579 DOI: 10.3390/ijms21239029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2020] [Revised: 11/24/2020] [Accepted: 11/25/2020] [Indexed: 11/16/2022] Open
Abstract
Human genetic studies have implicated more than a hundred genes in Autism Spectrum Disorder (ASD). Understanding how variation in implicated genes influences expression of co-occurring conditions and drug response can inform more effective, personalized approaches for treatment of individuals with ASD. Rapidly translating this information into the clinic requires efficient algorithms to sort through the myriad of genes implicated by rare gene-damaging single nucleotide and copy number variants, and common variation detected in genome-wide association studies (GWAS). To pinpoint genes that are more likely to have clinically relevant variants, we developed a functional annotation pipeline. We defined clinical relevance in this project as any ASD associated gene with evidence indicating a patient may have a complex, co-occurring condition that requires direct intervention (e.g., sleep and gastrointestinal disturbances, attention deficit hyperactivity, anxiety, seizures, depression), or is relevant to drug development and/or approaches to maximizing efficacy and minimizing adverse events (i.e., pharmacogenomics). Starting with a list of all candidate genes implicated in all manifestations of ASD (i.e., idiopathic and syndromic), this pipeline uses databases that represent multiple lines of evidence to identify genes: (1) expressed in the human brain, (2) involved in ASD-relevant biological processes and resulting in analogous phenotypes in mice, (3) whose products are targeted by approved pharmaceutical compounds or possessing pharmacogenetic variation and (4) whose products directly interact with those of genes with variants recommended to be tested for by the American College of Medical Genetics (ACMG). Compared with 1000 gene sets, each with a random selection of human protein coding genes, more genes in the ASD set were annotated for each category evaluated (p ≤ 1.99 × 10⁻²). Of the 956 ASD-implicated genes in the full set, 18 were flagged based on evidence in all categories. Fewer genes from randomly drawn sets were annotated in all categories (mean = 8.02, sd = 2.56, p = 7.75 × 10⁻⁴). Notably, none of the prioritized genes are represented among the 59 genes compiled by the ACMG, and 78% had a pathogenic or likely pathogenic variant in ClinVar. Results from this work should rapidly prioritize potentially actionable results from genetic studies and, in turn, inform future work toward clinical decision support for personalized care based on genetic testing.
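The comparison against randomly drawn gene sets can be illustrated with a small resampling sketch. This is not the authors' code, and the abstract does not say which test produced the reported p-values; the empirical p-value with a +1 correction, the function name `empirical_p`, and the toy gene universe below are assumptions chosen for illustration.

```python
import random


def empirical_p(observed, universe, annotated, set_size, n_sets=1000, seed=0):
    """Compare an observed annotation count with counts from random gene sets.

    observed: number of annotated genes in the set of interest
    universe: list of all candidate gene identifiers to sample from
    annotated: set of gene identifiers carrying the annotation
    Returns (empirical p-value with +1 correction, mean count in random sets).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sets):
        sample = rng.sample(universe, set_size)
        counts.append(sum(1 for gene in sample if gene in annotated))
    at_least = sum(1 for c in counts if c >= observed)
    return (at_least + 1) / (n_sets + 1), sum(counts) / n_sets


# Toy data (hypothetical): 1000 genes, 10 of which carry the annotation.
universe = [f"g{i}" for i in range(1000)]
annotated = set(universe[:10])
p_value, mean_count = empirical_p(18, universe, annotated, set_size=100)
print(p_value, mean_count)
```

Because only 10 genes are annotated in the toy universe, no random set can reach the observed count of 18, so the empirical p-value takes its smallest possible value for 1000 draws, 1/1001.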
Affiliation(s)
- Olivia J. Veatch
  - Department of Psychiatry and Behavioral Sciences, University of Kansas Medical Center, Kansas City, MO 66160, USA
  - Correspondence:
- Merlin G. Butler
  - Department of Psychiatry and Behavioral Sciences, University of Kansas Medical Center, Kansas City, MO 66160, USA
- Sarah H. Elsea
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Beth A. Malow
  - Sleep Disorders Division, Department of Neurology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- James S. Sutcliffe
  - Vanderbilt Genetics Institute, Department of Molecular Physiology & Biophysics, Department of Psychiatry and Behavioral Sciences, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Jason H. Moore
  - Department of Biostatistics, Epidemiology, & Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
9
Xie J, Ma A, Fennell A, Ma Q, Zhao J. It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data. Brief Bioinform 2020; 20:1449-1464. [PMID: 29490019 DOI: 10.1093/bib/bby014] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 11/15/2017] [Revised: 01/16/2018] [Indexed: 12/12/2022] Open
Abstract
Biclustering is a powerful data mining technique that allows clustering of rows and columns, simultaneously, in a matrix-format data set. It was first applied to gene expression data in 2000, aiming to identify co-expressed genes under a subset of all the conditions/samples. During the past 17 years, tens of biclustering algorithms and tools have been developed to enhance the ability to make sense out of large data sets generated in the wake of high-throughput omics technologies. These algorithms and tools have been applied to a wide variety of data types, including but not limited to, genomes, transcriptomes, exomes, epigenomes, phenomes and pharmacogenomes. However, there is still a considerable gap between biclustering methodology development and comprehensive data interpretation, mainly because of the lack of knowledge for the selection of appropriate biclustering tools and further supporting computational techniques in specific studies. Here, we first deliver a brief introduction to the existing biclustering algorithms and tools in public domain, and then systematically summarize the basic applications of biclustering for biological data and more advanced applications of biclustering for biomedical data. This review will assist researchers to effectively analyze their big data and generate valuable biological knowledge and novel insights with higher efficiency.
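To make the idea of a bicluster concrete, the sketch below scores a chosen subset of rows and columns with the mean squared residue, a coherence measure from the early gene-expression biclustering literature (Cheng and Church). The abstract does not name a specific score, so this choice, the function name, and the toy expression matrix are assumptions for illustration; a lower score means the selected rows and columns follow a more consistent additive pattern.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix defined by `rows` and `cols`.

    Each residue is a_ij - row_mean_i - col_mean_j + overall_mean, computed
    within the submatrix; a perfectly additive bicluster scores 0.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Toy expression matrix (hypothetical): genes in rows, conditions in columns.
# The first three columns follow a shared additive trend; the last one does not.
expr = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [5.0, 6.0, 7.0, 1.0],
]
coherent = mean_squared_residue(expr, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(expr, rows=[0, 1, 2], cols=[0, 1, 2, 3])
print(coherent, whole)
```

Dropping the incoherent fourth condition brings the score close to zero, which is the kind of row-and-column selection a biclustering algorithm searches for.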