1
Zhang H, Liang S, Xu T, Li W, Huang D, Dong Y, Li G, Miller JP, Goedegebuure SP, Sardiello M, Cooper J, Buchser W, Dickson P, Fields RC, Cruchaga C, Chen Y, Province M, Payne P, Li F. BioMedGraphica: An All-in-One Platform for Biomedical Prior Knowledge and Omic Signaling Graph Generation. bioRxiv 2024:2024.12.05.627020. [PMID: 39713411 PMCID: PMC11661111 DOI: 10.1101/2024.12.05.627020] [Indexed: 12/24/2024]
Abstract
Artificial intelligence (AI) is revolutionizing scientific discovery because of its powerful capability, following the neural scaling laws, to integrate and analyze large-scale datasets to mine knowledge. Foundation models, such as large language models (LLMs) and large vision models (LVMs), are among the most important foundations paving the way for general AI by pre-training on massive domain-specific datasets. Unlike the well-annotated, formatted and integrated large textual and image datasets used for LLMs and LVMs, biomedical knowledge and datasets in the field of AI for Precision Health and Medicine (AI4PHM) are fragmented, with data scattered across publications and inconsistent databases that often use diverse nomenclature systems. These discrepancies, spanning different levels of biomedical organization from genes to clinical traits, present major challenges for data integration and alignment. To facilitate foundation AI model development and applications in AI4PHM, we developed BioMedGraphica, an all-in-one platform and unified text-attributed knowledge graph (TAKG) consisting of 3,131,788 entities and 56,817,063 relations, spanning 11 distinct entity types and 29 harmonized relation (edge) types, built using data from 43 biomedical databases. All entities and relations are labeled with a unique ID and associated with textual descriptions (textual features). Because it covers most research entities in AI4PHM, BioMedGraphica supports zero-shot or few-shot knowledge discovery via new relation prediction on the graph. Via a graphical user interface (GUI), researchers can access the knowledge graph with prior knowledge of target functional annotations, drugs, phenotypes and diseases (drug-protein-disease-phenotype) in a graph-AI-ready format. It also supports the generation of knowledge-multi-omic signaling graphs to facilitate the development and application of novel AI models, such as LLMs and graph AI, for AI4PHM scientific discovery, such as discovering novel disease pathogenesis, signaling pathways, therapeutic targets, drugs and synergistic cocktails.
Affiliation(s)
- Heming Zhang
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Shunning Liang
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Tim Xu
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
  - Department of Computer Science and Engineering, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Wenyu Li
  - Department of Computer Science and Engineering, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Di Huang
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
  - Department of Computer Science and Engineering, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Yuhan Dong
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Guangfu Li
  - Department of Surgery, School of Medicine, University of Connecticut, CT, 06032, USA
- J. Philip Miller
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- S. Peter Goedegebuure
  - Department of Surgery, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
  - Siteman Cancer Center, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Marco Sardiello
  - Department of Pediatrics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Jonathan Cooper
  - Department of Pediatrics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- William Buchser
  - Department of Genetics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Patricia Dickson
  - Department of Pediatrics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Ryan C. Fields
  - Department of Surgery, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
  - Siteman Cancer Center, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Carlos Cruchaga
  - Department of Psychiatry, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
  - NeuroGenomics and Informatics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Yixin Chen
  - Department of Computer Science and Engineering, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Michael Province
  - Division of Statistical Genomics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Philip Payne
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Fuhai Li
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
  - Department of Pediatrics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
2
Ren Z, Ren Y, Li Z, Xu H. TCMM: A unified database for traditional Chinese medicine modernization and therapeutic innovations. Comput Struct Biotechnol J 2024; 23:1619-1630. [PMID: 38680873 PMCID: PMC11047297 DOI: 10.1016/j.csbj.2024.04.016] [Received: 01/21/2024] [Revised: 03/30/2024] [Accepted: 04/09/2024] [Indexed: 05/01/2024]
Abstract
Mining the potential of traditional Chinese medicine (TCM) in treating modern diseases requires a profound understanding of its mechanism of action and a comprehensive knowledge system that seamlessly bridges modern medical insights with traditional theories. However, existing databases for modernizing TCM suffer from varying degrees of information loss, which impedes the multidimensional dissection of pharmacological effects. To address this challenge, we introduce traditional Chinese medicine modernization (TCMM), currently the largest modernized TCM database, which integrates pioneering intelligent pipelines. By aligning high-quality TCM and modern medicine data, TCMM offers the most extensive TCM modernization knowledge, including 20 types of modernized TCM concepts (such as prescription, ingredient and target) and 46 biological relations among them, totaling 3,447,023 records. We demonstrate the efficacy and reliability of TCMM with two features, prescription generation and knowledge discovery; the outcomes are consistent with biological experimental results. A publicly available web interface is at https://www.tcmm.net.cn/.
Affiliation(s)
- Zhixiang Ren
  - Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China
- Yiming Ren
  - Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China
- Zeting Li
  - Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China
- Huan Xu
  - School of Public Health, Anhui University of Science and Technology, Hefei, 231131, Anhui Province, China
4
Luo D, Tong Z, Wen L, Bai M, Jin X, Liu Z, Li Y, Xue W. DTNPD: A comprehensive database of drugs and targets for neurological and psychiatric disorders. Comput Biol Med 2024; 175:108536. [PMID: 38701592 DOI: 10.1016/j.compbiomed.2024.108536] [Received: 01/09/2024] [Revised: 04/15/2024] [Accepted: 04/28/2024] [Indexed: 05/05/2024]
Abstract
In response to the shortcomings in data quality and coverage for neurological and psychiatric disorders (NPDs) in existing comprehensive databases, this paper introduces the DTNPD database, specifically designed for NPDs. DTNPD contains detailed information on 30 NPD types, 1847 drugs, 514 drug targets, 64 drug combinations, and 61 potential target combinations, forming a network with 2389 drug-target associations. The database is user-friendly, offering open access and downloadable data, which is crucial for network pharmacology studies. The key strength of DTNPD lies in its robust networks of drug and target combinations, as well as drug-target networks, facilitating research and development in the field of NPDs. The development of the DTNPD database marks a significant milestone in understanding and treating NPDs. The DTNPD database is available at http://dtnpd.cnsdrug.com, with a mirror site at http://dtnpd.lyhbio.com.
Affiliation(s)
- Ding Luo
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Zhuohao Tong
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Lu Wen
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Mingze Bai
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Xiaojie Jin
  - College of Pharmacy, Gansu University of Chinese Medicine, Lanzhou, 730000, China
- Zerong Liu
  - Central Nervous System Drug Key Laboratory of Sichuan Province, Sichuan Credit Pharmaceutical Co., Ltd, Sichuan, 646100, China
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing, 400030, China
- Yinghong Li
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China