1
|
Zhang C, Zhang Z, Zhang F, Zeng B, Liu X, Wang L. A computational model for potential microbe-disease association detection based on improved graph convolutional networks and multi-channel autoencoders. Front Microbiol 2024; 15:1435408. [PMID: 39144226 PMCID: PMC11322764 DOI: 10.3389/fmicb.2024.1435408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2024] [Accepted: 07/05/2024] [Indexed: 08/16/2024] Open
Abstract
Introduction Accumulating evidence shows that human health and disease are closely related to the microbes in the human body. Methods In this manuscript, a new computational model based on graph attention networks and sparse autoencoders, called GCANCAE, was proposed for inferring possible microbe-disease associations. In GCANCAE, we first constructed a heterogeneous network by combining known microbe-disease relationships, disease similarity, and microbial similarity. Then, we adopted the improved GCN and the CSAE to extract neighbor relations in the adjacency matrix and novel feature representations in heterogeneous networks. After that, in order to estimate the likelihood of a potential microbe associated with a disease, we integrated these two types of representations to create unique eigenmatrices for diseases and microbes, respectively, and obtained predicted scores for potential microbe-disease associations by calculating the inner product of these two types of eigenmatrices. Results and discussion Based on the baseline databases such as the HMDAD and the Disbiome, intensive experiments were conducted to evaluate the prediction ability of GCANCAE, and the experimental results demonstrated that GCANCAE achieved better performance than state-of-the-art competitive methods under the frameworks of both 2-fold and 5-fold CV. Furthermore, case studies of three categories of common diseases, such as asthma, irritable bowel syndrome (IBS), and type 2 diabetes (T2D), confirmed the efficiency of GCANCAE.
Collapse
Affiliation(s)
| | - Zhen Zhang
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
| | | | | | - Xin Liu
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
| | - Lei Wang
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
| |
Collapse
|
2
|
Das Baksi K, Pokhrel V, Pudavar AE, Mande SS, Kuntal BK. BactInt: A domain driven transfer learning approach for extracting inter-bacterial associations from biomedical text. Comput Biol Chem 2024; 109:108012. [PMID: 38198963 DOI: 10.1016/j.compbiolchem.2023.108012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2023] [Revised: 12/15/2023] [Accepted: 12/30/2023] [Indexed: 01/12/2024]
Abstract
BACKGROUND The healthy as well as dysbiotic state of an ecosystem like human body is known to be influenced not only by the presence of the bacterial groups in it, but also with respect to the associations within themselves. Evidence reported in biomedical text serves as a reliable source for identifying and ascertaining such inter bacterial associations. However, the complexity of the reported text as well as the ever-increasing volume of information necessitates development of methods for automated and accurate extraction of such knowledge. METHODS A BioBERT (biomedical domain specific language model) based information extraction model for bacterial associations is presented that utilizes learning patterns from other publicly available datasets. Additionally, a specialized sentence corpus has been developed to significantly improve the prediction accuracy of the 'transfer learned' model using a fine-tuning approach. RESULTS The final model was seen to outperform all other variations (non-transfer learned and non-fine-tuned models) as well as models trained on BioGPT (a domain trained Generative Pre-trained Transformer). To further demonstrate the utility, a case study was performed using bacterial association network data obtained from experimental studies. CONCLUSION This study attempts to demonstrate the applicability of transfer learning in a niche field of life sciences where understanding of inter bacterial relationships is crucial to obtain meaningful insights in comprehending microbial community structures across different ecosystems. The study further discusses how such a model can be further improved by fine tuning using limited training data. The results presented and the datasets made available are expected to be a valuable addition in the field of medical informatics and bioinformatics.
Collapse
Affiliation(s)
| | - Vatsala Pokhrel
- TCS Research, Tata Consultancy Services Ltd, Pune 411057, India
| | | | | | - Bhusan K Kuntal
- TCS Research, Tata Consultancy Services Ltd, Pune 411057, India.
| |
Collapse
|
3
|
Karkera N, Acharya S, Palaniappan SK. Leveraging pre-trained language models for mining microbiome-disease relationships. BMC Bioinformatics 2023; 24:290. [PMID: 37468830 PMCID: PMC10357883 DOI: 10.1186/s12859-023-05411-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2023] [Accepted: 07/13/2023] [Indexed: 07/21/2023] Open
Abstract
BACKGROUND The growing recognition of the microbiome's impact on human health and well-being has prompted extensive research into discovering the links between microbiome dysbiosis and disease (healthy) states. However, this valuable information is scattered in unstructured form within biomedical literature. The structured extraction and qualification of microbe-disease interactions are important. In parallel, recent advancements in deep-learning-based natural language processing algorithms have revolutionized language-related tasks such as ours. This study aims to leverage state-of-the-art deep-learning language models to extract microbe-disease relationships from biomedical literature. RESULTS In this study, we first evaluate multiple pre-trained large language models within a zero-shot or few-shot learning context. In this setting, the models performed poorly out of the box, emphasizing the need for domain-specific fine-tuning of these language models. Subsequently, we fine-tune multiple language models (specifically, GPT-3, BioGPT, BioMedLM, BERT, BioMegatron, PubMedBERT, BioClinicalBERT, and BioLinkBERT) using labeled training data and evaluate their performance. Our experimental results demonstrate the state-of-the-art performance of these fine-tuned models ( specifically GPT-3, BioMedLM, and BioLinkBERT), achieving an average F1 score, precision, and recall of over [Formula: see text] compared to the previous best of 0.74. CONCLUSION Overall, this study establishes that pre-trained language models excel as transfer learners when fine-tuned with domain and problem-specific data, enabling them to achieve state-of-the-art results even with limited training data for extracting microbiome-disease interactions from scientific publications.
Collapse
Affiliation(s)
| | - Sathwik Acharya
- The Systems Biology Institute, Tokyo, Japan
- PES University, Bengaluru, India
| | - Sucheendra K Palaniappan
- The Systems Biology Institute, Tokyo, Japan.
- Iom Bioworks Pvt Ltd., Bengaluru, India.
- SBX Corporation, Tokyo, Japan.
| |
Collapse
|
4
|
Cai L, Li J, Lv H, Liu W, Niu H, Wang Z. Integrating domain knowledge for biomedical text analysis into deep learning: A survey. J Biomed Inform 2023; 143:104418. [PMID: 37290540 DOI: 10.1016/j.jbi.2023.104418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2022] [Revised: 04/24/2023] [Accepted: 05/31/2023] [Indexed: 06/10/2023]
Abstract
The past decade has witnessed an explosion of textual information in the biomedical field. Biomedical texts provide a basis for healthcare delivery, knowledge discovery, and decision-making. Over the same period, deep learning has achieved remarkable performance in biomedical natural language processing, however, its development has been limited by well-annotated datasets and interpretability. To solve this, researchers have considered combining domain knowledge (such as biomedical knowledge graph) with biomedical data, which has become a promising means of introducing more information into biomedical datasets and following evidence-based medicine. This paper comprehensively reviews more than 150 recent literature studies on incorporating domain knowledge into deep learning models to facilitate typical biomedical text analysis tasks, including information extraction, text classification, and text generation. We eventually discuss various challenges and future directions.
Collapse
Affiliation(s)
- Linkun Cai
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
| | - Jia Li
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
| | - Han Lv
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
| | - Wenjuan Liu
- Aerospace Center Hospital, 100049 Beijing, China
| | - Haijun Niu
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
| | - Zhenchang Wang
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China; Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China.
| |
Collapse
|
5
|
Cheng P, Xiao W, Ning P, Li L, Rao Z, Yang L, Schwebel DC, Yang Y, Huang Y, Hu G. ARTCDP: An automated data platform for monitoring emerging patterns concerning road traffic crashes in China. ACCIDENT; ANALYSIS AND PREVENTION 2022; 174:106727. [PMID: 35667199 DOI: 10.1016/j.aap.2022.106727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/26/2022] [Revised: 04/02/2022] [Accepted: 05/30/2022] [Indexed: 06/15/2023]
Abstract
Online media reports provide valuable information for road traffic injury prevention, but technical challenges concerning data acquisition and processing limit analysis and interpretation of such data. Integrating injury epidemiology theory and big data technology, we developed a data platform consisting of four layers (data acquisition, data processing, application and data storage) to automatically collect reports from online Chinese media concerning road traffic crashes every 24 h. We built a text classification model using 20,000 manually annotated news stories based on the Bidirectional Encoder Representations from Transformers (BERT) and then used natural language processing algorithms to extract data concerning 27 structured variables from the news sources. The accuracy of the BERT-based text classification model was 0.9271, with information extraction accuracy exceeding 80% for 22 variables. As of November 30, 2021, the data platform collected 244,650 eligible media reports covering all 333 prefecture-level divisions in China. These reports were from 37,073 websites or social media accounts, which were geographically located in all 31 provinces and over 98% of prefecture-level divisions. Data availability varied greatly from 0.9% to 100% across the 27 structured variables. Additionally, the platform identified 645,787 potentially relevant keywords when applying natural language processing techniques to the textual media reports. Platform data were highly correlated with road police data in province-based road traffic crash statistics (crashes, rs = 0.799; non-fatal injuries, rs = 0.802; deaths, rs = 0.775). In particular, the platform offers valuable data (like crashes involving electric vehicles) that are not included in official road traffic crash statistics. The new automated data platform shows great potential for timely detection of emerging characteristics of road traffic crashes. Further research is needed to improve the platform and apply it to real-time monitoring and analysis of road traffic injuries.
Collapse
Affiliation(s)
- Peixia Cheng
- Department of Epidemiology and Health Statistics, Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha, China.
| | - Wangxin Xiao
- Department of Epidemiology and Health Statistics, Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha, China.
| | - Peishan Ning
- Department of Epidemiology and Health Statistics, Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha, China.
| | - Li Li
- Department of Epidemiology and Health Statistics, Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha, China.
| | - Zhenzhen Rao
- Department of Epidemiology and Health Statistics, Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha, China.
| | - Lei Yang
- Department of Epidemiology and Health Statistics, Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha, China.
| | - David C Schwebel
- Department of Psychology, University of Alabama at Birmingham, Birmingham, AL, United States.
| | - Yang Yang
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL, United States.
| | - Yun Huang
- Department of Occupational and Environmental Health, Xiangya School of Public Health, Central South University, Changsha, China.
| | - Guoqing Hu
- Department of Epidemiology and Health Statistics, Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha, China; National Clinical Research Center for Geriatric Disorders (Xiangya Hospital), Central South University, Changsha, China.
| |
Collapse
|
6
|
Ahmed SAJA, Bapatdhar N, Kumar BP, Ghosh S, Yachie A, Palaniappan SK. Large scale text mining for deriving useful insights: A case study focused on microbiome. Front Physiol 2022; 13:933069. [PMID: 36117696 PMCID: PMC9473635 DOI: 10.3389/fphys.2022.933069] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2022] [Accepted: 07/18/2022] [Indexed: 11/23/2022] Open
Abstract
Text mining has been shown to be an auxiliary but key driver for modeling, data harmonization, and interpretation in bio-medicine. Scientific literature holds a wealth of information and embodies cumulative knowledge and remains the core basis on which mechanistic pathways, molecular databases, and models are built and refined. Text mining provides the necessary tools to automatically harness the potential of text. In this study, we show the potential of large-scale text mining for deriving novel insights, with a focus on the growing field of microbiome. We first collected the complete set of abstracts relevant to the microbiome from PubMed and used our text mining and intelligence platform Taxila for analysis. We drive the usefulness of text mining using two case studies. First, we analyze the geographical distribution of research and study locations for the field of microbiome by extracting geo mentions from text. Using this analysis, we were able to draw useful insights on the state of research in microbiome w. r.t geographical distributions and economic drivers. Next, to understand the relationships between diseases, microbiome, and food which are central to the field, we construct semantic relationship networks between these different concepts central to the field of microbiome. We show how such networks can be useful to derive useful insight with no prior knowledge encoded.
Collapse
Affiliation(s)
| | | | | | - Samik Ghosh
- SBX Corporation Inc., Tokyo, Japan
- The NLP Group, The Systems Biology Institute, Tokyo, Japan
| | - Ayako Yachie
- SBX Corporation Inc., Tokyo, Japan
- The NLP Group, The Systems Biology Institute, Tokyo, Japan
| | - Sucheendra K. Palaniappan
- SBX Corporation Inc., Tokyo, Japan
- The NLP Group, The Systems Biology Institute, Tokyo, Japan
- *Correspondence: Sucheendra K. Palaniappan,
| |
Collapse
|
7
|
Xie D, He S, Han L, Wu L, Huang H, Tao H, Zhou P, Shi X, Bai H, Bo X. Systematic optimization of host-directed therapeutic targets and preclinical validation of repositioned antiviral drugs. Brief Bioinform 2022; 23:bbac047. [PMID: 35238349 PMCID: PMC9116211 DOI: 10.1093/bib/bbac047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2021] [Revised: 01/26/2022] [Accepted: 01/28/2022] [Indexed: 11/12/2022] Open
Abstract
Inhibition of host protein functions using established drugs produces a promising antiviral effect with excellent safety profiles, decreased incidence of resistant variants and favorable balance of costs and risks. Genomic methods have produced a large number of robust host factors, providing candidates for identification of antiviral drug targets. However, there is a lack of global perspectives and systematic prioritization of known virus-targeted host proteins (VTHPs) and drug targets. There is also a need for host-directed repositioned antivirals. Here, we integrated 6140 VTHPs and grouped viral infection modes from a new perspective of enriched pathways of VTHPs. Clarifying the superiority of nonessential membrane and hub VTHPs as potential ideal targets for repositioned antivirals, we proposed 543 candidate VTHPs. We then presented a large-scale drug-virus network (DVN) based on matching these VTHPs and drug targets. We predicted possible indications for 703 approved drugs against 35 viruses and explored their potential as broad-spectrum antivirals. In vitro and in vivo tests validated the efficacy of bosutinib, maraviroc and dextromethorphan against human herpesvirus 1 (HHV-1), hepatitis B virus (HBV) and influenza A virus (IAV). Their drug synergy with clinically used antivirals was evaluated and confirmed. The results proved that low-dose dextromethorphan is better than high-dose in both single and combined treatments. This study provides a comprehensive landscape and optimization strategy for druggable VTHPs, constructing an innovative and potent pipeline to discover novel antiviral host proteins and repositioned drugs, which may facilitate their delivery to clinical application in translational medicine to combat fatal and spreading viral infections.
Collapse
Affiliation(s)
- Dafei Xie
- Beijing Institute of Radiation Medicine, Beijing, China, 100850
| | - Song He
- Beijing Institute of Radiation Medicine, Beijing, China, 100850
| | - Lu Han
- Beijing Institute of Pharmacology and Toxicology, Beijing, China, 100850
| | - Lianlian Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China, 300072
| | - Hai Huang
- Department of Biological Medicines, School of Pharmacy, Fudan University, Shanghai, China, 201203
| | - Huan Tao
- Beijing Institute of Radiation Medicine, Beijing, China, 100850
| | - Pingkun Zhou
- Beijing Institute of Radiation Medicine, Beijing, China, 100850
| | - Xunlong Shi
- Department of Biological Medicines, School of Pharmacy, Fudan University, Shanghai, China, 201203
| | - Hui Bai
- BioMap (Beijing) Intelligence Technology Limited, Beijing, China, 100005
| | - Xiaochen Bo
- Beijing Institute of Radiation Medicine, Beijing, China, 100850
| |
Collapse
|
8
|
Wang L, Tan Y, Yang X, Kuang L, Ping P. Review on predicting pairwise relationships between human microbes, drugs and diseases: from biological data to computational models. Brief Bioinform 2022; 23:6553604. [PMID: 35325024 DOI: 10.1093/bib/bbac080] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2021] [Revised: 02/14/2022] [Accepted: 02/15/2022] [Indexed: 12/11/2022] Open
Abstract
In recent years, with the rapid development of techniques in bioinformatics and life science, a considerable quantity of biomedical data has been accumulated, based on which researchers have developed various computational approaches to discover potential associations between human microbes, drugs and diseases. This paper provides a comprehensive overview of recent advances in prediction of potential correlations between microbes, drugs and diseases from biological data to computational models. Firstly, we introduced the widely used datasets relevant to the identification of potential relationships between microbes, drugs and diseases in detail. And then, we divided a series of a lot of representative computing models into five major categories including network, matrix factorization, matrix completion, regularization and artificial neural network for in-depth discussion and comparison. Finally, we analysed possible challenges and opportunities in this research area, and at the same time we outlined some suggestions for further improvement of predictive performances as well.
Collapse
Affiliation(s)
- Lei Wang
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China.,Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
| | - Yaqin Tan
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China.,Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
| | - Xiaoyu Yang
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China.,Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
| | - Linai Kuang
- Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
| | - Pengyao Ping
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
| |
Collapse
|
9
|
Chai Z, Jin H, Shi S, Zhan S, Zhuo L, Yang Y. Hierarchical shared transfer learning for biomedical named entity recognition. BMC Bioinformatics 2022; 23:8. [PMID: 34983362 PMCID: PMC8729142 DOI: 10.1186/s12859-021-04551-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2021] [Accepted: 12/22/2021] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND Biomedical named entity recognition (BioNER) is a basic and important medical information extraction task to extract medical entities with special meaning from medical texts. In recent years, deep learning has become the main research direction of BioNER due to its excellent data-driven context coding ability. However, in BioNER task, deep learning has the problem of poor generalization and instability. RESULTS we propose the hierarchical shared transfer learning, which combines multi-task learning and fine-tuning, and realizes the multi-level information fusion between the underlying entity features and the upper data features. We select 14 datasets containing 4 types of entities for training and evaluate the model. The experimental results showed that the F1-scores of the five gold standard datasets BC5CDR-chemical, BC5CDR-disease, BC2GM, BC4CHEMD, NCBI-disease and LINNAEUS were increased by 0.57, 0.90, 0.42, 0.77, 0.98 and - 2.16 compared to the single-task XLNet-CRF model. BC5CDR-chemical, BC5CDR-disease and BC4CHEMD achieved state-of-the-art results.The reasons why LINNAEUS's multi-task results are lower than single-task results are discussed at the dataset level. CONCLUSION Compared with using multi-task learning and fine-tuning alone, the model has more accurate recognition ability of medical entities, and has higher generalization and stability.
Collapse
Affiliation(s)
- Zhaoying Chai
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Han Jin
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Shenghui Shi
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China.
| | - Siyan Zhan
- School of Public Health, Peking University, Beijing, China.
| | - Lin Zhuo
- Research Center of Clinical Epidemiology, Peking University Third Hospital, Beijing, China
| | - Yu Yang
- National Institute of Health Data Science, Peking University, Beijing, China
| |
Collapse
|