1
Purpura A, Mulligan N, Kartoun U, Koski E, Anand V, Bettencourt-Silva J. Investigating Cross-Domain Binary Relation Classification in Biomedical Natural Language Processing. AMIA Joint Summits on Translational Science Proceedings 2024; 2024:384-390. [PMID: 38827064 PMCID: PMC11141837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
This paper addresses the challenge of binary relation classification in biomedical Natural Language Processing (NLP), focusing on diverse domains including gene-disease associations, compound-protein interactions, and social determinants of health (SDOH). We evaluate different approaches, including fine-tuning Bidirectional Encoder Representations from Transformers (BERT) models and generative Large Language Models (LLMs), and examine their performance in zero- and few-shot settings. We also introduce a novel dataset of biomedical text annotated with social and clinical entities to facilitate research into relation classification. Our results underscore the continued complexity of this task for both humans and models. BERT-based models trained on domain-specific data excelled in certain domains and, in others, achieved performance and generalization comparable to those of generative LLMs. Despite these encouraging results, these models are still far from achieving human-level performance. We also highlight the importance of high-quality training data and domain-specific fine-tuning for the performance of all the considered models.
2
Liu H, Song C, Wang J, Chen Z, Zhang X, Zhou H, Yao L, Chen D, Gu W, Huang RK, Huang BK, Han BW, Du J. Development of fecal microbial diagnostic marker sets of colorectal cancer using natural language processing method. Int J Biol Markers 2024; 39:31-39. [PMID: 38128926 DOI: 10.1177/03936155231210881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
BACKGROUND: Cancer screening and early detection greatly increase the chances of successful treatment. However, most cancer types lack effective early screening biomarkers. In recent years, natural language processing (NLP)-based text-mining methods have proven effective in searching the scientific literature and identifying promising associations between potential biomarkers and disease, but unfortunately few are widely used.
METHODS: In this study, we used an NLP-enabled text-mining system, MarkerGenie, to identify potential stool bacterial markers for early detection and screening of colorectal cancer. After filtering markers based on text-mining results, we validated bacterial markers using multiplex digital droplet polymerase chain reaction (ddPCR). Classifiers were built based on ddPCR results, and sensitivity, specificity, and area under the curve (AUC) were used to evaluate the performance.
RESULTS: A total of 7 of the 14 bacterial markers showed significantly increased abundance in the stools of colorectal cancer patients. A five-bacteria classifier for colorectal cancer diagnosis was built, and achieved an AUC of 0.852, with a sensitivity of 0.692 and specificity of 0.935. When combined with the fecal immunochemical test (FIT), our classifier achieved an AUC of 0.959 and increased the sensitivity of FIT (0.929 vs. 0.872) at a specificity of 0.900.
CONCLUSIONS: Our study provides a valuable case example of the use of NLP-based marker mining for biomarker identification.
Affiliation(s)
- Houcong Liu
  - Research Center for Clinical and Translational Medicine, Huazhong University of Science and Technology Union Shenzhen Hospital, and the 6th Affiliated Hospital of Shenzhen University Medical School, Shenzhen, Guangdong, China
- Changpu Song
  - Guangdong Jiyin Biotech Co. Ltd, Shenzhen, Guangdong, China
- Jidong Wang
  - Research Center for Clinical and Translational Medicine, Huazhong University of Science and Technology Union Shenzhen Hospital, and the 6th Affiliated Hospital of Shenzhen University Medical School, Shenzhen, Guangdong, China
- Zhufang Chen
  - Research Center for Clinical and Translational Medicine, Huazhong University of Science and Technology Union Shenzhen Hospital, and the 6th Affiliated Hospital of Shenzhen University Medical School, Shenzhen, Guangdong, China
- Xiaohong Zhang
  - Research Center for Clinical and Translational Medicine, Huazhong University of Science and Technology Union Shenzhen Hospital, and the 6th Affiliated Hospital of Shenzhen University Medical School, Shenzhen, Guangdong, China
- Hekai Zhou
  - Research Center for Clinical and Translational Medicine, Huazhong University of Science and Technology Union Shenzhen Hospital, and the 6th Affiliated Hospital of Shenzhen University Medical School, Shenzhen, Guangdong, China
- Linhong Yao
  - Research Center for Clinical and Translational Medicine, Huazhong University of Science and Technology Union Shenzhen Hospital, and the 6th Affiliated Hospital of Shenzhen University Medical School, Shenzhen, Guangdong, China
- Dan Chen
  - Guangdong Jiyin Biotech Co. Ltd, Shenzhen, Guangdong, China
- Wenhao Gu
  - Guangdong Jiyin Biotech Co. Ltd, Shenzhen, Guangdong, China
- Rui-Kun Huang
  - Guangdong Jiyin Biotech Co. Ltd, Shenzhen, Guangdong, China
- Bing-Kun Huang
  - Guangdong Jiyin Biotech Co. Ltd, Shenzhen, Guangdong, China
- Bo-Wei Han
  - Guangdong Jiyin Biotech Co. Ltd, Shenzhen, Guangdong, China
- Jihui Du
  - Research Center for Clinical and Translational Medicine, Huazhong University of Science and Technology Union Shenzhen Hospital, and the 6th Affiliated Hospital of Shenzhen University Medical School, Shenzhen, Guangdong, China