1
Chen P, Wang J, Luo L, Lin H, Yang Z. Learning to explain is a good biomedical few-shot learner. Bioinformatics 2024; 40:btae589. [PMID: 39360976 PMCID: PMC11483110 DOI: 10.1093/bioinformatics/btae589] [Received: 05/27/2024] [Revised: 09/24/2024] [Accepted: 10/01/2024] [Indexed: 10/05/2024]
Abstract
MOTIVATION Significant progress has been achieved in biomedical text mining using deep learning methods, which rely heavily on large amounts of high-quality data annotated by human experts. In practice, however, obtaining high-quality annotated data is extremely challenging because of data scarcity (e.g. rare or new diseases), data privacy and security concerns, and the high cost of annotation. Additionally, nearly all research focuses on predicting labels without providing corresponding explanations. In this paper, we therefore investigate a more realistic scenario, biomedical few-shot learning, and explore the impact of interpretability on it. RESULTS We present LetEx (Learning to Explain), a novel multi-task generative approach that leverages reasoning explanations from large language models (LLMs) to enhance the inductive reasoning ability of few-shot learning. Our approach includes (1) collecting high-quality explanations through a complete LLM-based workflow that uses chain-of-thought (CoT) prompting and self-training strategies, and (2) converting various biomedical NLP tasks into a unified text-to-text generation task, where the collected explanations serve as additional supervision between text-label pairs through multi-task training. Experiments are conducted in three few-shot settings across six biomedical benchmark datasets. The results show that learning to explain improves the performance of diverse biomedical NLP tasks in low-resource scenarios, significantly outperforming strong baseline models by up to 6.41%. Notably, the proposed method enables the 220M LetEx model to show reasoning explanation ability superior to that of LLMs. AVAILABILITY AND IMPLEMENTATION Our source code and data are available at https://github.com/cpmss521/LetEx.
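As a rough illustration of step (2), the sketch below turns one annotated example into two text-to-text training pairs, one for the label and one for the explanation. The `[label]` and `[explain]` prefixes, the `Example` class, and the sample sentence are hypothetical and are not taken from the LetEx code; the abstract only states that explanations act as additional supervision in multi-task training.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    text: str
    label: str
    explanation: str  # assumed to come from an LLM, e.g. via CoT prompting


def to_text_to_text(ex: Example) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for the two tasks.

    The task prefixes are hypothetical; any scheme that lets one
    generative model tell the two tasks apart would serve.
    """
    return [
        (f"[label] {ex.text}", ex.label),
        (f"[explain] {ex.text}", ex.explanation),
    ]


pairs = to_text_to_text(
    Example(
        text="Aspirin may increase the anticoagulant effect of warfarin.",
        label="effect",
        explanation="The sentence says one drug changes the effect of the other.",
    )
)
for source, target in pairs:
    print(source, "->", target)
```

Both pairs share the same input text, so a single sequence-to-sequence model sees the explanation as an extra training signal for the same example.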
Affiliation(s)
- Peng Chen
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Jian Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Ling Luo
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Hongfei Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zhihao Yang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
2
Zhou H, Li M, Xiao Y, Yang H, Zhang R. LEAP: LLM instruction-example adaptive prompting framework for biomedical relation extraction. J Am Med Inform Assoc 2024; 31:2010-2018. [PMID: 38904416 PMCID: PMC11339510 DOI: 10.1093/jamia/ocae147] [Received: 12/15/2023] [Revised: 05/26/2024] [Accepted: 06/03/2024] [Indexed: 06/22/2024]
Abstract
OBJECTIVE To investigate demonstrations in large language models (LLMs) for biomedical relation extraction. This study introduces a framework comprising three types of adaptive tuning methods to assess their impacts and effectiveness. MATERIALS AND METHODS Our study was conducted in two phases. Initially, we analyzed a range of demonstration components vital for LLMs' biomedical data capabilities, including task descriptions and examples, experimenting with various combinations. Subsequently, we introduced the LLM instruction-example adaptive prompting (LEAP) framework, which includes instruction adaptive tuning, example adaptive tuning, and instruction-example adaptive tuning methods. This framework aims to systematically investigate both adaptive task descriptions and adaptive examples within the demonstration. We assessed the performance of the LEAP framework on the DDI, ChemProt, and BioRED datasets, employing LLMs such as Llama2-7b, Llama2-13b, and MedLLaMA_13B. RESULTS Our findings indicated that Instruction + Options + Example and its expanded form substantially improved F1 scores over the standard Instruction + Options mode for zero-shot LLMs. The LEAP framework, particularly through its example adaptive prompting, demonstrated superior performance over conventional instruction tuning across all models. Notably, the MedLLaMA_13B model achieved an exceptional F1 score of 95.13 on the ChemProt dataset using this method. Significant improvements were also observed on the DDI 2013 and BioRED datasets, confirming the method's robustness in sophisticated data extraction scenarios. CONCLUSION The LEAP framework offers a compelling strategy for enhancing LLM training strategies, steering away from extensive fine-tuning towards more dynamic and contextually enriched prompting methodologies, as showcased in biomedical relation extraction.
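To make the two demonstration modes concrete, the following sketch assembles an Instruction + Options prompt and, when examples are supplied, an Instruction + Options + Example prompt. The field labels, the `build_prompt` helper, and the sample sentences are hypothetical; the abstract does not give the actual LEAP prompt template.

```python
from typing import List, Sequence, Tuple


def build_prompt(
    instruction: str,
    options: Sequence[str],
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a demonstration prompt from its components.

    With an empty `examples` list this yields the Instruction + Options
    mode; with examples it yields Instruction + Options + Example.
    The layout is an assumption made for illustration only.
    """
    parts: List[str] = [
        f"Instruction: {instruction}",
        "Options: " + ", ".join(options),
    ]
    for sentence, relation in examples:
        parts.append(f"Example: {sentence}\nAnswer: {relation}")
    parts.append(f"Input: {query}\nAnswer:")
    return "\n\n".join(parts)


prompt = build_prompt(
    instruction="Classify the relation between the two marked entities.",
    options=["inhibitor", "substrate", "none"],
    examples=[("Drug A blocks enzyme B.", "inhibitor")],
    query="Drug C is metabolized by enzyme D.",
)
print(prompt)
```

Keeping the components as separate arguments makes it easy to compare combinations, which is the kind of experiment the first phase of the study describes.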
Affiliation(s)
- Huixue Zhou
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Mingchen Li
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Yongkang Xiao
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Han Yang
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Rui Zhang
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
3
Li M, Zhou H, Yang H, Zhang R. RT: a Retrieving and Chain-of-Thought framework for few-shot medical named entity recognition. J Am Med Inform Assoc 2024; 31:1929-1938. [PMID: 38708849 PMCID: PMC11339512 DOI: 10.1093/jamia/ocae095] [Received: 12/18/2023] [Revised: 04/10/2024] [Accepted: 04/15/2024] [Indexed: 05/07/2024]
Abstract
OBJECTIVES This article aims to enhance the performance of large language models (LLMs) on the few-shot biomedical named entity recognition (NER) task by developing a simple and effective method called the Retrieving and Chain-of-Thought (RT) framework, and to evaluate the improvement after applying the RT framework. MATERIALS AND METHODS Given the remarkable advancements in retrieval-based language models and Chain-of-Thought across various natural language processing tasks, we propose a pioneering RT framework designed to combine both approaches. The RT approach encompasses dedicated modules for information retrieval and Chain-of-Thought processes. In the retrieval module, RT selects pertinent examples from demonstrations during instruction tuning for each input sentence. Subsequently, the Chain-of-Thought module employs a systematic reasoning process to identify entities. We conducted a comprehensive comparative analysis of our RT framework against 16 other models for few-shot NER tasks on the BC5CDR and NCBI corpora. Additionally, we explored the impacts of negative samples, output formats, and missing data on performance. RESULTS Our proposed RT framework outperforms other LMs for few-shot NER tasks, with micro-F1 scores of 93.50 and 91.76 on the BC5CDR and NCBI corpora, respectively. We found that using both positive and negative samples, together with Chain-of-Thought (vs Tree-of-Thought), performed better. Additionally, using a partially annotated dataset has a marginal effect on model performance. DISCUSSION This is the first investigation to combine a retrieval-based LLM and Chain-of-Thought methodology to enhance performance in biomedical few-shot NER. The retrieval-based LLM aids in retrieving the examples most relevant to the input sentence, offering crucial knowledge for predicting the entities in the sentence. We also conducted a meticulous examination of our methodology, incorporating an ablation study. CONCLUSION The RT framework with LLM has demonstrated state-of-the-art performance on few-shot NER tasks.
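The retrieval step, picking the demonstrations most relevant to each input sentence, can be sketched as follows. This is only an illustration: it swaps in a bag-of-words cosine similarity, whereas the abstract does not say which retriever RT uses, and the `retrieve` helper and the sample sentences are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, demonstrations: List[str], k: int = 2) -> List[str]:
    """Return the k demonstrations most similar to the query.

    Whitespace tokenization and bag-of-words cosine stand in for
    whatever retriever a real system would use.
    """
    q = Counter(query.lower().split())
    ranked = sorted(
        demonstrations,
        key=lambda d: cosine(q, Counter(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


demos = [
    "cisplatin induced nephrotoxicity in rats",
    "the weather was sunny today",
    "patients with breast cancer received tamoxifen",
]
top = retrieve("tamoxifen for breast cancer patients", demos, k=1)
print(top)
```

The retrieved sentences would then be placed in the prompt as demonstrations before the Chain-of-Thought module reasons about the entities in the input sentence.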
Affiliation(s)
- Mingchen Li
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Huixue Zhou
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Han Yang
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Rui Zhang
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
4
Xiao Y, Hou Y, Zhou H, Diallo G, Fiszman M, Wolfson J, Zhou L, Kilicoglu H, Chen Y, Su C, Xu H, Mantyh WG, Zhang R. Repurposing non-pharmacological interventions for Alzheimer's disease through link prediction on biomedical literature. Sci Rep 2024; 14:8693. [PMID: 38622164 PMCID: PMC11018822 DOI: 10.1038/s41598-024-58604-8] [Received: 12/23/2023] [Accepted: 04/01/2024] [Indexed: 04/17/2024]
Abstract
Non-pharmaceutical interventions (NPIs) have great potential to improve cognitive function, but there has been limited investigation into NPI repurposing for Alzheimer's disease (AD). This is the first study to develop an innovative framework to extract and represent NPI information from biomedical literature in a knowledge graph (KG), and to train link prediction models to repurpose novel NPIs for AD prevention. We constructed a comprehensive KG, called ADInt, by extracting NPI information from biomedical literature. We used the previously created SuppKG and NPI lexicon to identify NPI entities. Four KG embedding models (i.e., TransE, RotatE, DistMult and ComplEx) and two novel graph convolutional network models (i.e., R-GCN and CompGCN) were trained and compared to learn the representation of ADInt. Models were evaluated and compared on two test sets (time slice and clinical trial ground truth), and the best-performing model was used to predict novel NPIs for AD. Discovery patterns were applied to generate mechanistic pathways for high-scoring candidates. ADInt has 162,212 nodes and 1,017,284 edges. R-GCN performed best on the time slice (MR = 5.2054, Hits@10 = 0.8496) and clinical trial ground truth (MR = 3.4996, Hits@10 = 0.9192) test sets. After evaluation by domain experts, 10 novel dietary supplements and 10 complementary and integrative health interventions were proposed from the score table calculated by R-GCN. Among the proposed novel NPIs, we found plausible mechanistic pathways for photodynamic therapy and Choerospondias axillaris to prevent AD, and validated psychotherapy and manual therapy techniques using real-world data analysis. The proposed framework shows potential for discovering new NPIs for AD prevention and understanding their mechanistic pathways.
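The two reported link prediction metrics can be computed from the rank that a model assigns to the correct entity for each test triple. The sketch below assumes MR denotes mean rank, the usual meaning in KG embedding evaluation; the ranks are made-up values for illustration, not results from the study.

```python
from typing import Sequence


def mean_rank(ranks: Sequence[int]) -> float:
    """Average rank of the correct entity (lower is better)."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical 1-based ranks of the true tail entity for five test triples.
ranks = [1, 3, 12, 2, 7]
print(mean_rank(ranks))      # 5.0
print(hits_at_k(ranks, 10))  # 0.8
```

A model that ranks the correct entity first for every triple would reach MR = 1 and Hits@10 = 1, so lower MR and higher Hits@10 both indicate better performance.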
Affiliation(s)
- Yongkang Xiao
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Yu Hou
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA
- Huixue Zhou
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Gayo Diallo
  - INRIA SISTM, Team AHeaD - INSERM 1219 Bordeaux Population Health, University of Bordeaux, 33000, Bordeaux, France
- Marcelo Fiszman
  - NITES - Núcleo de Inovação e Tecnologia Em Saúde, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
  - Semedy Inc, Needham, MA, USA
- Julian Wolfson
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Li Zhou
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Halil Kilicoglu
  - School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- You Chen
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Chang Su
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Hua Xu
  - Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
- William G Mantyh
  - Department of Neurology, University of Minnesota, Minneapolis, MN, USA
- Rui Zhang
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA
5
Yang H, Li M, Zhou H, Xiao Y, Fang Q, Zhang R. One LLM is not Enough: Harnessing the Power of Ensemble Learning for Medical Question Answering. medRxiv: the preprint server for health sciences 2023:2023.12.21.23300380. [PMID: 38196648 PMCID: PMC10775333 DOI: 10.1101/2023.12.21.23300380] [Indexed: 01/11/2024]
Abstract
Objective To enhance the accuracy and reliability of diverse medical question-answering (QA) tasks and investigate efficient approaches to deploying large language model (LLM) technologies, we developed a novel ensemble learning pipeline using state-of-the-art LLMs, focusing on improving performance on diverse medical QA datasets. Materials and Methods Our study employs three medical QA datasets: PubMedQA, MedQA-USMLE, and MedMCQA, each presenting unique challenges in biomedical question-answering. The proposed LLM-Synergy framework, focusing exclusively on zero-shot cases using LLMs, incorporates two primary ensemble methods. The first is a Boosting-based weighted majority vote ensemble, where decision-making is expedited and refined by assigning variable weights to different LLMs through a boosting algorithm. The second method is Cluster-based Dynamic Model Selection, which uses a clustering approach to dynamically select the most suitable LLM votes for each query based on the characteristics of the question context. Results The Weighted Majority Vote and Dynamic Model Selection methods demonstrate superior performance compared to individual LLMs across the three medical QA datasets. Specifically, with the Weighted Majority Vote the accuracies are 35.84%, 96.21%, and 37.26% for MedMCQA, PubMedQA, and MedQA-USMLE, respectively. Correspondingly, Dynamic Model Selection yields slightly higher accuracies of 38.01%, 96.36%, and 38.13%. Conclusion The LLM-Synergy framework, with its two ensemble methods, represents a significant advancement in leveraging LLMs for medical QA tasks and provides an innovative way of efficiently utilizing developments in LLM technologies, customizable for both existing and potential future challenge tasks in biomedical and health informatics research.
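The first ensemble method can be illustrated with a minimal weighted majority vote. In the sketch below the per-model weights are simply given as inputs, whereas the abstract says they are assigned by a boosting algorithm; the model names, weights, and the `weighted_majority_vote` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def weighted_majority_vote(votes: Dict[str, str], weights: Dict[str, float]) -> str:
    """Return the answer with the largest total weight.

    `votes` maps a model name to its predicted answer and `weights` maps
    a model name to its weight. Models without a weight count as 0.
    """
    totals: Dict[str, float] = defaultdict(float)
    for model, answer in votes.items():
        totals[answer] += weights.get(model, 0.0)
    return max(totals, key=lambda answer: totals[answer])


votes = {"model_a": "yes", "model_b": "no", "model_c": "no"}
weights = {"model_a": 0.7, "model_b": 0.2, "model_c": 0.3}
print(weighted_majority_vote(votes, weights))  # yes (0.7 vs 0.5)
```

With equal weights the same votes would give "no", which shows how the weighting can overturn a plain majority when one model is trusted more than the others.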
Affiliation(s)
- Han Yang
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Mingchen Li
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA
- Huixue Zhou
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Yongkang Xiao
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Qian Fang
  - H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Rui Zhang
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA
6
Zhou H, Li M, Xiao Y, Yang H, Zhang R. LLM Instruction-Example Adaptive Prompting (LEAP) Framework for Clinical Relation Extraction. medRxiv: the preprint server for health sciences 2023:2023.12.15.23300059. [PMID: 38168203 PMCID: PMC10760264 DOI: 10.1101/2023.12.15.23300059] [Indexed: 01/05/2024]
Abstract
Objective To investigate demonstrations in large language models (LLMs) for clinical relation extraction. We focus on examining two types of adaptive demonstration, instruction adaptive prompting and example adaptive prompting, to understand their impacts and effectiveness. Materials and Methods The study unfolds in two stages. Initially, we explored a range of demonstration components vital to LLMs' clinical data extraction, such as task descriptions and examples, and tested their combinations. Subsequently, we introduced the Instruction-Example Adaptive Prompting (LEAP) framework, a system that integrates two types of adaptive prompts: one preceding the instruction and another preceding the examples. This framework is designed to systematically explore both adaptive task descriptions and adaptive examples within the demonstration. We evaluated the LEAP framework's performance on the DDI and BC5CDR chemical interaction datasets, applying it across LLMs such as Llama2-7b, Llama2-13b, and MedLLaMA_13B. Results The study revealed that Instruction + Options + Examples and its expanded form substantially raised F1 scores over the standard Instruction + Options mode. The LEAP framework excelled, especially with example adaptive prompting, which outperformed traditional instruction tuning across models. Notably, the MedLLaMA_13B model scored an impressive 95.13 F1 on the BC5CDR dataset with this method. Significant improvements were also seen on the DDI 2013 dataset, confirming the method's robustness in sophisticated data extraction. Conclusion The LEAP framework presents a promising avenue for refining LLM training strategies, steering away from extensive fine-tuning towards more contextually rich and dynamic prompting methodologies.
Affiliation(s)
- Huixue Zhou
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Mingchen Li
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA
- Yongkang Xiao
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Han Yang
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Rui Zhang
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA