1
Gao Z, Li L, Ma S, Wang Q, Hemphill L, Xu R. Examining the Potential of ChatGPT on Biomedical Information Retrieval: Fact-Checking Drug-Disease Associations. Ann Biomed Eng 2023:10.1007/s10439-023-03385-w. [PMID: 37855948 DOI: 10.1007/s10439-023-03385-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 10/09/2023] [Indexed: 10/20/2023]
Abstract
Large language models (LLMs) such as ChatGPT have recently attracted significant attention due to their impressive performance on many real-world tasks. These models have also demonstrated potential in facilitating various biomedical tasks. However, little is known of their potential in biomedical information retrieval, especially in identifying drug-disease associations. This study aims to explore the potential of ChatGPT, a popular LLM, in discerning drug-disease associations. We collected 2694 true drug-disease associations and 5662 false drug-disease pairs. Our approach involved creating various prompts to instruct ChatGPT in identifying these associations. Under varying prompt designs, ChatGPT identified drug-disease associations with an accuracy of 74.6-83.5% for the true pairs and 96.2-97.6% for the false pairs. This study shows that ChatGPT has potential in identifying drug-disease associations and may serve as a helpful tool in searching pharmacy-related information. However, the accuracy of its insights warrants comprehensive examination before its implementation in medical practice.
Affiliation(s)
- Zhenxiang Gao
  - Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Lingyao Li
  - School of Information, University of Michigan, Ann Arbor, MI, USA
- Siyuan Ma
  - Vanderbilt University Medical Center, Nashville, TN, USA
- Qinyong Wang
  - Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Libby Hemphill
  - School of Information, University of Michigan, Ann Arbor, MI, USA
- Rong Xu
  - Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
2
Wong M, Previde P, Cole J, Thomas B, Laxmeshwar N, Mallory E, Lever J, Petkovic D, Altman RB, Kulkarni A. Search and visualization of gene-drug-disease interactions for pharmacogenomics and precision medicine research using GeneDive. J Biomed Inform 2021; 117:103732. [PMID: 33737208 DOI: 10.1016/j.jbi.2021.103732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2020] [Revised: 12/10/2020] [Accepted: 02/28/2021] [Indexed: 10/21/2022]
Abstract
BACKGROUND Understanding the relationships between genes, drugs, and disease states is at the core of pharmacogenomics. Two leading approaches for identifying these relationships in medical literature are manual curation led by human experts and automated approaches based on modern data mining. The former generates small amounts of high-quality data, and the latter offers large volumes of mixed-quality data. The algorithmically extracted relationships are often accompanied by supporting evidence, such as confidence scores, source articles, and surrounding contexts (excerpts) from the articles, that can be used as data quality indicators. Tools that can leverage these quality indicators to help the user gain access to larger volumes of high-quality data are needed. APPROACH We introduce GeneDive, a web application for pharmacogenomics researchers and precision medicine practitioners that makes gene, disease, and drug interaction data easily accessible and usable. GeneDive is designed to meet three key objectives: (1) provide functionality to manage the information-overload problem and facilitate easy assimilation of supporting evidence, (2) support longitudinal and exploratory research investigations, and (3) offer integration of user-provided interaction data without requiring data sharing. RESULTS GeneDive offers multiple search modalities, visualizations, and other features that guide the user efficiently to the information of interest. To facilitate exploratory research, GeneDive makes the supporting evidence and context for each interaction readily available and allows the user to control the data quality threshold according to their risk tolerance. The interactive search-visualization loop enables the discovery of relationships between diseases, genes, and drugs that might not be explicitly described in the literature but emerge from the source medical corpus and deductive reasoning. The ability to use the user's own data, either in combination with the GeneDive native datasets or in isolation, promotes richer data-driven exploration and discovery. These functionalities, along with GeneDive's applicability to precision medicine (bringing the knowledge contained in biomedical literature to bear on particular clinical situations and improving patient care), are illustrated through detailed use cases. CONCLUSION GeneDive is a comprehensive, broad-use biological interactions browser. The GeneDive application and information about its underlying system architecture are available at http://www.genedive.net. A GeneDive Docker image is also available for download at this URL, allowing users to (1) import their own interaction data securely and privately; and (2) generate and test hypotheses across their own and other datasets.
Affiliation(s)
- Mike Wong
  - COSE Computing for Life Sciences, San Francisco State University, San Francisco, CA, United States
- Paul Previde
  - Department of Computer Science, San Francisco State University, San Francisco, CA, United States
- Jack Cole
  - Department of Computer Science, San Francisco State University, San Francisco, CA, United States
- Brook Thomas
  - Department of Computer Science, San Francisco State University, San Francisco, CA, United States
- Nayana Laxmeshwar
  - Department of Computer Science, San Francisco State University, San Francisco, CA, United States
- Emily Mallory
  - Biomedical Informatics Training Program, Stanford University, Palo Alto, CA, United States
- Jake Lever
  - Postdoctoral Scholar, Stanford University, Palo Alto, CA, United States
- Dragutin Petkovic
  - Department of Computer Science, San Francisco State University, San Francisco, CA, United States
  - COSE Computing for Life Sciences, San Francisco State University, San Francisco, CA, United States
- Russ B Altman
  - Department of Bioengineering, Department of Genetics, and School of Medicine, Stanford University, Palo Alto, CA, United States
- Anagha Kulkarni
  - Department of Computer Science, San Francisco State University, San Francisco, CA, United States
3
Xu B, Lin H, Yang L, Xu K, Zhang Y, Zhang D, Yang Z, Wang J, Lin Y, Yin F. A supervised term ranking model for diversity enhanced biomedical information retrieval. BMC Bioinformatics 2019; 20:590. [PMID: 31787087 PMCID: PMC6886246 DOI: 10.1186/s12859-019-3080-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/18/2023] Open
Abstract
Background The number of biomedical research articles has increased exponentially with the advancement of biomedicine in recent years. This growth has made it difficult for researchers to obtain the information they need. Information retrieval technologies seek to tackle the problem. However, information needs cannot be completely satisfied by directly applying existing information retrieval techniques. Therefore, biomedical information retrieval not only focuses on the relevance of search results, but also aims to promote the completeness of the results, which is referred to as diversity-oriented retrieval. Results We address the diversity-oriented biomedical retrieval task using a supervised term ranking model. The model is learned through a supervised query expansion process for term refinement. Based on the model, the most relevant and diversified terms are selected to enrich the original query. The expanded query is then fed into a second retrieval to improve the relevance and diversity of search results. To this end, we propose three diversity-oriented optimization strategies in our model: a diversified term labeling strategy, biomedical resource-based term features, and a diversity-oriented group sampling learning method. Experimental results on TREC Genomics collections demonstrate the effectiveness of the proposed model in improving the relevance and the diversity of search results. Conclusions The three proposed strategies jointly contribute to the improvement of biomedical retrieval performance. Our model yields more relevant and diversified results than the state-of-the-art baseline models. Moreover, our method provides a general framework for improving biomedical retrieval performance, and can be used as the basis for future work.
Affiliation(s)
- Bo Xu
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China
  - State Key Laboratory of Cognitive Intelligence, iFLYTEK, Hefei, People's Republic of China
- Hongfei Lin
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China
- Liang Yang
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China
- Kan Xu
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China
- Yijia Zhang
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China
- Dongyu Zhang
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China
- Zhihao Yang
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China
- Jian Wang
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China
- Yuan Lin
  - WISE Lab, School of Public Administration and Law, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China
- Fuliang Yin
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China
4
Ševa J, Wiegandt DL, Götze J, Lamping M, Rieke D, Schäfer R, Jähnichen P, Kittner M, Pallarz S, Starlinger J, Keilholz U, Leser U. VIST - a Variant-Information Search Tool for precision oncology. BMC Bioinformatics 2019; 20:429. [PMID: 31419935 PMCID: PMC6697931 DOI: 10.1186/s12859-019-2958-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/28/2019] [Accepted: 06/18/2019] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Diagnosis and treatment decisions in cancer increasingly depend on a detailed analysis of the mutational status of a patient's genome. This analysis relies on previously published information regarding the association of variations to disease progression and possible interventions. Clinicians to a large degree use biomedical search engines to obtain such information; however, the vast majority of scientific publications focus on basic science and have no direct clinical impact. We develop the Variant-Information Search Tool (VIST), a search engine designed for the targeted search of clinically relevant publications given an oncological mutation profile. RESULTS VIST indexes all PubMed abstracts and content from ClinicalTrials.gov. It applies advanced text mining to identify mentions of genes, variants and drugs and uses machine learning based scoring to judge the clinical relevance of indexed abstracts. Its functionality is available through a fast and intuitive web interface. We perform several evaluations, showing that VIST's ranking is superior to that of PubMed or a pure vector space model with regard to the clinical relevance of a document's content. CONCLUSION Different user groups search repositories of scientific publications with different intentions. This diversity is not adequately reflected in the standard search engines, often leading to poor performance in specialized settings. We develop a search engine for the specific case of finding documents that are clinically relevant in the course of cancer treatment. We believe that the architecture of our engine, heavily relying on machine learning algorithms, can also act as a blueprint for search engines in other, equally specific domains. VIST is freely available at https://vist.informatik.hu-berlin.de/.
Affiliation(s)
- Jurica Ševa
  - Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
- David Luis Wiegandt
  - Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
- Julian Götze
  - University Hospital Tübingen, Hoppe-Seyler-Straße 3, Tübingen, 72076, Germany
- Mario Lamping
  - Charité Comprehensive Cancer Center, Charitéplatz 1, Berlin, 10117, Germany
- Damian Rieke
  - Charité Comprehensive Cancer Center, Charitéplatz 1, Berlin, 10117, Germany
  - Department of Hematology and Medical Oncology, Campus Benjamin Franklin, Charité Universitätsmedizin Berlin, Hindenburgdamm 30, Berlin, 12203, Germany
  - Berlin Institute of Health, Kapelle-Ufer 2, Berlin, 10117, Germany
- Reinhold Schäfer
  - Charité Comprehensive Cancer Center, Charitéplatz 1, Berlin, 10117, Germany
  - German Cancer Consortium (DKTK), DKFZ Heidelberg, Im Neuenheimer Feld 280, Heidelberg, 69120, Germany
- Patrick Jähnichen
  - Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
- Madeleine Kittner
  - Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
- Steffen Pallarz
  - Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
- Johannes Starlinger
  - Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
- Ulrich Keilholz
  - Charité Comprehensive Cancer Center, Charitéplatz 1, Berlin, 10117, Germany
- Ulf Leser
  - Knowledge Management in Bioinformatics, Department of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Berlin, 12489, Germany
5
Wang Y, Wu S, Li D, Mehrabi S, Liu H. A Part-Of-Speech term weighting scheme for biomedical information retrieval. J Biomed Inform 2016; 63:379-389. [PMID: 27593166 DOI: 10.1016/j.jbi.2016.08.026] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 04/27/2016] [Revised: 08/30/2016] [Accepted: 08/31/2016] [Indexed: 11/24/2022]
Abstract
In the era of digitalization, information retrieval (IR), which retrieves and ranks documents from large collections according to users' search queries, has been widely applied in the biomedical domain. Building patient cohorts using electronic health records (EHRs) and searching literature for topics of interest are some IR use cases. Meanwhile, natural language processing (NLP) techniques, such as tokenization and Part-Of-Speech (POS) tagging, have been developed for processing clinical documents and biomedical literature. We hypothesize that NLP can be incorporated into IR to strengthen the conventional IR models. In this study, we propose two NLP-empowered IR models, POS-BoW and POS-MRF, which incorporate automatic POS-based term weighting schemes into bag-of-words (BoW) and Markov Random Field (MRF) IR models, respectively. In the proposed models, the POS-based term weights are calculated iteratively using a cyclic coordinate method in which a golden section line search is applied along each coordinate to optimize an objective function defined by mean average precision (MAP). In the empirical experiments, we used the data sets from the Medical Records track in the Text REtrieval Conference (TREC) 2011 and 2012 and the Genomics track in TREC 2004. The evaluation on the TREC 2011 and 2012 Medical Records tracks shows that, for the POS-BoW models, the mean improvement rates for the IR evaluation metrics MAP, bpref, and P@10 are 10.88%, 4.54%, and 3.82%, compared to the BoW models; and for the POS-MRF models, these rates are 13.59%, 8.20%, and 8.78%, compared to the MRF models. Additionally, we experimentally verify that the proposed weighting approach is superior to the simple heuristic and frequency-based weighting approaches, and validate our POS category selection. Using the optimal weights calculated in this experiment, we tested the proposed models on the TREC 2004 Genomics track and obtained average improvement rates of 8.63% and 10.04% for POS-BoW and POS-MRF, respectively. These significant improvements verify the effectiveness of leveraging POS tagging for biomedical IR tasks.
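The optimization loop named in this abstract (a cyclic coordinate method with a golden section line search along each coordinate) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the weight bounds of 0 to 1, the number of sweeps, and the toy objective are all assumptions. In the paper the objective is MAP over a query set; a smooth toy function stands in for it here, and golden section search only guarantees the optimum along a coordinate when the objective is unimodal there.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-5):
    """Maximize a unimodal one-dimensional function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

def cyclic_coordinate_ascent(objective, weights, lo=0.0, hi=1.0, sweeps=20):
    """Optimize one weight at a time, cycling through all coordinates."""
    w = list(weights)
    for _ in range(sweeps):
        for i in range(len(w)):
            def along_axis(x, i=i):
                # Vary only coordinate i; keep the other weights fixed.
                return objective(w[:i] + [x] + w[i + 1:])
            w[i] = golden_section_max(along_axis, lo, hi)
    return w

def toy_objective(w):
    # Hypothetical stand-in for MAP: peaks at w = [0.3, 0.7].
    return -(w[0] - 0.3) ** 2 - (w[1] - 0.7) ** 2

best = cyclic_coordinate_ascent(toy_objective, [0.5, 0.5])
print(best)
```

On this toy objective the result should land close to [0.3, 0.7]. With a real MAP objective each evaluation would require re-running retrieval, so the number of sweeps and the tolerance would be chosen with that cost in mind.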
Affiliation(s)
- Yanshan Wang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Stephen Wu
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA
- Dingcheng Li
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Saeed Mehrabi
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
6
Abdulla AAA, Lin H, Xu B, Banbhrani SK. Improving biomedical information retrieval by linear combinations of different query expansion techniques. BMC Bioinformatics 2016; 17 Suppl 7:238. [PMID: 27455377 PMCID: PMC4965722 DOI: 10.1186/s12859-016-1092-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022] Open
Abstract
Background Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information retrieval (IR) programs search unstructured materials, such as text documents, in large stores of data that are usually held on computers. IR is concerned with the representation, storage, and organization of information items, as well as access to them. One of the main problems in IR is determining which documents are relevant to the user's needs and which are not. At present, users cannot construct queries precisely enough to retrieve particular pieces of data from large data stores, and basic information retrieval systems produce low-quality search results. In this paper we present a new technique for refining IR searches so that they better represent the user's information need and thereby improve retrieval performance: we use different query expansion techniques and apply linear combinations between them, combining two expansion results at a time. Query expansion expands the search query, for example by finding synonyms and reweighting the original terms, and provides significantly more focused, particularized search results than basic search queries do. Results Retrieval performance is measured by several variants of MAP (Mean Average Precision). According to our experimental results, the combination of the best query expansion results improves the retrieved documents, outperforming our baseline by 21.06% and a previous study by 7.12%. Conclusions We propose several query expansion techniques and their linear combinations to make user queries more cognizable to search engines and to produce higher-quality search results.
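As a rough illustration of the idea summarized in this abstract, the sketch below linearly combines the document scores from two query expansion runs and evaluates the merged ranking with average precision, the per-query quantity that MAP averages. It is a hypothetical sketch, not the paper's method: the function names, the mixing weight `alpha`, the document identifiers, and the scores are invented for illustration, and the abstract does not say how scores are normalized or how the weight is chosen.

```python
def combine_scores(scores_a, scores_b, alpha):
    """Linearly combine two score maps; a missing document counts as 0."""
    docs = set(scores_a) | set(scores_b)
    return {
        d: alpha * scores_a.get(d, 0.0) + (1 - alpha) * scores_b.get(d, 0.0)
        for d in docs
    }

def rank(scores):
    """Return document ids sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)

def average_precision(ranking, relevant):
    """Average of precision values at the rank of each relevant document."""
    hits = 0
    total = 0.0
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

# Made-up scores from two hypothetical query expansion runs.
run_a = {"d1": 0.9, "d2": 0.4, "d3": 0.1}
run_b = {"d2": 0.8, "d3": 0.7, "d4": 0.2}
relevant = {"d2", "d3"}

merged = rank(combine_scores(run_a, run_b, alpha=0.5))
print(merged, average_precision(merged, relevant))
```

MAP would then be the mean of `average_precision` over all queries in a test collection. In this made-up example the merged ranking places both relevant documents higher than run A alone does, which is the kind of effect a linear combination is meant to produce.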
Affiliation(s)
- Ahmed AbdoAziz Ahmed Abdulla
  - School of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Rd., Dalian, 116024, People's Republic of China
- Hongfei Lin
  - School of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Rd., Dalian, 116024, People's Republic of China
- Bo Xu
  - School of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Rd., Dalian, 116024, People's Republic of China
- Santosh Kumar Banbhrani
  - School of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Rd., Dalian, 116024, People's Republic of China