1
Aftab W, Apostolou Z, Bouazoune K, Straub T. Optimizing biomedical information retrieval with a keyword frequency-driven prompt enhancement strategy. BMC Bioinformatics 2024; 25:281. [PMID: 39192204 DOI: 10.1186/s12859-024-05902-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Accepted: 08/15/2024] [Indexed: 08/29/2024] Open
Abstract
BACKGROUND Mining the vast pool of biomedical literature to extract accurate responses and relevant references is challenging due to the domain's interdisciplinary nature, specialized jargon, and continuous evolution. Early natural language processing (NLP) approaches often led to incorrect answers as they failed to comprehend the nuances of natural language. However, transformer models have significantly advanced the field by enabling the creation of large language models (LLMs), enhancing question-answering (QA) tasks. Despite these advances, current LLM-based solutions for specialized domains like biology and biomedicine still struggle to generate up-to-date responses while avoiding "hallucination", that is, generating plausible but factually incorrect responses. RESULTS Our work focuses on enhancing prompts using a retrieval-augmented architecture to guide LLMs in generating meaningful responses for biomedical QA tasks. We evaluated two approaches: one relying on text embedding and vector similarity in a high-dimensional space, and our proposed method, which uses explicit signals in user queries to extract meaningful contexts. For robust evaluation, we tested these methods on 50 specific and challenging questions from diverse biomedical topics, comparing their performance against a baseline model, BM25. Retrieval performance of our method was significantly better than others, achieving a median Precision@10 of 0.95, which indicates the fraction of the top 10 retrieved chunks that are relevant. We used GPT-4, OpenAI's most advanced LLM, to maximize the answer quality and manually assessed LLM-generated responses. Our method achieved a median answer quality score of 2.5, surpassing both the baseline model and the text embedding-based approach.
We developed a QA bot, WeiseEule ( https://github.com/wasimaftab/WeiseEule-LocalHost ), which utilizes these methods for comparative analysis and also offers advanced features for review writing and identifying relevant articles for citation. CONCLUSIONS Our findings highlight the importance of prompt enhancement methods that utilize explicit signals in user queries over traditional text embedding-based approaches to improve LLM-generated responses for specialized queries in specialized domains such as biology and biomedicine. By providing users complete control over the information fed into the LLM, our approach addresses some of the major drawbacks of existing web-based chatbots and LLM-based QA systems, including hallucinations and the generation of irrelevant or outdated responses.
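The Precision@10 metric described in this abstract can be illustrated with a minimal sketch. The relevance judgments below are invented for illustration and are not data from the paper; the function name `precision_at_k` is likewise an assumption, not part of WeiseEule.

```python
from statistics import median


def precision_at_k(relevance, k=10):
    """Fraction of the top-k retrieved chunks that are judged relevant.

    `relevance` is a ranked list of 0/1 judgments (1 = relevant).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in relevance[:k] if r) / k


# Hypothetical judgments for three questions (not taken from the paper).
judgments = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 1, 1, 1, 1, 0, 1],
]

per_question = [precision_at_k(j, k=10) for j in judgments]
median_p10 = median(per_question)
print(per_question, median_p10)
```

Reporting the median over questions, as the abstract does, makes the summary less sensitive to a few questions with unusually poor retrieval than a mean would be.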
Affiliation(s)
- Wasim Aftab
- Core Facility Bioinformatics, Biomedical Center, LMU Munich, Grosshaderner Str. 9, 82152, Martinsried, Germany
- Zivkos Apostolou
- Molecular Biology Division, Biomedical Center, LMU Munich, Grosshaderner Str. 9, 82152, Martinsried, Germany
- Karim Bouazoune
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Tobias Straub
- Core Facility Bioinformatics, Biomedical Center, LMU Munich, Grosshaderner Str. 9, 82152, Martinsried, Germany
2
Rojas-Carabali W, Agrawal R, Gutierrez-Sinisterra L, Baxter SL, Cifuentes-González C, Wei YC, Abisheganaden J, Kannapiran P, Wong S, Lee B, de-la-Torre A, Agrawal R. Natural Language Processing in medicine and ophthalmology: A review for the 21st-century clinician. Asia Pac J Ophthalmol (Phila) 2024; 13:100084. [PMID: 39059557 DOI: 10.1016/j.apjo.2024.100084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 07/08/2024] [Accepted: 07/19/2024] [Indexed: 07/28/2024] Open
Abstract
Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and human language, enabling computers to understand, generate, and derive meaning from human language. NLP's potential applications in the medical field are extensive and vary from extracting data from Electronic Health Records (one of its most well-known and frequently exploited uses) to investigating relationships among genetics, biomarkers, drugs, and diseases for the proposal of new medications. NLP can be useful for clinical decision support, patient monitoring, or medical image analysis. Despite its vast potential, the real-world application of NLP is still limited due to various challenges and constraints, meaning that its evolution predominantly continues within the research domain. However, with the increasingly widespread use of NLP, particularly with the availability of large language models, such as ChatGPT, it is crucial for medical professionals to be aware of the status, uses, and limitations of these technologies.
Affiliation(s)
- William Rojas-Carabali
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; Tan Tock Seng Hospital, National Healthcare Group Eye Institute, Singapore
- Rajdeep Agrawal
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Sally L Baxter
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, CA, USA; Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Carlos Cifuentes-González
- Neuroscience Research Group (NEUROS), Neurovitae Center for Neuroscience, Institute of Translational Medicine (IMT), Escuela de Medicina y Ciencias de la Salud, Universidad del Rosario, Bogotá, Colombia
- Yap Chun Wei
- Health Services and Outcomes Research, National Healthcare Group, Singapore
- John Abisheganaden
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; Health Services and Outcomes Research, National Healthcare Group, Singapore; Department of Respiratory Medicine, Tan Tock Seng Hospital, Singapore
- Sunny Wong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Bernett Lee
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Alejandra de-la-Torre
- Neuroscience Research Group (NEUROS), Neurovitae Center for Neuroscience, Institute of Translational Medicine (IMT), Escuela de Medicina y Ciencias de la Salud, Universidad del Rosario, Bogotá, Colombia
- Rupesh Agrawal
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; Tan Tock Seng Hospital, National Healthcare Group Eye Institute, Singapore; Singapore Eye Research Institute, Singapore; Duke NUS Medical School, National University of Singapore, Singapore
3
Kell G, Roberts A, Umansky S, Qian L, Ferrari D, Soboczenski F, Wallace BC, Patel N, Marshall IJ. Question answering systems for health professionals at the point of care-a systematic review. J Am Med Inform Assoc 2024; 31:1009-1024. [PMID: 38366879 PMCID: PMC10990539 DOI: 10.1093/jamia/ocae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 01/11/2024] [Accepted: 01/15/2024] [Indexed: 02/18/2024] Open
Abstract
OBJECTIVES Question answering (QA) systems have the potential to improve the quality of clinical care by providing health professionals with the latest and most relevant evidence. However, QA systems have not been widely adopted. This systematic review aims to characterize current medical QA systems, assess their suitability for healthcare, and identify areas of improvement. MATERIALS AND METHODS We searched PubMed, IEEE Xplore, ACM Digital Library, ACL Anthology, and forward and backward citations on February 7, 2023. We included peer-reviewed journal and conference papers describing the design and evaluation of biomedical QA systems. Two reviewers screened titles, abstracts, and full-text articles. We conducted a narrative synthesis and risk of bias assessment for each study. We assessed the utility of biomedical QA systems. RESULTS We included 79 studies and identified themes, including question realism, answer reliability, answer utility, clinical specialism, systems, usability, and evaluation methods. Clinicians' questions used to train and evaluate QA systems were restricted to certain sources, types and complexity levels. No system communicated confidence levels in the answers or sources. Many studies suffered from high risks of bias and applicability concerns. Only 8 studies completely satisfied any criterion for clinical utility, and only 7 reported user evaluations. Most systems were built with limited input from clinicians. DISCUSSION While machine learning methods have led to increased accuracy, most studies imperfectly reflected real-world healthcare information needs. Key research priorities include developing more realistic healthcare QA datasets and considering the reliability of answer sources, rather than merely focusing on accuracy.
Affiliation(s)
- Gregory Kell
- Department of Population Health Sciences, King’s College London, London, Greater London, SE1 1UL, United Kingdom
- Angus Roberts
- Department of Biostatistics and Health Informatics, King’s College London, London, Greater London, SE5 8AB, United Kingdom
- Serge Umansky
- Metadvice Ltd, London, Greater London, SW1Y 5JG, United Kingdom
- Linglong Qian
- Department of Biostatistics and Health Informatics, King’s College London, London, Greater London, SE5 8AB, United Kingdom
- Davide Ferrari
- Department of Population Health Sciences, King’s College London, London, Greater London, SE1 1UL, United Kingdom
- Frank Soboczenski
- Department of Population Health Sciences, King’s College London, London, Greater London, SE1 1UL, United Kingdom
- Byron C Wallace
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Nikhil Patel
- Department of Population Health Sciences, King’s College London, London, Greater London, SE1 1UL, United Kingdom
- Iain J Marshall
- Department of Population Health Sciences, King’s College London, London, Greater London, SE1 1UL, United Kingdom
4
Wijesinghe YV, Xu Y, Li Y, Zhang Q. A phrase-based questionnaire-answering approach for automatic initial frailty assessment based on clinical notes. Comput Biol Med 2024; 170:108043. [PMID: 38330821 DOI: 10.1016/j.compbiomed.2024.108043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2023] [Revised: 12/31/2023] [Accepted: 01/26/2024] [Indexed: 02/10/2024]
Abstract
Frailty stands out as a particularly challenging multidimensional geriatric syndrome in the elderly population, often resulting in diminished quality of life and heightened mortality risk. Negative consequences encompass a heightened likelihood of hospitalization and institutionalization, as well as suboptimal post-hospitalization outcomes and elevated mortality rates. Using a questionnaire-based approach for assessing frailty has been shown to be an effective method for early diagnosis of frailty. Nonetheless, the majority of current frailty assessment tools necessitate in-person consultations. This poses a significant challenge for elderly patients residing in rural areas, who often encounter difficulties in accessing healthcare compared to their urban or suburban counterparts. Additionally, elderly patients face an elevated risk of contracting diseases as a result of frequent hospital visits, given that many of them are immunocompromised. An automated initial frailty assessment approach can help mitigate the challenges mentioned above and conserve clinical resources by circumventing the need for extensive manual assessments. The primary aim of this paper is to introduce an automatic initial frailty assessment method. This method efficiently identifies individuals who may necessitate further frailty evaluation by automatically extracting relevant information from a patient's clinical notes and using it to complete the Tilburg Frailty Indicator (TFI) questionnaire. The introduced phrase-based query expansion technique is designed to identify the most pertinent phrases related to the frailty assessment questionnaire using Unified Medical Language System (UMLS) ontology and incorporates information from clinical notes to enhance its accuracy. Additionally, a method for retrieving pertinent clinical notes to automatically facilitate the frailty assessment process based on the identified phrases was also proposed.
The proposed approaches are evaluated using a dataset containing a collection of clinical notes from elderly patients, assessing their effectiveness in terms of automating frailty assessment and question-answering tasks. This research underscores the significance of incorporating phrases as features in the automated frailty assessment process using clinical notes. The research empowers clinicians to conduct automatic frailty assessments utilizing medical data, thereby reducing the need for frequent hospital visits and in-patient consultations. This becomes particularly valuable during unusual or unexpected situations, such as the COVID-19 pandemic, where minimizing in-person interactions is crucial.
Affiliation(s)
- Yashodhya V Wijesinghe
- Queensland University of Technology, School of Computer Science, Brisbane, 4000, QLD, Australia
- Yue Xu
- Queensland University of Technology, School of Computer Science, Brisbane, 4000, QLD, Australia
- Yuefeng Li
- Queensland University of Technology, School of Computer Science, Brisbane, 4000, QLD, Australia
- Qing Zhang
- The Australian e-Health Research Centre, CSIRO, Brisbane, 4029, QLD, Australia
5
Oliveira Dos Santos Á, Sergio da Silva E, Machado Couto L, Valadares Labanca Reis G, Silva Belo V. The use of artificial intelligence for automating or semi-automating biomedical literature analyses: a scoping review. J Biomed Inform 2023; 142:104389. [PMID: 37187321 DOI: 10.1016/j.jbi.2023.104389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 04/11/2023] [Accepted: 05/08/2023] [Indexed: 05/17/2023]
Abstract
OBJECTIVE Evidence-based medicine (EBM) is a decision-making process based on the conscious and judicious use of the best available scientific evidence. However, the exponential increase in the amount of information currently available likely exceeds the capacity of human-only analysis. In this context, artificial intelligence (AI) and its branches such as machine learning (ML) can be used to facilitate human efforts in analyzing the literature to foster EBM. The present scoping review aimed to examine the use of AI in the automation of biomedical literature survey and analysis with a view to establishing the state-of-the-art and identifying knowledge gaps. MATERIALS AND METHODS Comprehensive searches of the main databases were performed for articles published up to June 2022 and studies were selected according to inclusion and exclusion criteria. Data were extracted from the included articles and the findings categorized. RESULTS The total number of records retrieved from the databases was 12,145, of which 273 were included in the review. Classification of the studies according to the use of AI in evaluating the biomedical literature revealed three main application groups, namely assembly of scientific evidence (n=127; 47%), mining the biomedical literature (n=112; 41%) and quality analysis (n=34; 12%). Most studies addressed the preparation of systematic reviews, while articles focusing on the development of guidelines and evidence synthesis were the least frequent. The biggest knowledge gap was identified within the quality analysis group, particularly regarding methods and tools that assess the strength of recommendation and consistency of evidence. 
CONCLUSION Our review shows that, despite significant progress in the automation of biomedical literature surveys and analyses in recent years, intense research is needed to fill knowledge gaps on more difficult aspects of ML, deep learning and natural language processing, and to consolidate the use of automation by end-users (biomedical researchers and healthcare professionals).
Affiliation(s)
- Eduardo Sergio da Silva
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
- Letícia Machado Couto
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
- Vinícius Silva Belo
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
6
Humayun MA, Yassin H, Shuja J, Alourani A, Abas PE. A transformer fine-tuning strategy for text dialect identification. Neural Comput Appl 2023; 35:6115-6124. [PMID: 36408287 PMCID: PMC9665018 DOI: 10.1007/s00521-022-07944-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2022] [Accepted: 10/11/2022] [Indexed: 11/16/2022]
Abstract
Online medical consultation can significantly improve the efficiency of primary health care. Recently, many online medical question-answer services have been developed that connect the patients with relevant medical consultants based on their questions. Considering the linguistic variety in their question, social background identification of patients can improve the referral system by selecting a medical consultant with a similar social origin for efficient communication. This paper has proposed a novel fine-tuning strategy for the pre-trained transformers to identify the social origin of text authors. When fused with the existing adapter model, the proposed methods achieve an overall accuracy of 53.96% for the Arabic dialect identification task on the Nuanced Arabic Dialect Identification (NADI) dataset. The overall accuracy is 0.54% higher than the previous best for the same dataset, which establishes the utility of custom fine-tuning strategies for pre-trained transformer models.
Affiliation(s)
- Mohammad Ali Humayun
- Faculty of Integrated Technologies, Universiti Brunei Darussalam, Jalan Tungku Link, Gadong, Brunei Darussalam
- Hayati Yassin
- Faculty of Integrated Technologies, Universiti Brunei Darussalam, Jalan Tungku Link, Gadong, Brunei Darussalam
- Junaid Shuja
- Department of Computer Science, National University of Computer and Emerging Sciences, Karachi, Pakistan
- Abdullah Alourani
- Department of Computer Science and Information, Majmaah University, Al Majma’ah, Saudi Arabia
- Pg Emeroylariffion Abas
- Faculty of Integrated Technologies, Universiti Brunei Darussalam, Jalan Tungku Link, Gadong, Brunei Darussalam
7
Bai J, Yin C, Zhang J, Wang Y, Dong Y, Rong W, Xiong Z. Adversarial Knowledge Distillation Based Biomedical Factoid Question Answering. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:106-118. [PMID: 35316189 DOI: 10.1109/tcbb.2022.3161032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Biomedical factoid question answering is an essential application for biomedical information sharing. Recently, neural network based approaches have shown remarkable performance for this task. However, due to the scarcity of annotated data which requires intensive knowledge of expertise, training a robust model on limited-scale biomedical datasets remains a challenge. Previous works solve this problem by introducing useful knowledge. It is found that the interaction between question and answer (QA-interaction) is also a kind of knowledge which could help extract answer accurately. This research develops a knowledge distillation framework for biomedical factoid question answering, in which a teacher model as the knowledge source of QA-interaction is designed to enhance the student model. In addition, to further alleviate the problem of limited-scale dataset, a novel adversarial knowledge distillation technique is proposed to robustly distill the knowledge from teacher model to student model by constructing perturbed examples as additional training data. By forcing the student model to mimic the predicted distributions of teacher model on both original examples and perturbed examples, the knowledge of QA-interaction can be learned by student model. We evaluate the proposed framework on the widely used BioASQ datasets, and experimental results have shown the proposed method's promising potential.
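The idea of a student model mimicking the teacher's predicted distributions on both original and perturbed examples can be sketched as follows. This is an illustrative assumption, not the paper's loss: the abstract does not state the exact objective, the temperature, or how perturbed examples are built, so the KL-divergence form, the `temperature` parameter, and all logits below are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """How far the student's distribution is from the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)


# Hypothetical answer-position logits for one original and one perturbed example.
teacher_original, student_original = [4.0, 1.0, 0.5], [2.5, 1.5, 0.5]
teacher_perturbed, student_perturbed = [3.5, 1.2, 0.4], [1.0, 2.0, 0.6]

# Assumed combination: sum the mimicry loss over both kinds of example.
total_loss = (
    distillation_loss(teacher_original, student_original)
    + distillation_loss(teacher_perturbed, student_perturbed)
)
print(total_loss)
```

The loss is zero only when the student reproduces the teacher's distribution exactly, so minimizing it on perturbed inputs as well pushes the student toward the teacher's behavior beyond the limited original training set.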
8
Liu S, Zhang X, Zhou X, Yang J. BPI-MVQA: a bi-branch model for medical visual question answering. BMC Med Imaging 2022; 22:79. [PMID: 35488285 PMCID: PMC9052498 DOI: 10.1186/s12880-022-00800-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/21/2021] [Accepted: 04/13/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Visual question answering in medical domain (VQA-Med) exhibits great potential for enhancing confidence in diagnosing diseases and helping patients better understand their medical conditions. One of the challenges in VQA-Med is how to better understand and combine the semantic features of medical images (e.g., X-rays, Magnetic Resonance Imaging (MRI)) and answer the corresponding questions accurately in unlabeled medical datasets. METHOD We propose a novel Bi-branched model based on Parallel networks and Image retrieval for Medical Visual Question Answering (BPI-MVQA). The first branch of BPI-MVQA is a transformer structure based on a parallel network to achieve complementary advantages in image sequence feature and spatial feature extraction, and multi-modal features are implicitly fused by using the multi-head self-attention mechanism. The second branch is retrieving the similarity of image features generated by the VGG16 network to obtain similar text descriptions as labels. RESULT The BPI-MVQA model achieves state-of-the-art results on three VQA-Med datasets, and the main metric scores exceed the best results so far by 0.2%, 1.4%, and 1.1%. CONCLUSION The evaluation results support the effectiveness of the BPI-MVQA model in VQA-Med. The design of the bi-branch structure helps the model answer different types of visual questions. The parallel network allows for multi-angle image feature extraction, a unique feature extraction method that helps the model better understand the semantic information of the image and achieve greater accuracy in the multi-classification of VQA-Med. In addition, image retrieval helps the model answer irregular, open-ended type questions from the perspective of understanding the information provided by images. The comparison of our method with state-of-the-art methods on three datasets also shows that our method can bring substantial improvement to the VQA-Med system.
Affiliation(s)
- Shengyan Liu
- Kunming Shipborne Equipment Research and Test Center, Kunming, 650106, People's Republic of China
- Xuejie Zhang
- School of Information Science and Engineering, Yunnan University, No. 2, North Cuihu Road, Kunming, 650091, People's Republic of China
- Xiaobing Zhou
- School of Information Science and Engineering, Yunnan University, No. 2, North Cuihu Road, Kunming, 650091, People's Republic of China
- Jian Yang
- School of Information Science and Engineering, Yunnan University, No. 2, North Cuihu Road, Kunming, 650091, People's Republic of China
9
Yin Y, Zhang L, Wang Y, Wang M, Zhang Q, Li GZ. Question Answering System Based on Knowledge Graph in Traditional Chinese Medicine Diagnosis and Treatment of Viral Hepatitis B. BioMed Research International 2022; 2022:7139904. [PMID: 35198638 PMCID: PMC8860556 DOI: 10.1155/2022/7139904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2021] [Accepted: 12/31/2021] [Indexed: 11/30/2022]
Abstract
This article uses real medical records and web pages on Chinese medicine diagnosis and treatment of hepatitis B to extract structured medical knowledge, obtaining a total of 8,563 entities, 96,896 relationships, 32 entity types, and 40 relationship types. The structured data was stored in the Neo4j graph database, and a knowledge graph of Chinese medical diagnosis and treatment of hepatitis B was constructed. The knowledge graph is used as a structured data source to provide high-quality knowledge for a medical question answering system focused on hepatitis B. Applying deep learning to the question identification and knowledge response steps of the question answering system gives the hepatitis B intelligent medical question answering system important research and application significance. The question answering system targets hepatitis B, a global public health problem, and leverages the advantages of traditional Chinese medicine for diagnosis and treatment. It provides a reference for doctors' disease diagnosis and treatment and for patient self-care, which makes it valuable for the treatment of hepatitis B.
Affiliation(s)
- Yating Yin
- Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Science, Beijing 100700, China
- Lei Zhang
- National Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical, China
- Yiguo Wang
- Experimental Research Center, China Academy of Chinese Medical Science, Beijing 100700, China
- Mingqiang Wang
- Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Science, Beijing 100700, China
- Qiming Zhang
- Experimental Research Center, China Academy of Chinese Medical Science, Beijing 100700, China
- Guo-zheng Li
- Information Office, Henan University of Chinese Medicine, Zhengzhou 450046, China
10
Kell G, Marshall IJ, Wallace BC, Jaun A. What Would it Take to get Biomedical QA Systems into Practice? Proceedings of the Conference on Empirical Methods in Natural Language Processing 2021; 2021:28-41. [PMID: 35663506 PMCID: PMC9162079 DOI: 10.18653/v1/2021.mrqa-1.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Medical question answering (QA) systems have the potential to answer clinicians' uncertainties about treatment and diagnosis on-demand, informed by the latest evidence. However, despite the significant progress in general QA made by the NLP community, medical QA systems are still not widely used in clinical environments. One likely reason for this is that clinicians may not readily trust QA system outputs, in part because transparency, trustworthiness, and provenance have not been key considerations in the design of such models. In this paper we discuss a set of criteria that, if met, we argue would likely increase the utility of biomedical QA systems, which may in turn lead to adoption of such systems in practice. We assess existing models, tasks, and datasets with respect to these criteria, highlighting shortcomings of previously proposed approaches and pointing toward what might be more usable QA systems.
11
Sarker A, Yang YC, Al-Garadi MA, Abbas A. A Light-Weight Text Summarization System for Fast Access to Medical Evidence. Front Digit Health 2021; 2:585559. [PMID: 34713057 PMCID: PMC8521877 DOI: 10.3389/fdgth.2020.585559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2020] [Accepted: 11/13/2020] [Indexed: 11/13/2022] Open
Abstract
As the volume of published medical research continues to grow rapidly, staying up-to-date with the best-available research evidence regarding specific topics is becoming an increasingly challenging problem for medical experts and researchers. The current COVID-19 pandemic is a good example of a topic on which research evidence is rapidly evolving. Automatic query-focused text summarization approaches may help researchers to swiftly review research evidence by presenting salient and query-relevant information from newly-published articles in a condensed manner. Typical medical text summarization approaches require domain knowledge, and the performances of such systems rely on resource-heavy medical domain-specific knowledge sources and pre-processing methods (e.g., text classification) for deriving semantic information. Consequently, these systems are often difficult to speedily customize, extend, or deploy in low-resource settings, and they are often operationally slow. In this paper, we propose a fast and simple extractive summarization approach that can be easily deployed and run, and may thus help medical experts and researchers obtain fast access to the latest research evidence. At runtime, our system utilizes similarity measurements derived from pre-trained medical domain-specific word embeddings in addition to simple features, rather than computationally-expensive pre-processing and resource-heavy knowledge bases. Automatic evaluation using ROUGE (a summary evaluation tool) on a public dataset for evidence-based medicine shows that our system's performance, despite the simple implementation, is statistically comparable with the state-of-the-art. Extrinsic manual evaluation based on recently-released COVID-19 articles demonstrates that the summarizer performance is close to human agreement, which is generally low, for extractive summarization.
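A query-focused extractive summarizer driven by embedding similarity can be sketched in a few lines. This is a toy illustration under stated assumptions: the three-dimensional vectors in `EMBEDDINGS` are invented stand-ins for pre-trained medical word embeddings, the helper names are hypothetical, and the "simple features" the abstract mentions are omitted.

```python
import math

# Hypothetical toy vectors standing in for pre-trained domain embeddings.
EMBEDDINGS = {
    "covid": [0.9, 0.1, 0.0],
    "vaccine": [0.8, 0.2, 0.1],
    "efficacy": [0.7, 0.3, 0.0],
    "hospital": [0.2, 0.8, 0.1],
    "parking": [0.0, 0.1, 0.9],
    "fees": [0.1, 0.0, 0.8],
}


def embed(text):
    """Average the vectors of known words; None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def summarize(query, sentences, top_n=1):
    """Return the top_n sentences most similar to the query, in source order."""
    q = embed(query)
    scored = []
    for index, sentence in enumerate(sentences):
        v = embed(sentence)
        score = cosine(q, v) if q is not None and v is not None else 0.0
        scored.append((score, index, sentence))
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:top_n]
    return [sentence for _, _, sentence in sorted(best, key=lambda item: item[1])]


sentences = ["vaccine efficacy improved", "hospital parking fees increased"]
print(summarize("covid vaccine", sentences))
```

Because scoring needs only vector lookups and dot products, an approach of this shape avoids the heavy pre-processing and knowledge bases that the abstract identifies as obstacles in low-resource settings.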
Affiliation(s)
- Abeed Sarker
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States; Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, United States
- Yuan-Chi Yang
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States
- Mohammed Ali Al-Garadi
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States
- Aamir Abbas
- Heinz College of Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, PA, United States
12
Applying text similarity algorithm to analyze the triangular citation behavior of scientists. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107362] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 11/21/2022]
13
|
Zhang Y, Zhang Y, Qi P, Manning CD, Langlotz CP. Biomedical and clinical English model packages for the Stanza Python NLP library. J Am Med Inform Assoc 2021; 28:1892-1899. [PMID: 34157094 PMCID: PMC8363782 DOI: 10.1093/jamia/ocab090] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2021] [Revised: 04/05/2021] [Accepted: 05/03/2021] [Indexed: 11/13/2022] Open
Abstract
Objective The study sought to develop and evaluate neural natural language processing (NLP) packages for the syntactic analysis and named entity recognition of biomedical and clinical English text. Materials and Methods We implement and train biomedical and clinical English NLP pipelines by extending the widely used Stanza library originally designed for general NLP tasks. Our models are trained with a mix of public datasets such as the CRAFT treebank as well as with a private corpus of radiology reports annotated with 5 radiology-domain entities. The resulting pipelines are fully based on neural networks, and are able to perform tokenization, part-of-speech tagging, lemmatization, dependency parsing, and named entity recognition for both biomedical and clinical text. We compare our systems against popular open-source NLP libraries such as CoreNLP and scispaCy, state-of-the-art models such as the BioBERT models, and winning systems from the BioNLP CRAFT shared task. Results For syntactic analysis, our systems achieve much better performance compared with the released scispaCy models and CoreNLP models retrained on the same treebanks, and are on par with the winning system from the CRAFT shared task. For NER, our systems substantially outperform scispaCy, and are better or on par with the state-of-the-art performance from BioBERT, while being much more computationally efficient. Conclusions We introduce biomedical and clinical NLP packages built for the Stanza library. These packages offer performance that is similar to the state of the art, and are also optimized for ease of use. To facilitate research, we make all our models publicly available. We also provide an online demonstration (http://stanza.run/bio).
Affiliation(s)
- Yuhao Zhang
  - Biomedical Informatics Training Program, Stanford University, Stanford, California, USA
- Yuhui Zhang
  - Computer Science Department, Stanford University, Stanford, California, USA
- Peng Qi
  - Computer Science Department, Stanford University, Stanford, California, USA
- Christopher D Manning
  - Computer Science and Linguistics Departments, Stanford University, Stanford, California, USA
- Curtis P Langlotz
  - Department of Radiology, Stanford University, Stanford, California, USA
|
14
|
A Review on Medical Textual Question Answering Systems Based on Deep Learning Approaches. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11125456] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
The advent of Question Answering Systems (QASs) has been envisaged as a promising solution and an efficient approach for retrieving significant information over the Internet. A considerable amount of research work has focused on open domain QASs based on deep learning techniques due to the availability of data sources. However, the medical domain receives less attention due to the shortage of medical datasets. Although Electronic Health Records (EHRs) are empowering the field of Medical Question-Answering (MQA) by providing medical information to answer user questions, the gap is still large in the medical domain, especially for text-based sources. Therefore, in this study, the medical textual question-answering systems based on deep learning approaches were reviewed, and recent architectures of MQA systems were thoroughly explored. Furthermore, an in-depth analysis of deep learning approaches used in different MQA system tasks was provided. Finally, the different critical challenges posed by MQA systems were highlighted, and recommendations to effectively address them in forthcoming MQA systems were provided.
|
15
|
Wallace BC, Saha S, Soboczenski F, Marshall IJ. Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2021; 2021:605-614. [PMID: 34457176 PMCID: PMC8378607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
We consider the problem of automatically generating a narrative biomedical evidence summary from multiple trial reports. We evaluate modern neural models for abstractive summarization of relevant article abstracts from systematic reviews previously conducted by members of the Cochrane collaboration, using the authors' conclusions section of the review abstract as our target. We enlist medical professionals to evaluate generated summaries, and we find that summarization systems yield consistently fluent and relevant synopses, but these often contain factual inaccuracies. We propose new approaches that capitalize on domain-specific models to inform summarization, e.g., by explicitly demarcating snippets of inputs that convey key findings, and emphasizing the reports of large and high-quality trials. We find that these strategies modestly improve the factual accuracy of generated summaries. Finally, we propose a new method for automatically evaluating the factuality of generated narrative evidence syntheses using models that infer the directionality of reported findings.
|
16
|
Dai D, Tang J, Yu Z, Wong HS, You J, Cao W, Hu Y, Chen CLP. An Inception Convolutional Autoencoder Model for Chinese Healthcare Question Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:2019-2031. [PMID: 31180903 DOI: 10.1109/tcyb.2019.2916580] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Healthcare question answering (HQA) systems play a vital role in encouraging patients to seek professional consultation. However, there are some challenging factors in learning and representing the question corpus of HQA datasets, such as high dimensionality, sparseness, noise, nonprofessional expression, etc. To address these issues, we propose an inception convolutional autoencoder model for Chinese healthcare question clustering (ICAHC). First, we select a set of kernels with different sizes using convolutional autoencoder networks to explore both the diversity and quality in the clustering ensemble. Thus, these kernels encourage the capture of diverse representations. Second, we design four ensemble operators to merge representations based on whether they are independent, and input them into the encoder using different skip connections. Third, the model maps features from the encoder into a lower-dimensional space, followed by clustering. We conduct comparative experiments against other clustering algorithms on a Chinese healthcare dataset. Experimental results show the effectiveness of ICAHC in discovering better clustering solutions. The results can be used in the prediction of patients' conditions and the development of an automatic HQA system.
|
17
|
Li J, Ji C, Yan G, You L, Chen J. An Ensemble Net of Convolutional Auto-Encoder and Graph Auto-Encoder for Auto-Diagnosis. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2020.2984335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
18
|
Xiong Y, Chen S, Chen Q, Yan J, Tang B. Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study. JMIR Med Inform 2020; 8:e23357. [PMID: 33372664 PMCID: PMC7803475 DOI: 10.2196/23357] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2020] [Revised: 11/10/2020] [Accepted: 11/16/2020] [Indexed: 12/03/2022] Open
Abstract
Background With the popularity of electronic health records (EHRs), the quality of health care has been improved. However, there are also some problems caused by EHRs, such as the growing use of copy-and-paste and templates, resulting in EHRs of low quality in content. In order to minimize data redundancy in different documents, Harvard Medical School and Mayo Clinic organized a national natural language processing (NLP) clinical challenge (n2c2) on clinical semantic textual similarity (ClinicalSTS) in 2019. The task of this challenge is to compute the semantic similarity among clinical text snippets. Objective In this study, we aim to investigate novel methods to model ClinicalSTS and analyze the results. Methods We propose a semantically enhanced text matching model for the 2019 n2c2/Open Health NLP (OHNLP) challenge on ClinicalSTS. The model includes 3 representation modules to encode clinical text snippet pairs at different levels: (1) character-level representation module based on convolutional neural network (CNN) to tackle the out-of-vocabulary problem in NLP; (2) sentence-level representation module that adopts a pretrained language model bidirectional encoder representation from transformers (BERT) to encode clinical text snippet pairs; and (3) entity-level representation module to model clinical entity information in clinical text snippets. In the case of entity-level representation, we compare 2 methods. One encodes entities by the entity-type label sequence corresponding to text snippet (called entity I), whereas the other encodes entities by their representation in MeSH, a knowledge graph in the medical domain (called entity II). Results We conduct experiments on the ClinicalSTS corpus of the 2019 n2c2/OHNLP challenge for model performance evaluation. The model only using BERT for text snippet pair encoding achieved a Pearson correlation coefficient (PCC) of 0.848. 
When character-level representation and entity-level representation are individually added into our model, the PCC increased to 0.857 and 0.854 (entity I)/0.859 (entity II), respectively. When both character-level representation and entity-level representation are added into our model, the PCC further increased to 0.861 (entity I) and 0.868 (entity II). Conclusions Experimental results show that both character-level information and entity-level information can effectively enhance the BERT-based STS model.
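The Pearson correlation coefficient (PCC) reported in this abstract measures the linear agreement between predicted and gold similarity scores. A minimal sketch of the standard formula follows; the function name and the example scores in the note below are hypothetical and are not taken from the ClinicalSTS corpus.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

For instance, `pearson(predicted, gold)` on a list of model scores and a list of annotator scores returns a value between -1 and 1, where values near 1 indicate that the model ranks and scales snippet pairs much as the annotators do.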
Affiliation(s)
- Ying Xiong
  - Harbin Institute of Technology, Shenzhen, China
- Shuai Chen
  - Harbin Institute of Technology, Shenzhen, China
- Qingcai Chen
  - Harbin Institute of Technology, Shenzhen, China
  - Peng Cheng Laboratory, Shenzhen, China
- Jun Yan
  - Yidu Cloud Technology Company Limited, Beijing, China
- Buzhou Tang
  - Harbin Institute of Technology, Shenzhen, China
  - Peng Cheng Laboratory, Shenzhen, China
|
19
|
Méndez-Cruz CF, Blanchet A, Godínez A, Arroyo-Fernández I, Gama-Castro S, Martínez-Luna SB, González-Colín C, Collado-Vides J. Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties. Database (Oxford) 2020; 2020:baaa109. [PMID: 33306798 PMCID: PMC7731926 DOI: 10.1093/database/baaa109] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2020] [Revised: 11/18/2020] [Accepted: 11/26/2020] [Indexed: 11/21/2022]
Abstract
Transcription factors (TFs) play a main role in transcriptional regulation of bacteria, as they regulate transcription of the genetic information encoded in DNA. Thus, the curation of the properties of these regulatory proteins is essential for a better understanding of transcriptional regulation. However, traditional manual curation of article collections to compile descriptions of TF properties takes significant time and effort due to the overwhelming amount of biomedical literature, which increases every day. The development of automatic approaches for knowledge extraction to assist curation is therefore critical. Here, we show an effective approach for knowledge extraction to assist curation of summaries describing bacterial TF properties based on an automatic text summarization strategy. We were able to recover automatically a median 77% of the knowledge contained in manual summaries describing properties of 177 TFs of Escherichia coli K-12 by processing 5961 scientific articles. For 71% of the TFs, our approach extracted new knowledge that can be used to expand manual descriptions. Furthermore, as we trained our predictive model with manual summaries of E. coli, we also generated summaries for 185 TFs of Salmonella enterica serovar Typhimurium from 3498 articles. According to the manual curation of 10 of these Salmonella typhimurium summaries, 96% of their sentences contained relevant knowledge. Our results demonstrate the feasibility to assist manual curation to expand manual summaries with new knowledge automatically extracted and to create new summaries of bacteria for which these curation efforts do not exist. Database URL: The automatic summaries of the TFs of E. coli and Salmonella and the automatic summarizer are available in GitHub (https://github.com/laigen-unam/tf-properties-summarizer.git).
Affiliation(s)
- Carlos-Francisco Méndez-Cruz
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Antonio Blanchet
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Alan Godínez
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Ignacio Arroyo-Fernández
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
  - División de Posgrado, Universidad Tecnológica de la Mixteca, Carretera a Acatlima Km. 2.5, Huajuapan de León, 69000, Oaxaca, Mexico
- Socorro Gama-Castro
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Sara Berenice Martínez-Luna
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Cristian González-Colín
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Julio Collado-Vides
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Room 403, Boston, 02215 MA, USA
|
20
|
Abstract
A question answering system is a type of information retrieval that takes a question from a user in natural language as the input and returns the best answer to it as the output. In this paper, a medical question answering system in the Persian language is designed and implemented. During this research, a dataset of diseases and drugs is collected and structured. The proposed system includes three main modules: question processing, document retrieval, and answer extraction. For the question processing module, a sequential architecture is designed which retrieves the main concept of a question by using different components. In these components, rule-based methods, natural language processing, and dictionary-based techniques are used. In the document retrieval module, the documents are indexed and searched using the Lucene library. The retrieved documents are ranked using similarity detection algorithms and the highest-ranked document is selected to be used by the answer extraction module. This module is responsible for extracting the most relevant section of the text in the retrieved document. During this research, different customized language processing tools such as part of speech tagger and lemmatizer are also developed for Persian. Evaluation results show that this system performs well for answering different questions about diseases and drugs. The accuracy of the system for 500 sample questions is 83.6%.
Affiliation(s)
- Hadi Veisi
  - Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
- Hamed Fakour Shandi
  - Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
|
21
|
Medical speciality classification system based on binary particle swarms and ensemble of one vs. rest support vector machines. J Biomed Inform 2020; 109:103525. [PMID: 32781030 DOI: 10.1016/j.jbi.2020.103525] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2020] [Revised: 07/19/2020] [Accepted: 07/29/2020] [Indexed: 11/21/2022]
Abstract
Nowadays, artificial intelligence plays an integral role in medical and healthcare informatics. Developing an automatic question classification and answering system is essential for coping with constant advancements in science and technology. However, efficient online medical services are required to promote offline medical services. This article proposes a system that automatically classifies medical questions of patients into medical specialities and supports the Arabic language in the MENA region. Text classification is not trivial, especially when dealing with a highly morphologically complex language, the dialectical form of which is the dominant form on the Internet. This work utilizes 15,000 medical questions asked by the clients of Altibbi telemedicine company. The questions are classified into 15 medical specialities. As the number of medical questions received daily by the company has increased, a need has arisen for an automatic classification system that can save the medical personnel much time and effort. Therefore, this article presents an efficient medical speciality classification system based on swarm intelligence (SI) and an ensemble of support vector machines (SVMs). Particle swarm optimization (PSO) is an SI-based and stochastic metaheuristic algorithm that is adopted to search for the optimal number of features and tune the hyperparameters of the SVM classifiers, which are deployed as one-versus-rest for multi-class classification. In addition, PSO is integrated with various binarization techniques to boost its performance. The experimental results show that the proposed approach accomplished remarkable performance as it achieved an accuracy of 85% and a features reduction rate of 95.9%.
|
22
|
Wen A, Elwazir MY, Moon S, Fan J. Adapting and evaluating a deep learning language model for clinical why-question answering. JAMIA Open 2020; 3:16-20. [PMID: 32607483 PMCID: PMC7309262 DOI: 10.1093/jamiaopen/ooz072] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2019] [Revised: 12/02/2019] [Accepted: 12/21/2019] [Indexed: 11/14/2022] Open
Abstract
Objectives To adapt and evaluate a deep learning language model for answering why-questions based on patient-specific clinical text. Materials and Methods Bidirectional encoder representations from transformers (BERT) models were trained with varying data sources to perform SQuAD 2.0 style why-question answering (why-QA) on clinical notes. The evaluation focused on: (1) comparing the merits of different training data and (2) error analysis. Results The best model achieved an accuracy of 0.707 (or 0.760 by partial match). Training toward customization for the clinical language helped increase accuracy by 6%. Discussion The error analysis suggested that the model did not really perform deep reasoning and that clinical why-QA might warrant more sophisticated solutions. Conclusion The BERT model achieved moderate accuracy in clinical why-QA and should benefit from the rapidly evolving technology. Despite the identified limitations, it could serve as a competent proxy for question-driven clinical information extraction.
Affiliation(s)
- Andrew Wen
  - Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Mohamed Y Elwazir
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota, USA
  - Department of Cardiology, Faculty of Medicine, Suez Canal University, Ismailia, Egypt
- Sungrim Moon
  - Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Jungwei Fan
  - Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
  - Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota, USA
|
23
|
Calijorne Soares MA, Parreiras FS. A literature review on question answering techniques, paradigms and systems. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2020. [DOI: 10.1016/j.jksuci.2018.08.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
24
|
|
25
|
|
26
|
Sarrouti M, Ouatik El Alaoui S. SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions. Artif Intell Med 2019; 102:101767. [PMID: 31980104 DOI: 10.1016/j.artmed.2019.101767] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2018] [Revised: 11/19/2019] [Accepted: 11/19/2019] [Indexed: 12/11/2022]
Abstract
BACKGROUND AND OBJECTIVE Question answering (QA), the identification of short accurate answers to users' questions written in natural language expressions, is a longstanding issue widely studied over the last decades in the open domain. However, it still remains a real challenge in the biomedical domain, as most of the existing systems support a limited number of question and answer types and still require further efforts to improve their precision on the supported questions. Here, we present a semantic biomedical QA system named SemBioNLQA, which can handle yes/no, factoid, list, and summary natural language questions. METHODS This paper describes the system architecture and an evaluation of the developed end-to-end biomedical QA system named SemBioNLQA, which consists of question classification, document retrieval, passage retrieval and answer extraction modules. It takes natural language questions as input, and outputs both short precise answers and summaries as results. The SemBioNLQA system, dealing with four types of questions, is based on (1) handcrafted lexico-syntactic patterns and a machine learning algorithm for question classification, (2) PubMed search engine and UMLS similarity for document retrieval, (3) the BM25 model, stemmed words and UMLS concepts for passage retrieval, and (4) UMLS metathesaurus, BioPortal synonyms, sentiment analysis and term frequency metric for answer extraction. RESULTS AND CONCLUSION Compared with the current state-of-the-art biomedical QA systems, SemBioNLQA, a fully automated system, has the potential to deal with a large number of question and answer types. SemBioNLQA quickly addresses users' information needs by returning exact answers (e.g., "yes", "no", a biomedical entity name, etc.) and ideal answers (i.e., paragraph-sized summaries of relevant information) for yes/no, factoid and list questions, whereas it provides only the ideal answers for summary questions. Moreover, experimental evaluations performed on biomedical questions and answers provided by the BioASQ challenge, especially in 2015, 2016 and 2017 (as part of our participation), show that SemBioNLQA achieves good performance compared with the most current state-of-the-art systems and offers a practical and competitive alternative to help information seekers find exact and ideal answers to their biomedical questions. The SemBioNLQA source code is publicly available at https://github.com/sarrouti/sembionlqa.
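The BM25 model named in the passage retrieval step can be sketched in a few lines. This is a generic illustration rather than the SemBioNLQA implementation: it uses whitespace tokens only (no stemming or UMLS concepts), a common non-negative IDF variant, and the conventional defaults `k1=1.2` and `b=0.75`; the function name `bm25_rank` is hypothetical.

```python
import math
from collections import Counter


def bm25_rank(query, passages, k1=1.2, b=0.75):
    """Return passage indices ordered from most to least relevant by BM25."""
    docs = [p.lower().split() for p in passages]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of passages that contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with passage-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: -scores[i])
```

Calling `bm25_rank("aspirin fever", passages)` places passages that contain neither query term at the end of the ranking. A fuller pipeline would normalize tokens (for example by stemming) before scoring, as the abstract indicates.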
Affiliation(s)
- Mourad Sarrouti
  - Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda, MD.
- Said Ouatik El Alaoui
  - National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco
  - Laboratory of Informatics and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco
|
27
|
Liu YH, Song X, Chen SF. Long story short: finding health advice with informative summaries on health social media. ASLIB J INFORM MANAG 2019. [DOI: 10.1108/ajim-02-2019-0048] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Purpose
Whether automatically generated summaries of health social media can aid users in managing their diseases appropriately is an important question. The purpose of this paper is to introduce a novel text summarization approach for acquiring the most informative summaries from online patient posts accurately and effectively.
Design/methodology/approach
Data sets of diabetes and HIV posts were collected from two online disease forums, respectively. The proposed summarizer uses a graph-based method to generate summaries by considering social network features, text sentiment and sentence features. Representative health-related summaries were identified, and summarization performance as well as user judgments were analyzed.
Findings
The findings show that scoring sentences without using all the incorporated features decreases summarization performance compared with the classic summarization method and comparison approaches. The proposed summarizer significantly outperformed the comparison baseline.
Originality/value
This study contributes to the literature on health knowledge management by analyzing patients’ experiences and opinions through the health summarization model. The research additionally develops a new mindset to design abstractive summarization weighting schemes from the health user-generated content.
|
28
|
Jin ZX, Zhang BW, Fang F, Zhang LL, Yin XC. Health assistant: answering your questions anytime from biomedical literature. Bioinformatics 2019; 35:4129-4139. [PMID: 30887023 DOI: 10.1093/bioinformatics/btz195] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2018] [Revised: 11/28/2018] [Accepted: 03/16/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION With abundant medical resources, especially literature, available online, it is possible for people to understand their own health status and relevant problems autonomously. However, how to obtain the most appropriate answer from an increasingly large-scale database remains a great challenge. Here, we present a biomedical question answering framework and implement a system, Health Assistant, to enable the search process. METHODS In Health Assistant, a search engine is first designed to rank biomedical documents based on their contents. Then various query processing and search techniques are utilized to find the relevant documents. Afterwards, the titles and abstracts of the top-N documents are extracted to generate candidate snippets. Finally, our own query processing and retrieval approaches for short text are applied to locate the relevant snippets to answer the questions. RESULTS Our system is evaluated on the BioASQ benchmark datasets, and experimental results demonstrate the effectiveness and robustness of our system, compared to BioASQ participant systems and some state-of-the-art methods on both document retrieval and snippet retrieval tasks. AVAILABILITY AND IMPLEMENTATION A demo of our system is available at https://github.com/jinzanxia/biomedical-QA.
Affiliation(s)
- Zan-Xia Jin
  - Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Fan Fang
  - Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Le-Le Zhang
  - Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Xu-Cheng Yin
  - Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
|
29
|
Kraus M, Niedermeier J, Jankrift M, Tietböhl S, Stachewicz T, Folkerts H, Uflacker M, Neves M. Olelo: a web application for intuitive exploration of biomedical literature. Nucleic Acids Res 2019; 45:W478-W483. [PMID: 28472397 PMCID: PMC5570143 DOI: 10.1093/nar/gkx363] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2017] [Accepted: 04/25/2017] [Indexed: 11/12/2022] Open
Abstract
Researchers usually query the large biomedical literature in PubMed via keywords, logical operators and filters, none of which is very intuitive. Question answering systems are an alternative to keyword searches. They allow questions in natural language as input, and results reflect the given type of question, such as short answers and summaries. A few of those systems are available online, but they suffer from long response times and support a limited number of question and result types. Additionally, user interfaces are usually restricted to only displaying the retrieved information. For our Olelo web application, we combined biomedical literature and terminologies in a fast in-memory database to enable real-time responses to researchers’ queries. Further, we extended the built-in natural language processing features of the database with question answering and summarization procedures. Combined with a new explorative approach of document filtering and a clean user interface, Olelo enables a fast and intelligent search through the ever-growing biomedical literature. Olelo is available at http://www.hpi.de/plattner/olelo.
Collapse
Affiliation(s)
- Milena Kraus
- Department of Enterprise Platforms and Integration Concepts, Hasso Plattner Institute, August-Bebel-Strasse 88, Potsdam 14482, Germany
| | - Julian Niedermeier
- Department of Enterprise Platforms and Integration Concepts, Hasso Plattner Institute, August-Bebel-Strasse 88, Potsdam 14482, Germany
| | - Marcel Jankrift
- Department of Enterprise Platforms and Integration Concepts, Hasso Plattner Institute, August-Bebel-Strasse 88, Potsdam 14482, Germany
| | - Sören Tietböhl
- Department of Enterprise Platforms and Integration Concepts, Hasso Plattner Institute, August-Bebel-Strasse 88, Potsdam 14482, Germany
| | - Toni Stachewicz
- Department of Enterprise Platforms and Integration Concepts, Hasso Plattner Institute, August-Bebel-Strasse 88, Potsdam 14482, Germany
| | - Hendrik Folkerts
- Department of Enterprise Platforms and Integration Concepts, Hasso Plattner Institute, August-Bebel-Strasse 88, Potsdam 14482, Germany
| | - Matthias Uflacker
- Department of Enterprise Platforms and Integration Concepts, Hasso Plattner Institute, August-Bebel-Strasse 88, Potsdam 14482, Germany
| | - Mariana Neves
- Department of Enterprise Platforms and Integration Concepts, Hasso Plattner Institute, August-Bebel-Strasse 88, Potsdam 14482, Germany
| |
|
30
|
Demenkov PS, Saik OV, Ivanisenko TV, Kolchanov NA, Kochetov AV, Ivanisenko VA. Prioritization of potato genes involved in the formation of agronomically valuable traits using the SOLANUM TUBEROSUM knowledge base. Vavilovskii Zhurnal Genet Selektsii 2019. [DOI: 10.18699/vj19.501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/26/2022] Open
Abstract
The development of highly efficient technologies in genomics, transcriptomics, proteomics and metabolomics, as well as new technologies in agriculture, has led to an “information explosion” in plant biology and crop production, including potato production. Only a small part of the information reaches formalized databases (for example, UniProt, NCBI Gene, BioGRID, IntAct, etc.). One of the main sources of reliable biological data is the scientific literature. The well-known PubMed database contains more than 18 thousand abstracts of articles on potato. The effective use of knowledge presented in such a number of non-formalized documents in natural language requires the use of modern intellectual methods of analysis. However, in the literature, there is no evidence of a widespread use of intelligent methods for automatically extracting knowledge from scientific publications on crops such as potatoes. Earlier we developed the SOLANUM TUBEROSUM knowledge base (http://www-bionet.sysbio.cytogen.ru/and/plant/). Information integrated into the knowledge base about the molecular genetic mechanisms underlying the selection of significant traits helps to accelerate the identification of candidate genes for the breeding characteristics of potatoes and the development of diagnostic markers for breeding. The article searches for new potential participants of the molecular genetic mechanisms of resistance to adverse factors in plants. Prioritizing candidate genes has shown that the PHYA, GF14, CNIH1, RCI1A, ABI5, CPK1, RGS1, NHL3, GRF8, and CYP21-4 genes are the most promising for further testing of their relationships with resistance to adverse factors. As a result of the analysis, it was shown that the molecular genetic relationships responsible for the formation of significant agricultural traits are complex and include many direct and indirect interactions.
The construction of associative gene networks and their analysis using the SOLANUM TUBEROSUM knowledge base is the basis for searching for target genes for targeted mutagenesis and marker-oriented selection of potato varieties with valuable agricultural characteristics.
Affiliation(s)
- P. S. Demenkov
- Institute of Cytology and Genetics, SB RAS; Novosibirsk State University
| | - O. V. Saik
- Institute of Cytology and Genetics, SB RAS
|
31
|
Kratzwald B, Feuerriegel S. Putting Question-Answering Systems into Practice. ACM Transactions on Management Information Systems 2019. [DOI: 10.1145/3309706] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 10/27/2022]
Abstract
Traditional information retrieval (such as that offered by web search engines) impedes users with information overload from extensive result pages and the need to manually locate the desired information therein. Conversely, question-answering systems change how humans interact with information systems: users can now ask specific questions and obtain a tailored answer—both conveniently in natural language. Despite obvious benefits, their use is often limited to an academic context, largely because of expensive domain customizations, which means that the performance in domain-specific applications often fails to meet expectations. This article proposes cost-efficient remedies: (i) we leverage metadata through a filtering mechanism, which increases the precision of document retrieval, and (ii) we develop a novel fuse-and-oversample approach for transfer learning to improve the performance of answer extraction. Here, knowledge is inductively transferred from related, yet different, tasks to the domain-specific application, while accounting for potential differences in the sample sizes across both tasks. The resulting performance is demonstrated with actual use cases from a finance company and the film industry, where fewer than 400 question-answer pairs had to be annotated to yield significant performance gains. As a direct implication to management, this presents a promising path to better leveraging of knowledge stored in information systems.
|
32
|
Saha SK, Prakash A, Majumder M. "Similar query was answered earlier": processing of patient authored text for retrieving relevant contents from health discussion forum. Health Inf Sci Syst 2019; 7:4. [PMID: 30863540 DOI: 10.1007/s13755-019-0067-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/30/2018] [Accepted: 02/01/2019] [Indexed: 11/28/2022] Open
Abstract
Online remedy finders and health-related discussion forums have become increasingly popular in recent years. Common web users describe their health problems there and request suggestions from experts or other users. As a result, these forums have become a huge repository of information and discussions on various health issues. An intelligent information retrieval system can help to utilize this repository in various applications. In this paper, we propose a system for the automatic identification of existing similar forum posts given a new post. The system is based on computing similarity between two patient authored texts. For computing the similarity between the current post and existing posts, the system uses a hybrid strategy based on template information, topic modelling, and latent semantic indexing. The system is tested using a set of real questions collected from a homeopathy forum, namely abchomeopathy.com. The relevance of the posts retrieved by the system is evaluated by human experts. The evaluation results demonstrate that the precision of the system is 88.87%.
Affiliation(s)
- Sujan Kumar Saha
- Department of Computer Science and Engineering, Birla Institute of Technology Mesra, Ranchi, 835215 India
| | - Amit Prakash
- Department of Computer Science and Engineering, Birla Institute of Technology Mesra, Ranchi, 835215 India
| | - Mukta Majumder
- Department of Computer Science and Application, University of North Bengal, West Bengal, India
| |
|
33
|
Ivanisenko VA, Demenkov PS, Ivanisenko TV, Mishchenko EL, Saik OV. A new version of the ANDSystem tool for automatic extraction of knowledge from scientific publications with expanded functionality for reconstruction of associative gene networks by considering tissue-specific gene expression. BMC Bioinformatics 2019; 20:34. [PMID: 30717676 PMCID: PMC6362586 DOI: 10.1186/s12859-018-2567-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Consideration of tissue-specific gene expression in reconstruction and analysis of molecular genetic networks is necessary for a proper description of the processes occurring in a specified tissue. Currently, there are a number of computer systems that allow the user to reconstruct molecular-genetic networks using the data automatically extracted from the texts of scientific publications. Examples of such systems are STRING, Pathway Commons, MetaCore and Ingenuity. The MetaCore and Ingenuity systems permit taking into account tissue-specific gene expression during the reconstruction of gene networks. Previously, we developed the ANDSystem tool, which also provides an automated extraction of knowledge from scientific texts and allows the reconstruction of gene networks. The main difference between our system and other tools is in the different types of interactions between objects, which makes the ANDSystem complementary to existing well-known systems. However, previous versions of the ANDSystem did not contain any information on tissue-specific expression. RESULTS A new version of the ANDSystem has been developed. It offers the reconstruction of associative gene networks while taking into account the tissue-specific gene expression. The ANDSystem knowledge base features information on tissue-specific expression for 272 tissues. The system allows the reconstruction of combined gene networks, as well as performing the filtering of genes from such networks using the information on their tissue-specific expression. As an example of the application of such filtering, the gene network of the extrinsic apoptotic signaling pathway was analyzed. It was shown that considering different tissues can lead to changes in gene network structure, including changes in such indicators as betweenness centrality of vertices, clustering coefficient, network centralization, network density, etc. 
CONCLUSIONS The consideration of tissue specificity can play an important role in the analysis of gene networks, in particular solving the problem of finding the most significant central genes. Thus, the new version of ANDSystem can be employed for a wide range of tasks related to biomedical studies of individual tissues. It is available at http://www-bionet.sscc.ru/and/cell/.
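As an illustration of how filtering a gene network by tissue-specific expression can shift structural indicators such as network density and clustering coefficient, the following is a minimal sketch in plain Python. It is not ANDSystem code: the node labels, the toy edge list, and the `expressed` set are hypothetical placeholders, and the graph is assumed to be undirected with no self-loops.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs (no self-loops assumed)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def density(graph):
    """Fraction of possible undirected edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    return 2 * m / (n * (n - 1))


def clustering(graph, node):
    """Local clustering coefficient: share of neighbour pairs that are linked."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def filter_nodes(graph, keep):
    """Keep only nodes in `keep` (e.g. genes expressed in one tissue)."""
    return {u: nbrs & keep for u, nbrs in graph.items() if u in keep}


# Hypothetical toy network; labels are placeholders, not real genes.
toy = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
expressed = {"A", "B", "C"}  # hypothetical tissue-specific expression filter
filtered = filter_nodes(toy, expressed)
```

Comparing `density(toy)` with `density(filtered)`, or `clustering(toy, "C")` with `clustering(filtered, "C")`, shows how removing a node that is not expressed in the tissue changes these indicators for the remaining nodes.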
Affiliation(s)
- Vladimir A. Ivanisenko
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
| | - Pavel S. Demenkov
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
| | - Timofey V. Ivanisenko
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
| | - Elena L. Mishchenko
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
| | - Olga V. Saik
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
| |
|
34
|
Kearns WR, Thomas JA. Resource and Response Type Classification for Consumer Health Question Answering. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2018; 2018:634-643. [PMID: 30815105 PMCID: PMC6371272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Health question answering systems often depend on the initial step of question type classification. Practitioners face several modeling choices for this component alone. We evaluate the effectiveness of different modeling choices in both the embeddings and architectural hyper-parameters of the classifier. In the process, we achieve improved performance over previous methods, achieving a new best 5-fold accuracy of 85.3% on the GARD dataset. The contribution of this work is to evaluate the performance of sentence classification methods on the task of consumer health question type classification and to contribute a dataset of 2,882 medical questions annotated for question type.
|
35
|
Moradi M. CIBS: A biomedical text summarizer using topic-based sentence clustering. J Biomed Inform 2018; 88:53-61. [DOI: 10.1016/j.jbi.2018.11.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/10/2018] [Revised: 09/26/2018] [Accepted: 11/12/2018] [Indexed: 12/21/2022]
|
36
|
|
37
|
Fiorini N, Leaman R, Lipman DJ, Lu Z. How user intelligence is improving PubMed. Nat Biotechnol 2018; 36:nbt.4267. [PMID: 30272675 DOI: 10.1038/nbt.4267] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/30/2017] [Accepted: 09/06/2018] [Indexed: 11/09/2022]
Abstract
PubMed is a widely used search engine for biomedical literature. It is developed and maintained by the US National Library of Medicine/National Center for Biotechnology Information and is visited daily by millions of users around the world. For decades, PubMed has used advanced artificial intelligence technologies that extract patterns of collective user activity, such as machine learning and natural language processing, to inform the algorithmic changes that ultimately improve a user's search experience. Although these efforts have led to objective improvements in search quality, the technical underpinnings remain largely invisible and go largely unnoticed by most users. Here we describe how these 'under-the-hood' techniques work within PubMed and report how their effectiveness and usage is assessed in real-world scenarios. In doing so, we hope to increase the transparency of the PubMed system and enable users to make more effective use of the search engine. We also identify open challenges and new opportunities for computational researchers to explore the potential of future improvements.
Affiliation(s)
- Nicolas Fiorini
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, USA
| | - Robert Leaman
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, USA
| | - David J Lipman
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, USA
| |
|
38
|
Méndez-Cruz CF, Gama-Castro S, Mejía-Almonte C, Castillo-Villalba MP, Muñiz-Rascado LJ, Collado-Vides J. First steps in automatic summarization of transcription factor properties for RegulonDB: classification of sentences about structural domains and regulated processes. Database: The Journal of Biological Databases and Curation 2018; 2017:4237584. [PMID: 29220462 PMCID: PMC5737074 DOI: 10.1093/database/bax070] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/25/2016] [Accepted: 08/15/2017] [Indexed: 11/17/2022]
Abstract
The RegulonDB (http://regulondb.ccg.unam.mx) team generates manually elaborated summaries about transcription factors (TFs) of Escherichia coli K-12. These texts involve considerable effort, since they summarize a diverse collection of structural, mechanistic and physiological properties of TFs and, due to constant new research, ideally they require frequent updating. In natural language processing, several techniques for automatic summarization have been developed. Therefore, our proposal is to extract, by using those techniques, relevant information about TFs for assisting the curation and elaboration of the manual summaries. Here, we present the results of the automatic classification of sentences about the biological processes regulated by a TF and the information about the structural domains constituting the TF. We tested two classical classifiers, Naïve Bayes and Support Vector Machines (SVMs), with the sentences of the manual summaries as training data. The best classifier was an SVM employing lexical, grammatical, and terminological features (F-score, 0.8689). The sentences of articles analyzed by this classifier were frequently true, but many sentences were set aside (high precision with low recall); consequently, some improvement is required. Nevertheless, automatic summaries of complete articles about five TFs, generated with this classifier, included much of the relevant information of the summaries written by curators (high ROUGE-1 recall). In fact, a manual comparison confirmed that the best summary encompassed 100% of the relevant information. Hence, our empirical results suggest that our proposal is promising for covering more properties of TFs to generate suggested sentences with relevant information to help the curation work without losing quality.
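The ROUGE-1 recall mentioned in this abstract measures how many of the unigrams in a reference (curator-written) summary also appear in the automatic summary. The following is a minimal sketch of that measure, assuming a simple lowercase alphanumeric tokenizer and no stemming or stop-word removal; the function names and example sentences are illustrative and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_recall(reference, candidate):
    """Clipped unigram overlap divided by the number of unigrams in the reference."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand_counts[tok]) for tok, count in ref_counts.items())
    return overlap / total


# Illustrative sentences only; not data from the study.
manual_summary = "The TF binds DNA and represses transcription"
automatic_summary = "the TF represses transcription"
score = rouge1_recall(manual_summary, automatic_summary)
```

A recall of 1.0 means every unigram of the manual summary is covered by the automatic one; it says nothing about extra, irrelevant material in the automatic summary, which is why recall is usually reported alongside precision.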
Affiliation(s)
- Carlos-Francisco Méndez-Cruz
- Computational Genomics Program, Center for Genomic Sciences, National Autonomous University of Mexico, Av. Universidad, s/n, Colonia Chamilpa, Cuernavaca, Morelos 62100, Mexico
| | - Socorro Gama-Castro
- Computational Genomics Program, Center for Genomic Sciences, National Autonomous University of Mexico, Av. Universidad, s/n, Colonia Chamilpa, Cuernavaca, Morelos 62100, Mexico
| | - Citlalli Mejía-Almonte
- Computational Genomics Program, Center for Genomic Sciences, National Autonomous University of Mexico, Av. Universidad, s/n, Colonia Chamilpa, Cuernavaca, Morelos 62100, Mexico
| | - Marco-Polo Castillo-Villalba
- Computational Genomics Program, Center for Genomic Sciences, National Autonomous University of Mexico, Av. Universidad, s/n, Colonia Chamilpa, Cuernavaca, Morelos 62100, Mexico
| | - Luis-José Muñiz-Rascado
- Computational Genomics Program, Center for Genomic Sciences, National Autonomous University of Mexico, Av. Universidad, s/n, Colonia Chamilpa, Cuernavaca, Morelos 62100, Mexico
| | - Julio Collado-Vides
- Computational Genomics Program, Center for Genomic Sciences, National Autonomous University of Mexico, Av. Universidad, s/n, Colonia Chamilpa, Cuernavaca, Morelos 62100, Mexico
| |
|
39
|
Iatraki G, Kondylakis H, Koumakis L, Chatzimina M, Kazantzaki E, Marias K, Tsiknakis M. Personal Health Information Recommender: implementing a tool for the empowerment of cancer patients. Ecancermedicalscience 2018; 12:851. [PMID: 30079113 PMCID: PMC6057655 DOI: 10.3332/ecancer.2018.851] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/16/2017] [Indexed: 11/25/2022] Open
Abstract
Nowadays, patients have a wealth of information available on the Internet. Despite the potential benefits of Internet health information seeking, several concerns have been raised about the quality of information and about the patient’s capability to evaluate medical information and to relate it to their own disease and treatment. As such, novel tools are required to effectively guide patients and provide high-quality medical information in an intelligent and personalised manner. With this aim, this paper presents the Personal Health Information Recommender (PHIR), a system to empower patients by enabling them to search in a high-quality document repository selected by experts, avoiding the information overload of the Internet. In addition, the information provided to the patients is personalised, based on individual preferences, medical conditions and other profiling information. Despite the generality of our approach, we apply the PHIR to a personal health record system constructed for cancer patients and we report on the design, the implementation and a preliminary validation of the platform. To the best of our knowledge, our platform is the only one combining natural language processing, ontologies and personal information to offer a unique user experience.
Affiliation(s)
- Galatia Iatraki
- Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece
| | | | - Lefteris Koumakis
- Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece
| | - Maria Chatzimina
- Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece
| | - Eleni Kazantzaki
- Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece
| | - Kostas Marias
- Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece; Department of Informatics Engineering, Technological Educational Institute of Crete, Heraklion GR71004, Greece
| | - Manolis Tsiknakis
- Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece; Department of Informatics Engineering, Technological Educational Institute of Crete, Heraklion GR71004, Greece
| |
|
40
|
Hu Y, Wen G, Ma J, Li D, Wang C, Li H, Huan E. Label-indicator morpheme growth on LSTM for Chinese healthcare question department classification. J Biomed Inform 2018; 82:154-168. [DOI: 10.1016/j.jbi.2018.04.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/31/2017] [Revised: 02/05/2018] [Accepted: 04/24/2018] [Indexed: 12/15/2022]
|
41
|
Abstract
BACKGROUND Health question-answering (QA) systems have become a typical application scenario of artificial intelligence (AI). An annotated question corpus is a prerequisite for training machines to understand the health information needs of users. Thus, we aimed to develop an annotated classification corpus of Chinese health questions (Qcorp) and make it openly accessible. METHODS We developed a two-layered classification schema and corresponding annotation rules on the basis of our previous work. Using the schema, we annotated 5000 questions that were randomly selected from 5 Chinese health websites within 6 broad sections. Eight annotators participated in the annotation task, and the inter-annotator agreement was evaluated to ensure the corpus quality. Furthermore, the distribution and relationship of the annotated tags were measured by descriptive statistics and a social network map. RESULTS The questions were annotated using 7101 tags that cover 29 topic categories in the two-layered schema. In our released corpus, the distribution of questions on the top-layered categories was treatment of 64.22%, diagnosis of 37.14%, epidemiology of 14.96%, healthy lifestyle of 10.38%, and health provider choice of 4.54% respectively. Both the annotated health questions and annotation schema were openly accessible on the Qcorp website. Users can download the annotated Chinese questions in CSV, XML, and HTML format. CONCLUSIONS We developed a Chinese health question corpus including 5000 manually annotated questions. It is openly accessible and would contribute to the intelligent health QA system development.
Affiliation(s)
- Haihong Guo
- Institute of Medical Information / Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Xu Na
- Institute of Medical Information / Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Jiao Li
- Institute of Medical Information / Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.
| |
|
42
|
Kilicoglu H, Ben Abacha A, Mrabet Y, Shooshan SE, Rodriguez L, Masterton K, Demner-Fushman D. Semantic annotation of consumer health questions. BMC Bioinformatics 2018; 19:34. [PMID: 29409442 PMCID: PMC5802048 DOI: 10.1186/s12859-018-2045-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/31/2017] [Accepted: 01/24/2018] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations. RESULTS The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded highest agreement, while the agreement for more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. 
Pairwise inter-annotator agreement proved most useful in estimating annotation confidence. CONCLUSIONS To our knowledge, our corpus is the first focusing on annotation of uncurated consumer health questions. It is currently used to develop machine learning-based methods for question understanding. We make the corpus publicly available to stimulate further research on consumer health QA.
Affiliation(s)
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
| | - Asma Ben Abacha
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
| | - Yassine Mrabet
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
| | - Sonya E. Shooshan
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
| | - Laritza Rodriguez
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
| | - Kate Masterton
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
| | - Dina Demner-Fushman
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD USA
| |
|
43
|
Hu Z, Zhang Z, Yang H, Chen Q, Zhu R, Zuo D. Predicting the quality of online health expert question-answering services with temporal features in a deep learning framework. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.11.039] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/30/2022]
|
44
|
Garcia-Gathright JI, Matiasz NJ, Adame C, Sarma KV, Sauer L, Smedley NF, Spiegel ML, Strunck J, Garon EB, Taira RK, Aberle DR, Bui AAT. Evaluating Casama: Contextualized semantic maps for summarization of lung cancer studies. Comput Biol Med 2018; 92:55-63. [PMID: 29149658 PMCID: PMC5762403 DOI: 10.1016/j.compbiomed.2017.10.034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/20/2017] [Revised: 10/28/2017] [Accepted: 10/29/2017] [Indexed: 01/15/2023]
Abstract
OBJECTIVE It is crucial for clinicians to stay up to date on current literature in order to apply recent evidence to clinical decision making. Automatic summarization systems can help clinicians quickly view an aggregated summary of literature on a topic. Casama, a representation and summarization system based on "contextualized semantic maps," captures the findings of biomedical studies as well as the contexts associated with patient population and study design. This paper presents a user-oriented evaluation of Casama in comparison to a context-free representation, SemRep. MATERIALS AND METHODS The effectiveness of the representation was evaluated by presenting users with manually annotated Casama and SemRep summaries of ten articles on driver mutations in cancer. Automatic annotations were evaluated on a collection of articles on EGFR mutation in lung cancer. Seven users completed a questionnaire rating the summarization quality for various topics and applications. RESULTS Casama had higher median scores than SemRep for the majority of the topics (p ≤ 0.00032), all of the applications (p ≤ 0.00089), and in overall summarization quality (p ≤ 1.5e-05). Casama's manual annotations outperformed Casama's automatic annotations (p = 0.00061). DISCUSSION Casama performed particularly well in the representation of strength of evidence, which was highly rated both quantitatively and qualitatively. Users noted that Casama's less granular, more targeted representation improved usability compared to SemRep. CONCLUSION This evaluation demonstrated the benefits of a contextualized representation for summarizing biomedical literature on cancer. Iteration on specific areas of Casama's representation, further development of its algorithms, and a clinically-oriented evaluation are warranted.
Affiliation(s)
- Jean I Garcia-Gathright
- University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA.
| | - Nicholas J Matiasz
- University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
| | - Carlos Adame
- University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
| | - Karthik V Sarma
- University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
| | - Lauren Sauer
- University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
| | - Nova F Smedley
- University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
| | - Marshall L Spiegel
- University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
| | - Jennifer Strunck
- University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
| | - Edward B Garon
- University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
| | - Ricky K Taira
- University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA; University of California, Los Angeles, Department of Radiological Sciences, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
| | - Denise R Aberle
- University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA; University of California, Los Angeles, Department of Radiological Sciences, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
| | - Alex A T Bui
- University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA; University of California, Los Angeles, Department of Radiological Sciences, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
| |
|
45
|
VanDam C, Kanthawala S, Pratt W, Chai J, Huh J. Detecting clinically related content in online patient posts. J Biomed Inform 2017; 75:96-106. [PMID: 28986329 PMCID: PMC5685920 DOI: 10.1016/j.jbi.2017.09.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/06/2017] [Revised: 09/14/2017] [Accepted: 09/30/2017] [Indexed: 10/18/2022]
Abstract
Patients with chronic health conditions use online health communities to seek support and information to help manage their condition. For clinically related topics, patients can benefit from getting opinions from clinical experts, and many are concerned about misinformation and biased information being spread online. However, the large volume of community posts makes it challenging for moderators and clinical experts, if there are any, to provide the necessary information. Automatically identifying forum posts that need validated clinical resources can help online health communities efficiently manage content exchange. This automation can also help patients in need of clinical expertise get proper help. We present our results on testing text classification models that efficiently and accurately identify community posts containing clinical topics. We annotated 1817 posts comprising 4966 sentences from an existing online diabetes community. We found that our classifier performed best (F-measure: 0.83, Precision: 0.79, Recall: 0.86) when using the Naïve Bayes algorithm with unigrams, bigrams, trigrams, and MetaMap Semantic Types. Training took 5 s. The classification process took a fraction of 1 s. We applied our classifier to another online diabetes community, and the results were: F-measure: 0.63, Precision: 0.57, Recall: 0.71. Our results show that our model can feasibly scale to other forums for identifying posts containing clinical topics, provided common errors are properly addressed.
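The classifier setup described in this abstract can be illustrated with a minimal sketch: a multinomial Naïve Bayes model over unigram, bigram, and trigram features, plus a helper that recomputes the F-measure from precision and recall. This is not the authors' implementation. The MetaMap Semantic Type features are omitted, the class and function names are hypothetical, add-one smoothing is an assumption, and the F-measure is assumed to be the balanced F1 (harmonic mean).

```python
import math
from collections import Counter, defaultdict


def ngrams(text, max_n=3):
    """Return all word n-grams of length 1..max_n (whitespace tokenization)."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NgramNaiveBayes:
    """Hypothetical toy multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.feat_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # class prior
            for feat in ngrams(text):
                if feat in self.vocab:            # ignore unseen n-grams
                    logp += math.log((counts[feat] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up training posts, for illustration only.
model = NgramNaiveBayes().fit(
    [
        "my insulin dose seems too high",
        "should i change my metformin dose",
        "happy birthday to everyone here",
        "thanks for the warm welcome",
    ],
    ["clinical", "clinical", "other", "other"],
)
```

Recomputing the F-measure from the rounded precision and recall figures in the abstract gives values close to the reported 0.83 and 0.63; small differences would be expected because the published inputs are themselves rounded.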
Affiliation(s)
| | | | - Wanda Pratt
- University of Washington, Seattle, United States.
| | - Joyce Chai
- Michigan State University, United States.
| | - Jina Huh
- University of California San Diego, United States.
| |
|
46
|
Kang T, Zhang S, Tang Y, Hruby GW, Rusanov A, Elhadad N, Weng C. EliIE: An open-source information extraction system for clinical trial eligibility criteria. J Am Med Inform Assoc 2017; 24:1062-1071. [PMID: 28379377 PMCID: PMC6259668 DOI: 10.1093/jamia/ocx019] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 07/27/2016] [Revised: 01/31/2017] [Accepted: 03/02/2017] [Indexed: 12/22/2022] Open
Abstract
OBJECTIVE To develop an open-source information extraction system called Eligibility Criteria Information Extraction (EliIE) for parsing and formalizing free-text clinical research eligibility criteria (EC) following the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) version 5.0. MATERIALS AND METHODS EliIE parses EC in 4 steps: (1) clinical entity and attribute recognition, (2) negation detection, (3) relation extraction, and (4) concept normalization and output structuring. Informaticians and domain experts were recruited to design an annotation guideline and generate a training corpus of annotated EC for 230 Alzheimer's clinical trials, which were represented as queries against the OMOP CDM and included 8008 entities, 3550 attributes, and 3529 relations. A sequence labeling-based method was developed for automatic entity and attribute recognition. Negation detection was supported by NegEx and a set of predefined rules. Relation extraction was achieved by a support vector machine classifier. We further performed terminology-based concept normalization and output structuring. RESULTS In task-specific evaluations, the best F1 score was 0.79 for entity recognition and 0.89 for relation extraction. The accuracy of negation detection was 0.94. The overall accuracy for query formalization was 0.71 in an end-to-end evaluation. CONCLUSIONS This study presents EliIE, an OMOP CDM-based information extraction system for automatic structuring and formalization of free-text EC. According to our evaluation, machine learning-based EliIE outperforms existing systems and shows promise for further improvement.
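To make step (2) of the pipeline above more concrete, here is a heavily simplified sketch of trigger-based negation detection in the spirit of NegEx. It is not the NegEx algorithm or EliIE's rule set: the trigger list, the five-token scope window, and the function name `is_negated` are all assumptions chosen for illustration.

```python
import re

# Hypothetical, deliberately tiny trigger list; real systems use a much
# larger curated set that includes multi-word triggers.
NEGATION_TRIGGERS = {"no", "not", "without", "denies"}

# Assumed scope: a trigger negates entities that start within the next
# SCOPE tokens.
SCOPE = 5

TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def is_negated(sentence, entity):
    """Return True if a trigger occurs shortly before a mention of entity."""
    tokens = TOKEN_PATTERN.findall(sentence.lower())
    entity_tokens = TOKEN_PATTERN.findall(entity.lower())
    if not entity_tokens:
        return False
    n = len(entity_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == entity_tokens:
            # Look back at up to SCOPE tokens preceding the mention.
            window = tokens[max(0, start - SCOPE):start]
            if any(tok in NEGATION_TRIGGERS for tok in window):
                return True
    return False
```

For an eligibility criterion such as "No history of stroke or dementia", this sketch flags both "stroke" and "dementia" as negated, whereas "Diagnosis of probable Alzheimer's disease" contains no trigger and is left affirmed.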
Affiliation(s)
- Tian Kang
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Shaodian Zhang
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Youlan Tang
- Institute of Human Nutrition, Columbia University, New York, NY, USA
| | - Gregory W Hruby
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Alexander Rusanov
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| |
|
47
|
Dunn K, Marshall JG, Wells AL, Backus JEB. Examining the role of MEDLINE as a patient care information resource: an analysis of data from the Value of Libraries study. J Med Libr Assoc 2017; 105:336-346. [PMID: 28983197 PMCID: PMC5624423 DOI: 10.5195/jmla.2017.87] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/20/2022] Open
Abstract
OBJECTIVE This study analyzed data from a study on the value of libraries to understand the specific role that the MEDLINE database plays in relation to other information resources that are available to health care providers and its role in positively impacting patient care. METHODS A previous study on the use of health information resources for patient care obtained 16,122 responses from health care providers in 56 hospitals about how providers make decisions affecting patient care and the role of information resources in that process. Respondents indicated the resources they used in answering a specific clinical question from a list of 19 possible resources, including MEDLINE. Study data were examined using descriptive statistics and regression analysis to determine the number of information resources used and how they were used in combination with one another. RESULTS Health care professionals used 3.5 resources, on average, to aid in patient care. The 2 most frequently used resources were journals (print and online) and the MEDLINE database. Using a higher number of information resources was significantly associated with a higher probability of making changes to patient care and avoiding adverse events. MEDLINE was more likely to be among the consulted resources than any other information resource except journals. CONCLUSIONS MEDLINE is a critical clinical care tool that health care professionals use to avoid adverse events, make changes to patient care, and answer clinical questions.
|
48
|
Sarrouti M, El Alaoui SO. A Yes/No Answer Generator Based on Sentiment-Word Scores in Biomedical Question Answering. International Journal of Healthcare Information Systems and Informatics 2017. [DOI: 10.4018/ijhisi.2017070104] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022]
Abstract
Background and Objective: Yes/no question answering (QA) in the open domain is a longstanding challenge that has been widely studied over the last decades. However, it still requires further effort in the biomedical domain. Yes/no QA aims at answering yes/no questions, which seek a clear “yes” or “no” answer. In this paper, we present a novel yes/no answer generator based on sentiment-word scores in biomedical QA. Methods: In the proposed method, we first use Stanford CoreNLP for tokenization and part-of-speech tagging of all passages relevant to a given yes/no question. We then assign a sentiment score based on SentiWordNet to each word of the passages. Finally, the decision between the answers “yes” and “no” is based on the resulting sentiment-passages score: “yes” for a positive final sentiment-passages score and “no” for a negative one. Results: Experimental evaluations performed on BioASQ collections show that the proposed method is more effective than the current state-of-the-art method, outperforming it by an average of 15.68% in terms of accuracy.
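The decision rule described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' system: a regular expression stands in for Stanford CoreNLP tokenization, part-of-speech tagging is skipped, the small lexicon with made-up scores stands in for SentiWordNet, and a total score of exactly zero is mapped to "no" because the abstract does not say how ties are handled.

```python
import re

# Hypothetical toy lexicon: word -> (positive score, negative score).
# The values are invented for illustration and are not SentiWordNet entries.
TOY_LEXICON = {
    "effective": (0.75, 0.0),
    "improves": (0.625, 0.0),
    "associated": (0.25, 0.0),
    "fails": (0.0, 0.75),
    "ineffective": (0.0, 0.875),
    "no": (0.0, 0.5),
}


def passage_score(passage, lexicon=TOY_LEXICON):
    """Sum (positive - negative) scores over the words of one passage."""
    tokens = re.findall(r"[a-z]+", passage.lower())
    return sum(lexicon[t][0] - lexicon[t][1] for t in tokens if t in lexicon)


def answer_yes_no(passages, lexicon=TOY_LEXICON):
    """Answer 'yes' if the summed passage score is positive, else 'no'."""
    total = sum(passage_score(p, lexicon) for p in passages)
    # Assumption: a total of exactly zero falls back to "no".
    return "yes" if total > 0 else "no"
```

With this toy lexicon, a passage such as "The drug is effective and improves survival." scores 1.375 and yields "yes", while "The treatment fails and is ineffective." scores -1.625 and yields "no".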
Affiliation(s)
- Mourad Sarrouti
- Laboratory of Computer Science and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco
| | - Said Ouatik El Alaoui
- Laboratory of Computer Science and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco
| |
|
49
|
Deardorff A, Masterton K, Roberts K, Kilicoglu H, Demner-Fushman D. A protocol-driven approach to automatically finding authoritative answers to consumer health questions in online resources. J Assoc Inf Sci Technol 2017. [DOI: 10.1002/asi.23806] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Affiliation(s)
- Ariel Deardorff
- National Library of Medicine; 8600 Rockville Pike Bethesda MD 20894
| | - Kate Masterton
- National Library of Medicine; 8600 Rockville Pike Bethesda MD 20894
| | - Kirk Roberts
- National Library of Medicine; 8600 Rockville Pike Bethesda MD 20894
- School of Biomedical Informatics; University of Texas Health Science Center at Houston; 7000 Fannin Street #875 Houston TX 77030
| | - Halil Kilicoglu
- National Library of Medicine; 8600 Rockville Pike Bethesda MD 20894
| | | |
|
50
|
A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering. J Biomed Inform 2017; 68:96-103. [DOI: 10.1016/j.jbi.2017.03.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 10/06/2016] [Revised: 03/03/2017] [Accepted: 03/05/2017] [Indexed: 11/17/2022]
|