Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Magge A, Weissenbacher D, Sarker A, Scotch M, Gonzalez-Hernandez G. Deep neural networks and distant supervision for geographic location mention extraction. Bioinformatics 2019;34:i565-i573. [PMID: 29950020 PMCID: PMC6022665 DOI: 10.1093/bioinformatics/bty273] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/03/2022] Open



Cited by Other Article(s)

Ge Y, Guo Y, Yang YC, Al-Garadi MA, Sarker A. A comparison of few-shot and traditional named entity recognition models for medical text. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS 2022;2022:84-89. [PMID: 37641590 PMCID: PMC10462421 DOI: 10.1109/ichi54592.2022.00024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]

Abstract

Many research problems involving medical texts have limited amounts of annotated data available (e.g., expressions of rare diseases). Traditional supervised machine learning algorithms, particularly those based on deep neural networks, require large volumes of annotated data, and they underperform when only small amounts of labeled data are available. Few-shot learning (FSL) is a category of machine learning models that are designed with the intent of solving problems that have small annotated datasets available. However, there is no current study that compares the performances of FSL models with traditional models (e.g., conditional random fields) for medical text at different training set sizes. In this paper, we attempted to fill this gap in research by comparing multiple FSL models with traditional models for the task of named entity recognition (NER) from medical texts. Using five health-related annotated NER datasets, we benchmarked three traditional NER models based on BERT: BERT-Linear Classifier (BLC), BERT-CRF (BC) and SANER; and three FSL NER models: StructShot & NNShot, Few-Shot Slot Tagging (FS-ST) and ProtoNER. Our benchmarking results show that almost all models, whether traditional or FSL, achieve significantly lower performances compared to the state-of-the-art with small amounts of training data. For the NER experiments we executed, the F1-scores were very low with small training sets, typically below 30%. FSL models that were reported to perform well on non-medical texts significantly underperformed, compared to their reported best, on medical texts. Our experiments also suggest that FSL methods tend to perform worse on data sets from noisy sources of medical texts, such as social media (which includes misspellings and colloquial expressions), compared to less noisy sources such as medical literature.
Our experiments demonstrate that the current state-of-the-art FSL systems are not yet suitable for effective NER in medical natural language processing tasks, and further research needs to be carried out to improve their performances. Creation of specialized, standardized datasets replicating real-world scenarios may help to move this category of methods forward.


Wang H, Zang Y, Zhao Y, Hao D, Kang Y, Zhang J, Zhang Z, Zhang L, Yang Z, Zhang S. Sequence Matching between Hemagglutinin and Neuraminidase through Sequence Analysis Using Machine Learning. Viruses 2022;14:v14030469. [PMID: 35336876 PMCID: PMC8950662 DOI: 10.3390/v14030469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Revised: 02/15/2022] [Accepted: 02/17/2022] [Indexed: 01/27/2023] Open

Peterson KS, Lewis J, Patterson OV, Chapman AB, Denhalter DW, Lye PA, Stevens VW, Gamage SD, Roselle GA, Wallace KS, Jones M. Automated Travel History Extraction From Clinical Notes for Informing the Detection of Emergent Infectious Disease Events: Algorithm Development and Validation. JMIR Public Health Surveill 2021;7:e26719. [PMID: 33759790 PMCID: PMC7993087 DOI: 10.2196/26719] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/23/2020] [Revised: 02/05/2021] [Accepted: 02/12/2021] [Indexed: 02/02/2023] Open

Abstract

Background

Patient travel history can be crucial in evaluating evolving infectious disease events. Such information can be challenging to acquire in electronic health records, as it is often available only in unstructured text.

Objective

This study aims to assess the feasibility of annotating and automatically extracting travel history mentions from unstructured clinical documents in the Department of Veterans Affairs across disparate health care facilities and among millions of patients. Information about travel exposure augments existing surveillance applications for increased preparedness in responding quickly to public health threats.

Methods

Clinical documents related to arboviral disease were annotated following selection using a semiautomated bootstrapping process. Using annotated instances as training data, models were developed to extract from unstructured clinical text any mention of affirmed travel locations outside of the continental United States. Automated text processing models were evaluated, involving machine learning and neural language models for extraction accuracy.

Results

Among 4584 annotated instances, 2659 (58%) contained an affirmed mention of travel history, while 347 (7.6%) were negated. Interannotator agreement resulted in a document-level Cohen kappa of 0.776. Automated text processing accuracy (F1 85.6, 95% CI 82.5-87.9) and computational burden were acceptable such that the system can provide a rapid screen for public health events.

Conclusions

Automated extraction of patient travel history from clinical documents is feasible for enhanced passive surveillance public health systems. Without such a system, it would usually be necessary to manually review charts to identify recent travel or lack of travel, use an electronic health record that enforces travel history documentation, or ignore this potential source of information altogether. The development of this tool was initially motivated by emergent arboviral diseases. More recently, this system was used in the early phases of response to COVID-19 in the United States, although its utility was limited to a relatively brief window due to the rapid domestic spread of the virus. Such systems may aid future efforts to prevent and contain the spread of infectious diseases.


Affiliation(s)

Kelly S Peterson: VA Salt Lake City Health Care System, US Department of Veterans Affairs, Salt Lake City, UT, United States; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, United States
Julia Lewis: VA Salt Lake City Health Care System, US Department of Veterans Affairs, Salt Lake City, UT, United States; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, United States
Olga V Patterson: VA Salt Lake City Health Care System, US Department of Veterans Affairs, Salt Lake City, UT, United States; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, United States
Alec B Chapman: VA Salt Lake City Health Care System, US Department of Veterans Affairs, Salt Lake City, UT, United States; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, United States
Daniel W Denhalter: VA Salt Lake City Health Care System, US Department of Veterans Affairs, Salt Lake City, UT, United States; Department of Rocky Mountain Cancer Data Systems, University of Utah, Salt Lake City, UT, United States
Patricia A Lye: National Infectious Diseases Service, Specialty Care Services, US Department of Veterans Affairs, Cincinnati, OH, United States
Vanessa W Stevens: VA Salt Lake City Health Care System, US Department of Veterans Affairs, Salt Lake City, UT, United States; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, United States
Shantini D Gamage: National Infectious Diseases Service, Specialty Care Services, US Department of Veterans Affairs, Cincinnati, OH, United States; Division of Infectious Diseases, Department of Internal Medicine, University of Cincinnati College of Medicine, Cincinnati, OH, United States
Gary A Roselle: National Infectious Diseases Service, Specialty Care Services, US Department of Veterans Affairs, Cincinnati, OH, United States; Division of Infectious Diseases, Department of Internal Medicine, University of Cincinnati College of Medicine, Cincinnati, OH, United States; Cincinnati VA Medical Center, US Department of Veterans Affairs, Cincinnati, OH, United States
Katherine S Wallace: Office of Biosurveillance, Veterans Affairs Central Office, US Department of Veterans Affairs, Washington, DC, United States; National Biosurveillance Integration Center, Countering Weapons of Mass Destruction, Department of Homeland Security, Washington, DC, United States
Makoto Jones: VA Salt Lake City Health Care System, US Department of Veterans Affairs, Salt Lake City, UT, United States; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, United States


Magge A, Weissenbacher D, O'Connor K, Tahsin T, Gonzalez-Hernandez G, Scotch M. GeoBoost2: a natural language processing pipeline for GenBank metadata enrichment for virus phylogeography. Bioinformatics 2021;36:5120-5121. [PMID: 32683454 PMCID: PMC7755405 DOI: 10.1093/bioinformatics/btaa647] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/27/2020] [Revised: 07/03/2020] [Accepted: 07/13/2020] [Indexed: 12/27/2022] Open

Vaiente MA, Scotch M. Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2020;85:104501. [PMID: 32798768 PMCID: PMC7686256 DOI: 10.1016/j.meegid.2020.104501] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/27/2020] [Revised: 08/06/2020] [Accepted: 08/09/2020] [Indexed: 01/14/2023]

Junge A, Jensen LJ. CoCoScore: context-aware co-occurrence scoring for text mining applications using distant supervision. Bioinformatics 2020;36:264-271. [PMID: 31199464 PMCID: PMC6956794 DOI: 10.1093/bioinformatics/btz490] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/18/2018] [Revised: 05/30/2019] [Accepted: 06/10/2019] [Indexed: 12/18/2022] Open

Magge A, Weissenbacher D, Sarker A, Scotch M, Gonzalez-Hernandez G. Bi-directional Recurrent Neural Network Models for Geographic Location Extraction in Biomedical Literature. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2019;24:100-111. [PMID: 30864314 PMCID: PMC6417823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]