1
Bhogal AN, Berrocal VJ, Romero DM, Willis MA, Vydiswaran VGV, Veinot TC. Social Acceptability of Health Behavior Posts on Social Media: An Experiment. Am J Prev Med 2024; 66:870-876. [PMID: 38191003 DOI: 10.1016/j.amepre.2024.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 01/03/2024] [Accepted: 01/03/2024] [Indexed: 01/10/2024]
Abstract
INTRODUCTION Social media sites like Twitter (now X) are increasingly used to create health behavior metrics for public health surveillance. Yet little is known about social norms that may bias the content of posts about health behaviors. Social norms for posts about four health behaviors (smoking tobacco, drinking alcohol, physical activity, eating food) on Twitter/X were evaluated. METHODS This was a randomized experiment delivered via web-based survey to adult, English-speaking Twitter/X users in three Michigan, USA, counties from 2020 to 2022 (n=559). Each participant viewed 24 posts presenting experimental manipulations regarding four health behaviors and answered questions about each post's social acceptability. Principal component analysis was used to combine survey responses into one perceived social acceptability measure. Linear mixed models with the Benjamini-Hochberg correction were implemented to test seven study hypotheses in 2023. RESULTS Supporting six hypotheses, posts presenting healthier (CI: 0.028, 0.454), less stigmatized behaviors (CI: 0.552, 0.157) were more socially acceptable than posts regarding unhealthier, stigmatized behaviors. Unhealthy (CI: -0.268, -0.109) and stigmatized behavior (CI: -0.261, -0.103) posts were less acceptable for more educated participants. Posts about collocated activities (CI: 0.410, 0.573) and accompanied by expressions of liking (CI: 0.906, 1.11) were more acceptable than activities undertaken alone or disliked. Contrary to one hypothesis, posts reporting unusual activities were less acceptable than usual ones (CI: -0.472, 0.312). CONCLUSIONS Perceived social acceptability may be associated with the frequency and content of health behavior posts. Users of Twitter/X and other social media platform posts to estimate health behavior prevalence should account for potential estimation biases from perceived social acceptability of posts.
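The abstract above mentions applying the Benjamini-Hochberg correction when testing seven hypotheses. As a rough illustration of how that step-up procedure works in general, here is a minimal sketch. The function name, the significance level of 0.05, and the p-values are all hypothetical; none of them come from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where it is rejected.

    Generic step-up procedure; alpha=0.05 is an assumed default,
    not a value reported in the abstract.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
example_p_values = [0.30, 0.001, 0.041, 0.004]
print(benjamini_hochberg(example_p_values))
```

The procedure controls the false discovery rate rather than the family-wise error rate, which is one common reason to prefer it when several related hypotheses are tested together.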
Affiliation(s)
- Ashley N Bhogal
  - School of Information, University of Michigan, Ann Arbor, Michigan
- Veronica J Berrocal
  - Department of Statistics, University of California Irvine Donald Bren School of Information and Computer Sciences, Irvine, California
- Daniel M Romero
  - School of Information, University of Michigan, Ann Arbor, Michigan
  - Center for the Study of Complex Systems, University of Michigan College of Literature, Science, and the Arts, Ann Arbor, Michigan
  - Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, Michigan
- Matthew A Willis
  - School of Information, University of Michigan, Ann Arbor, Michigan
- V G Vinod Vydiswaran
  - School of Information, University of Michigan, Ann Arbor, Michigan
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan
- Tiffany C Veinot
  - School of Information, University of Michigan, Ann Arbor, Michigan
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan
  - Department of Health Behavior and Health Education, University of Michigan School of Public Health, Ann Arbor, Michigan
2
Ritchie O, Koptyra E, Marquis LB, Kadri R, Laurie AR, Vydiswaran VGV, Li J, Brown LK, Veinot TC, Buis LR, Guetterman TC. Virtual Care: Perspectives From Family Physicians. Fam Med 2024; 56:321-324. [PMID: 38652849 DOI: 10.22454/fammed.2024.592756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2024]
Abstract
BACKGROUND During the COVID-19 pandemic, virtual care expanded rapidly at Michigan Medicine and other health systems. From family physicians' perspectives, this shift to virtual care has the potential to affect workflow, job satisfaction, and patient communication. As clinics reopened and care delivery models shifted to a combination of in-person and virtual care, the need to understand physician experiences with virtual care arose in order to improve both patient and provider experiences. This study investigated Michigan Medicine family medicine physicians' perceptions of virtual care through qualitative interviews to better understand how to improve the quality and effectiveness of virtual care for both patients and physicians. METHODS We employed a qualitative descriptive design to examine physician perspectives through semistructured interviews. We coded and analyzed transcripts using thematic analysis, facilitated by MAXQDA (VERBI) software. RESULTS The results of the analysis identified four major themes: (a) chief concerns that are appropriate for virtual evaluation, (b) physician perceptions of patient benefits, (c) focused but contextually enriched patient-physician communication, and (d) structural support needed for high-quality virtual care. CONCLUSIONS These findings can help further direct the discussion of how to make use of resources to improve the quality and effectiveness of virtual care.
Affiliation(s)
- Olivia Ritchie
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI
- Emily Koptyra
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI
- Liz B Marquis
  - School of Information, University of Michigan, Ann Arbor, MI
- Reema Kadri
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI
- Anna R Laurie
  - Department of Family Medicine, University of Michigan Medical School, Ann Arbor, MI
- V G Vinod Vydiswaran
  - School of Information, University of Michigan, Ann Arbor, MI
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI
- Jiazhao Li
  - School of Information, University of Michigan, Ann Arbor, MI
- Lindsay K Brown
  - School of Information, University of Michigan, Ann Arbor, MI
- Tiffany C Veinot
  - School of Information, University of Michigan, Ann Arbor, MI
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI
  - Department of Health Behavior and Health Education, School of Public Health, University of Michigan, Ann Arbor, MI
- Lorraine R Buis
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI
  - School of Information, University of Michigan, Ann Arbor, MI
3
Yu D, Stidham RW, Vydiswaran VGV. A Systematic Temporal Extraction Pipeline for Medical Concepts in Clinical Notes. AMIA Annu Symp Proc 2024; 2023:1314-1323. [PMID: 38222360 PMCID: PMC10785919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
With increased application of natural language processing (NLP) in medicine, many NLP models are being developed for uncovering relevant clinical features from electronic health records. Temporal information plays a key role in understanding the context, significance, and interpretation of medical concepts extracted from clinical notes. This is particularly true in situations where the behavior, value, or status of a medical concept changes over time. In this paper, we introduce a systematic framework, NLP annotation-Relaxation-Generation (NRG). NRG compiles incidents of medical concept changes from status annotations and timestamps of multiple clinical notes. We demonstrate the effectiveness of the NRG pipeline by applying it to two medical concepts related to patients with inflammatory bowel disease: extra-intestinal manifestations and medications. We show that the NRG pipeline not only offers insights into medical concept changes over time but can also help convey longitudinal changes in clinical features at both the individual and population levels.
Affiliation(s)
- Deahan Yu
  - University of Michigan, Ann Arbor, MI, USA
4
Joo H, Mathis MR, Tam M, James C, Han P, Mangrulkar RS, Friedman CP, Vydiswaran VGV. Applying AI and Guidelines to Assist Medical Students in Recognizing Patients With Heart Failure: Protocol for a Randomized Trial. JMIR Res Protoc 2023; 12:e49842. [PMID: 37874618 PMCID: PMC10630872 DOI: 10.2196/49842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 09/16/2023] [Accepted: 09/20/2023] [Indexed: 10/25/2023] Open
Abstract
BACKGROUND The integration of artificial intelligence (AI) into clinical practice is transforming both clinical practice and medical education. AI-based systems aim to improve the efficacy of clinical tasks, enhancing diagnostic accuracy and tailoring treatment delivery. As it becomes increasingly prevalent in health care for high-quality patient care, it is critical for health care providers to use the systems responsibly to mitigate bias, ensure effective outcomes, and provide safe clinical practices. In this study, the clinical task is the identification of heart failure (HF) prior to surgery with the intention of enhancing clinical decision-making skills. HF is a common and severe disease, but detection remains challenging due to its subtle manifestation, often concurrent with other medical conditions, and the absence of a simple and effective diagnostic test. While advanced HF algorithms have been developed, the use of these AI-based systems to enhance clinical decision-making in medical education remains understudied. OBJECTIVE This research protocol is to demonstrate our study design, systematic procedures for selecting surgical cases from electronic health records, and interventions. The primary objective of this study is to measure the effectiveness of interventions aimed at improving HF recognition before surgery, the second objective is to evaluate the impact of inaccurate AI recommendations, and the third objective is to explore the relationship between the inclination to accept AI recommendations and their accuracy. METHODS Our study used a 3 × 2 factorial design (intervention type × order of prepost sets) for this randomized trial with medical students. The student participants are asked to complete a 30-minute e-learning module that includes key information about the intervention and a 5-question quiz, and a 60-minute review of 20 surgical cases to determine the presence of HF. 
To mitigate selection bias in the pre- and posttests, we adopted a feature-based systematic sampling procedure. From a pool of 703 expert-reviewed surgical cases, 20 were selected based on features such as case complexity, model performance, and positive and negative labels. This study comprises three interventions: (1) a direct AI-based recommendation with a predicted HF score, (2) an indirect AI-based recommendation gauged through the area under the curve metric, and (3) an HF guideline-based intervention. RESULTS As of July 2023, 62 of the enrolled medical students have fulfilled this study's participation, including the completion of a short quiz and the review of 20 surgical cases. The subject enrollment commenced in August 2022 and will end in December 2023, with the goal of recruiting 75 medical students in years 3 and 4 with clinical experience. CONCLUSIONS We demonstrated a study protocol for the randomized trial, measuring the effectiveness of interventions using AI and HF guidelines among medical students to enhance HF recognition in preoperative care with electronic health record data. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/49842.
Affiliation(s)
- Hyeon Joo
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Michael R Mathis
  - Department of Anesthesiology, University of Michigan, Ann Arbor, MI, United States
- Marty Tam
  - Department of Internal Medicine, Cardiology, University of Michigan, Ann Arbor, MI, United States
- Cornelius James
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
  - Department of Pediatrics, University of Michigan, Ann Arbor, MI, United States
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States
- Peijin Han
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Rajesh S Mangrulkar
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States
- Charles P Friedman
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
  - School of Information, University of Michigan, Ann Arbor, MI, United States
5
Buis LR, Brown LK, Plegue MA, Kadri R, Laurie AR, Guetterman TC, Vydiswaran VGV, Li J, Veinot TC. Identifying Inequities in Video and Audio Telehealth Services for Primary Care Encounters During COVID-19: Repeated Cross-Sectional, Observational Study. J Med Internet Res 2023; 25:e49804. [PMID: 37773609 PMCID: PMC10544805 DOI: 10.2196/49804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 08/04/2023] [Accepted: 08/07/2023] [Indexed: 10/01/2023] Open
Abstract
BACKGROUND The COVID-19 pandemic resulted in rapid changes in how patient care was provided, particularly through the expansion of telehealth and audio-only phone-based care. OBJECTIVE The goal of this study was to evaluate inequities in video and audio-only care during various time points including the initial wave of the COVID-19 pandemic, later stages of the pandemic, and a historical control. We sought to understand the characteristics of care during this time for a variety of different groups of patients that may experience health care inequities. METHODS We conducted a retrospective analysis of electronic health record (EHR) data from encounters from 34 family medicine and internal medicine primary care clinics in a large, Midwestern health system, using a repeated cross-sectional, observational study design. These data included patient demographic data, as well as encounter, diagnosis, and procedure records. Data were obtained for all in-person and telehealth encounters (including audio-only phone-based care) that occurred during 3 separate time periods: an initial COVID-19 period (T2: March 16, 2020, to May 3, 2020), a later COVID-19 period (T3: May 4, 2020, to September 30, 2020), and a historical control period from the previous year (T1: March 16, 2019, to September 30, 2019). Primary analysis focused on the status of each encounter in terms of whether it was completed as scheduled, it was canceled, or the patient missed the appointment. A secondary analysis was performed to evaluate the likelihood of an encounter being completed based on visit modality (phone, video, in-person). RESULTS In total, there were 938,040 scheduled encounters during the 3 time periods, with 178,747 unique patients, that were included for analysis. 
Patients with completed encounters were more likely to be younger than 65 years old (71.8%-74.1%), be female (58.8%-61.8%), be White (75.6%-76.7%), and have no significant comorbidities (63.2%-66.8%) or disabilities (53.2%-61.1%) in all time periods than those who had only canceled or missed encounters. Effects on different subpopulations are discussed herein. CONCLUSIONS Findings from this study demonstrate that primary care utilization across delivery modalities (in person, video, and phone) was not equivalent across all groups before and during the COVID-19 pandemic and different groups were differentially impacted at different points. Understanding how different groups of patients responded to these rapid changes and how health care inequities may have been affected is an important step in better understanding implementation strategies for digital solutions in the future.
Affiliation(s)
- Lorraine R Buis
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- Lindsay K Brown
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- Melissa A Plegue
  - Department of Pediatrics, University of Michigan, Ann Arbor, MI, United States
- Reema Kadri
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
- Anna R Laurie
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
- Timothy C Guetterman
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
  - School of Information, University of Michigan, Ann Arbor, MI, United States
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, United States
- Jiazhao Li
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- Tiffany C Veinot
  - School of Information, University of Michigan, Ann Arbor, MI, United States
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, United States
  - Department of Health Behavior and Health Education, School of Public Health, University of Michigan, Ann Arbor, MI, United States
6
Wu DTY, Hanauer D, Murdock P, Vydiswaran VGV, Mei Q, Zheng K. Developing a Semantically Based Query Recommendation for an Electronic Medical Record Search Engine: Query Log Analysis and Design Implications. JMIR Form Res 2023; 7:e45376. [PMID: 37713239 PMCID: PMC10541636 DOI: 10.2196/45376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 07/19/2023] [Accepted: 08/03/2023] [Indexed: 09/16/2023] Open
Abstract
BACKGROUND An effective and scalable information retrieval (IR) system plays a crucial role in enabling clinicians and researchers to harness the valuable information present in electronic health records. In a previous study, we developed a prototype medical IR system, which incorporated a semantically based query recommendation (SBQR) feature. The system was evaluated empirically and demonstrated high perceived performance by end users. To delve deeper into the factors contributing to this perceived performance, we conducted a follow-up study using query log analysis. OBJECTIVE One of the primary challenges faced in IR is that users often have limited knowledge regarding their specific information needs. Consequently, an IR system, particularly its user interface, needs to be thoughtfully designed to assist users through the iterative process of refining their queries as they encounter relevant documents during their search. To address these challenges, we incorporated "query recommendation" into our Electronic Medical Record Search Engine (EMERSE), drawing inspiration from the success of similar features in modern IR systems for general purposes. METHODS The query log data analyzed in this study were collected during our previous experimental study, where we developed EMERSE with the SBQR feature. We implemented a logging mechanism to capture user query behaviors and the output of the IR system (retrieved documents). In this analysis, we compared the initial query entered by users with the query formulated with the assistance of the SBQR. By examining the results of this comparison, we could examine whether the use of SBQR helped in constructing improved queries that differed from the original ones. RESULTS Our findings revealed that the first query entered without SBQR and the final query with SBQR assistance were highly similar (Jaccard similarity coefficient=0.77). 
This suggests that the perceived positive performance of the system was primarily attributed to the automatic query expansion facilitated by the SBQR rather than users manually manipulating their queries. In addition, through entropy analysis, we observed that search results converged in scenarios of moderate difficulty, and the degree of convergence correlated strongly with the perceived system performance. CONCLUSIONS The study demonstrated the potential contribution of the SBQR in shaping participants' positive perceptions of system performance, contingent upon the difficulty of the search scenario. Medical IR systems should therefore consider incorporating an SBQR as a user-controlled option or a semiautomated feature. Future work entails redesigning the experiment in a more controlled manner and conducting multisite studies to demonstrate the effectiveness of EMERSE with SBQR for patient cohort identification. By further exploring and validating these findings, we can enhance the usability and functionality of medical IR systems in real-world settings.
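The abstract above reports a Jaccard similarity coefficient of 0.77 between initial and SBQR-assisted queries. As a minimal sketch of how that coefficient is computed in general, the following treats each query as a set of lowercase, whitespace-separated terms. The tokenization and the example queries are assumptions made here for illustration; the abstract does not describe how the study represented queries.

```python
def jaccard_similarity(query_a, query_b):
    """Jaccard coefficient between two queries viewed as term sets.

    Tokenizing on whitespace after lowercasing is an assumption;
    the study's own tokenization is not stated in the abstract.
    """
    terms_a = set(query_a.lower().split())
    terms_b = set(query_b.lower().split())
    # Two empty queries are treated as identical to avoid dividing by zero.
    if not terms_a and not terms_b:
        return 1.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


# Hypothetical queries, not taken from the study's logs.
first_query = "chest pain"
final_query = "chest pain radiating"
print(jaccard_similarity(first_query, final_query))
```

A value near 1 means the two queries share most of their terms, which is how a high coefficient can indicate that users changed little between their first and final queries.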
Affiliation(s)
- Danny T Y Wu
  - Department of Biomedical Informatics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- David Hanauer
  - School of Information, University of Michigan, Ann Arbor, MI, United States
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Paul Murdock
  - Burnett School of Medicine, Texas Christian University, Fort Worth, TX, United States
  - Department of Biomedical Informatics, University of Cincinnati, Cincinnati, OH, United States
- V G Vinod Vydiswaran
  - School of Information, University of Michigan, Ann Arbor, MI, United States
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Qiaozhu Mei
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- Kai Zheng
  - School of Information, University of Michigan, Ann Arbor, MI, United States
  - Department of Informatics, University of California, Irvine, CA, United States
7
Walling AM, Pevnick J, Bennett AV, Vydiswaran VGV, Ritchie CS. Dementia and electronic health record phenotypes: a scoping review of available phenotypes and opportunities for future research. J Am Med Inform Assoc 2023:7186523. [PMID: 37252836 DOI: 10.1093/jamia/ocad086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 04/27/2023] [Accepted: 05/16/2023] [Indexed: 06/01/2023] Open
Abstract
OBJECTIVE We performed a scoping review of algorithms using electronic health record (EHR) data to identify patients with Alzheimer's disease and related dementias (ADRD), to advance their use in research and clinical care. MATERIALS AND METHODS Starting with a previous scoping review of EHR phenotypes, we performed a cumulative update (April 2020 through March 1, 2023) using PubMed, PheKB, and expert review with exclusive focus on ADRD identification. We included algorithms using EHR data alone or in combination with non-EHR data and characterized whether they identified patients at high risk of or with a current diagnosis of ADRD. RESULTS For our cumulative focused update, we reviewed 271 titles meeting our search criteria, 49 abstracts, and 26 full-text papers. We identified 8 articles from the original systematic review, 8 from our new search, and 4 recommended by an expert. We identified 20 papers describing 19 unique EHR phenotypes for ADRD: 7 algorithms identifying patients with diagnosed dementia and 12 algorithms identifying patients at high risk of dementia that prioritize sensitivity over specificity. Reference standards range from only using other EHR data to in-person cognitive screening. CONCLUSION A variety of EHR-based phenotypes are available for use in identifying populations with or at high risk of developing ADRD. This review provides comparative detail to aid in choosing the best algorithm for research, clinical care, and population health projects based on the use case and available data. Future research may further improve the design and use of algorithms by considering EHR data provenance.
Affiliation(s)
- Anne M Walling
  - Department of Medicine, VA Greater Los Angeles Health System, Los Angeles, California, USA
  - Department of Medicine, University of California, Los Angeles, Los Angeles, California, USA
- Joshua Pevnick
  - Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Antonia V Bennett
  - Department of Health Policy and Management, University of North Carolina, Chapel Hill, North Carolina, USA
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, Michigan, USA
- Christine S Ritchie
  - Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
8
Weissenbacher D, O’Connor K, Rawal S, Zhang Y, Tsai RTH, Miller T, Xu D, Anderson C, Liu B, Han Q, Zhang J, Kulev I, Köprü B, Rodriguez-Esteban R, Ozkirimli E, Ayach A, Roller R, Piccolo S, Han P, Vydiswaran VGV, Tekumalla R, Banda JM, Bagherzadeh P, Bergler S, Silva JF, Almeida T, Martinez P, Rivera-Zavala R, Wang CK, Dai HJ, Alberto Robles Hernandez L, Gonzalez-Hernandez G. Automatic Extraction of Medication Mentions from Tweets-Overview of the BioCreative VII Shared Task 3 Competition. Database (Oxford) 2023; 2023:baac108. [PMID: 36734300 PMCID: PMC9896308 DOI: 10.1093/database/baac108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 10/28/2022] [Accepted: 12/13/2022] [Indexed: 02/04/2023]
Abstract
This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding those tweets in a user's timeline that mention specific health-related concepts such as medications requires addressing extreme imbalance. Task 3 called for detecting tweets in a user's timeline that mention a medication name and, for each detected mention, extracting its span. The organizers made available a corpus consisting of 182 049 tweets publicly posted by 212 Twitter users with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that are robust to class imbalance beyond the simple lexical match. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams for this challenge. The corpus is freely available at https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/. The methods and the results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.
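To make the class imbalance described above concrete, the sketch below uses only the two corpus counts stated in the abstract (182 049 tweets, 442 of them mentioning a medication). It shows why plain accuracy is a poor yardstick here: a hypothetical classifier that labels every tweet as negative scores very high accuracy while finding no medication mentions. The baseline and the function name are illustrative assumptions, not systems or metrics from the shared task.

```python
def all_negative_baseline(total, positives):
    """Accuracy and recall of a classifier that labels every item negative.

    This baseline is a hypothetical illustration, not a system
    evaluated in the shared task.
    """
    negatives = total - positives
    accuracy = negatives / total  # every true negative counts as correct
    recall = 0 / positives        # no positive item is ever retrieved
    return accuracy, recall


# Corpus counts as stated in the abstract.
TOTAL_TWEETS = 182_049
MEDICATION_TWEETS = 442

positive_rate = MEDICATION_TWEETS / TOTAL_TWEETS
accuracy, recall = all_negative_baseline(TOTAL_TWEETS, MEDICATION_TWEETS)

print(f"positive rate: {positive_rate:.2%}")
print(f"all-negative baseline: accuracy {accuracy:.2%}, recall {recall:.0%}")
```

This is the usual motivation for reporting precision, recall, and F1 on the positive class when positives are this rare.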
Affiliation(s)
- Davy Weissenbacher
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Karen O’Connor
- DBEI, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Siddharth Rawal
- DBEI, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Yu Zhang
- Department of Computer Science and Information Engineering, National Central University, No. 300, Zhongda Rd, Zhongli District, Taoyuan 320, Taiwan
| | - Richard Tzong-Han Tsai
- Department of Computer Science and Information Engineering, National Central University, No. 300, Zhongda Rd, Zhongli District, Taoyuan 320, Taiwan
- IoX Center, National Taiwan University, Da’an District, Section 4, Roosevelt Rd, No. 1, Barry Lam Hall, Taipei 106, Taiwan
- Research Center for Humanities and Social Sciences, Academia Sinica, No. 128, Section 2, Academia Rd, Nangang District, Taipei 115, Taiwan
| | - Timothy Miller
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Dongfang Xu
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Bo Liu
  - NVIDIA, Santa Clara, CA, USA
- Qing Han
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Jinfeng Zhang
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Igor Kulev
  - Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland
- Berkay Köprü
  - Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland
- Raul Rodriguez-Esteban
  - Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Switzerland
- Elif Ozkirimli
  - Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland
- Ammer Ayach
  - Speech and Language Technology Lab, DFKI, Berlin, Germany
- Roland Roller
  - Speech and Language Technology Lab, DFKI, Berlin, Germany
- Stephen Piccolo
  - Department of Biology, Brigham Young University, Provo, UT, USA
- Peijin Han
  - Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan, Ann Arbor, MI, USA
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, MI, USA
  - School of Information, University of Michigan, Ann Arbor, MI, USA
- Ramya Tekumalla
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- João F Silva
  - DETI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Portugal
- Tiago Almeida
  - DETI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Portugal
  - Department of Computation, University of A Coruña, Spain
- Paloma Martinez
  - Computer Science and Engineering Department, Universidad Carlos III de Madrid, Madrid, Spain
- Renzo Rivera-Zavala
  - Computer Science and Engineering Department, Universidad Carlos III de Madrid, Madrid, Spain
- Chen-Kai Wang
  - Big Data Laboratory, Chunghwa Telecom Laboratories, Taoyuan, Taiwan
  - Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Hong-Jie Dai
  - Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
9
Mahmoudi E, Wu W, Najarian C, Aikens J, Bynum J, Vydiswaran VGV. Leveraging Natural Language Processing to Identify Caregiver Availability for Patients with ADRD from Electronic Medical Records. Alzheimers Dement 2022. [DOI: 10.1002/alz.064646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Affiliation(s)
- Wenbo Wu
  - University of Michigan, Ann Arbor, MI, USA
- Julie Bynum
  - University of Michigan, Ann Arbor, MI, USA
  - University of Michigan Medical School, Ann Arbor, MI, USA
10
Yu D, Vydiswaran VGV. An Assessment of Mentions of Adverse Drug Events on Social Media With Natural Language Processing: Model Development and Analysis. JMIR Med Inform 2022; 10:e38140. [PMID: 36170004 PMCID: PMC9557755 DOI: 10.2196/38140] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/21/2022] [Revised: 08/13/2022] [Accepted: 09/07/2022] [Indexed: 11/27/2022] Open
Abstract
Background Adverse reactions to drugs attract significant concern in both clinical practice and public health monitoring. Multiple measures have been put into place to increase postmarketing surveillance of the adverse effects of drugs and to improve drug safety. These measures include implementing spontaneous reporting systems and developing automated natural language processing systems based on data from electronic health records and social media to collect evidence of adverse drug events that can be further investigated as possible adverse reactions. Objective While using social media for collecting evidence of adverse drug events has potential, it is not clear whether social media are a reliable source for this information. Our work aims to (1) develop natural language processing approaches to identify adverse drug events on social media and (2) assess the reliability of social media data to identify adverse drug events. Methods We propose a collocated long short-term memory network model with attentive pooling and aggregated, contextual representation generated by a pretrained model. We applied this model on large-scale Twitter data to identify adverse drug event–related tweets. We conducted a qualitative content analysis of these tweets to validate the reliability of social media data as a means to collect such information. Results The model outperformed a variant without contextual representation during both the validation and evaluation phases. Through the content analysis of adverse drug event tweets, we observed that adverse drug event–related discussions had 7 themes. Mental health–related, sleep-related, and pain-related adverse drug event discussions were most frequent. We also contrast known adverse drug reactions to those mentioned in tweets. Conclusions We observed a distinct improvement in the model when it used contextual information. However, our results reveal weak generalizability of the current systems to unseen data. 
Additional research is needed to fully utilize social media data and improve the robustness and reliability of natural language processing systems. The content analysis, on the other hand, showed that Twitter covered a sufficiently wide range of adverse drug events, as well as known adverse reactions, for the drugs mentioned in tweets. Our work demonstrates that social media can be a reliable data source for collecting adverse drug event mentions.
Affiliation(s)
- Deahan Yu
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
  - School of Information, University of Michigan, Ann Arbor, MI, United States
  - Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, MI, United States
11
Han P, Fu S, Kolis J, Hughes R, Hallstrom BR, Carvour M, Maradit-Kremers H, Sohn S, Vydiswaran VGV. Multi-Center Validation of Natural Language Processing Algorithms for Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty (Preprint). JMIR Med Inform 2022; 10:e38155. [PMID: 36044253 PMCID: PMC9475406 DOI: 10.2196/38155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 05/30/2022] [Accepted: 07/12/2022] [Indexed: 11/18/2022] Open
Abstract
Background Natural language processing (NLP) methods are powerful tools for extracting and analyzing critical information from free-text data. MedTaggerIE, an open-source NLP pipeline for information extraction based on text patterns, has been widely used in the annotation of clinical notes. A rule-based system, MedTagger-total hip arthroplasty (THA), developed based on MedTaggerIE, was previously shown to correctly identify the surgical approach, fixation, and bearing surface from the THA operative notes at Mayo Clinic. Objective This study aimed to assess the implementability, usability, and portability of MedTagger-THA at two external institutions, Michigan Medicine and the University of Iowa, and provide lessons learned for best practices. Methods We conducted iterative test-apply-refinement processes with three involved sites—the development site (Mayo Clinic) and two deployment sites (Michigan Medicine and the University of Iowa). Mayo Clinic was the primary NLP development site, with the THA registry as the gold standard. The activities at the two deployment sites included the extraction of the operative notes, gold standard development (Michigan: registry data; Iowa: manual chart review), the refinement of NLP algorithms on training data, and the evaluation of test data. Error analyses were conducted to understand language variations across sites. To further assess the model specificity for approach and fixation, we applied the refined MedTagger-THA to arthroscopic hip procedures and periacetabular osteotomy cases, as neither of these operative notes should contain any approach or fixation keywords. Results MedTagger-THA algorithms were implemented and refined independently for both sites. At Michigan, the study comprised THA-related notes for 2569 patient-date pairs. Before model refinement, MedTagger-THA algorithms demonstrated excellent accuracy for approach (96.6%, 95% CI 94.6%-97.9%) and fixation (95.7%, 95% CI 92.4%-97.6%). 
These results were comparable with internal accuracy at the development site (99.2% for approach and 90.7% for fixation). Model refinement improved accuracies slightly for both approach (99%, 95% CI 97.6%-99.6%) and fixation (98%, 95% CI 95.3%-99.3%). The specificity of approach identification was 88.9% for arthroscopy cases, and the specificity of fixation identification was 100% for both periacetabular osteotomy and arthroscopy cases. At the Iowa site, the study comprised an overall data set of 100 operative notes (50 training notes and 50 test notes). MedTagger-THA algorithms achieved moderate-high performance on the training data. After model refinement, the model achieved high performance for approach (100%, 95% CI 91.3%-100%), fixation (98%, 95% CI 88.3%-100%), and bearing surface (92%, 95% CI 80.5%-97.3%). Conclusions High performance across centers was achieved for the MedTagger-THA algorithms, demonstrating that they were sufficiently implementable, usable, and portable to different deployment sites. This study provided important lessons learned during the model deployment and validation processes, and it can serve as a reference for transferring rule-based electronic health record models.
Affiliation(s)
- Peijin Han
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Sunyang Fu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Julie Kolis
  - Department of Orthopedic Surgery, University of Michigan, Ann Arbor, MI, United States
- Richard Hughes
  - Department of Orthopedic Surgery, University of Michigan, Ann Arbor, MI, United States
- Brian R Hallstrom
  - Department of Orthopedic Surgery, University of Michigan, Ann Arbor, MI, United States
- Martha Carvour
  - Department of Internal Medicine and Epidemiology, University of Iowa, Iowa City, IA, United States
- Hilal Maradit-Kremers
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
  - Departments of Orthopedic Surgery, Mayo Clinic, Rochester, MN, United States
- Sunghwan Sohn
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, MI, United States
  - School of Information, University of Michigan, Ann Arbor, MI, United States
12
Singh K, Murali A, Stevens H, Vydiswaran VGV, Bohnert A, Brummett CM, Fernandez AC. Predicting persistent opioid use after surgery using electronic health record and patient-reported data. Surgery 2022; 172:241-248. [PMID: 35181126 DOI: 10.1016/j.surg.2022.01.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2021] [Revised: 01/04/2022] [Accepted: 01/07/2022] [Indexed: 11/16/2022]
Abstract
BACKGROUND More than 100 million surgeries take place annually in the United States, and more than 90% of surgical patients receive an opioid prescription. A sizable minority of these patients will go on to use opioids long-term, contributing to the national opioid epidemic. METHODS The objective of this study was to develop and validate a model to predict persistent opioid use after surgery. Participants included surgical patients (≥18 years old) enrolled in a cohort study at an academic medical center between 2015 and 2018. Persistent opioid use was defined as filling opioid prescriptions in postdischarge days 4 to 90 and 91 to 180. Predictors included electronic health record data, state prescription drug monitoring data, and patient-reported measures. Three models were developed: a full, a restricted, and a minimal model using a derivation and validation cohort. RESULTS Of 24,040 patients, 4,879 (20%) experienced persistent opioid use. In the validation cohort, the full, restricted, and minimal model had C-statistics of 0.87 (95% CI 0.86-0.88), 0.86 (0.85-0.88), and 0.85 (0.84-0.87), respectively. All models performed better among patients with preoperative opioid use compared to opioid-naive patients (P < .001). The models slightly overpredicted risk in the validation cohort. The net benefit of using the restricted model to refer patients for preoperative counseling was 0.072 to 0.092, which is superior to evaluating no patients (net benefit of 0) or all patients (net benefit of -0.22 to -0.63). CONCLUSION This study developed and validated a prediction model for persistent opioid use using accessible data resources. The models achieved strong performance, outperforming prior published models.
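The net benefit figures quoted in the abstract come from decision curve analysis. A minimal sketch of the standard net benefit formula follows; the function name, the cohort counts, and the threshold are hypothetical illustrations, since the abstract does not state the thresholds or counts the study used.

```python
def net_benefit(true_positives, false_positives, n, threshold):
    """Standard decision-curve net benefit at a risk threshold (0 < threshold < 1).

    Each false positive is weighted by the odds of the threshold probability.
    """
    weight = threshold / (1.0 - threshold)
    return true_positives / n - (false_positives / n) * weight


# Hypothetical cohort (not study data): 100 patients, 20 with the outcome.
refer_none = net_benefit(0, 0, 100, 0.5)    # nobody referred
refer_all = net_benefit(20, 80, 100, 0.5)   # everybody referred
```

Referring nobody always gives a net benefit of 0, which is why the abstract uses it as one of the two reference strategies; referring everybody becomes negative once the threshold exceeds the outcome prevalence.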
Affiliation(s)
- Karandeep Singh
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI
  - Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI
  - Department of Urology, University of Michigan Medical School, Ann Arbor, MI
  - School of Information, University of Michigan, Ann Arbor, MI
- Haley Stevens
  - Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI
  - School of Information, University of Michigan, Ann Arbor, MI
- Amy Bohnert
  - Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI
  - Division of Pain Medicine, Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, MI
  - VA Center for Clinical Management Research, Ann Arbor, MI
- Chad M Brummett
  - Division of Pain Medicine, Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, MI
- Anne C Fernandez
  - Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI
13
McCreedy E, Gilmore-Bykovskyi A, Dorr DA, Lima J, McCarthy EP, Meyers DJ, Platt R, Vydiswaran VGV, Bynum JP. Barriers to identifying residents with dementia for embedded pragmatic trials: A call to action. J Am Geriatr Soc 2022; 70:638-641. [PMID: 34727369 PMCID: PMC8821246 DOI: 10.1111/jgs.17539] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/01/2021] [Revised: 10/11/2021] [Accepted: 10/17/2021] [Indexed: 02/03/2023]
Affiliation(s)
- Ellen McCreedy
  - Department of Health Services, Policy, and Practice, Center for Gerontology and Healthcare Research, Brown University School of Public Health, Providence, RI
- David A. Dorr
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Julie Lima
  - Department of Health Services, Policy, and Practice, Center for Gerontology and Healthcare Research, Brown University School of Public Health, Providence, RI
- Ellen P. McCarthy
  - Hinda and Arthur Marcus Institute for Aging Research, Hebrew SeniorLife, Boston, MA
  - Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- David J. Meyers
  - Department of Health Services, Policy, and Practice, Center for Gerontology and Healthcare Research, Brown University School of Public Health, Providence, RI
- Richard Platt
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, MA
- V. G. Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, Michigan, USA
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Julie P.W. Bynum
  - Department of Internal Medicine, Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI, USA
14
Bynum JPW, Dorr DA, Lima J, McCarthy EP, McCreedy E, Platt R, Vydiswaran VGV. Using Healthcare Data in Embedded Pragmatic Clinical Trials among People Living with Dementia and Their Caregivers: State of the Art. J Am Geriatr Soc 2020; 68 Suppl 2:S49-S54. [PMID: 32589274 DOI: 10.1111/jgs.16617] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/21/2020] [Revised: 04/25/2020] [Accepted: 04/29/2020] [Indexed: 12/16/2022]
Abstract
Embedded pragmatic clinical trials (ePCTs) are embedded in healthcare systems as well as their data environments. For people living with dementia (PLWD), settings of care can be different from the general population and involve additional people whose information is also important. The ePCT designs have the opportunity to leverage data that becomes available through the normal delivery of care. They may be particularly valuable in Alzheimer's disease and Alzheimer's disease-related dementia (AD/ADRD), given the complexity of case identification and the diversity of care settings. Grounded in the objectives of the Data and Technical Core of the newly established National Institute on Aging Imbedded Pragmatic Alzheimer's Disease and AD-Related Dementias Clinical Trials Collaboratory (IMPACT Collaboratory), this article summarizes the state of the art in using existing data sources (eg, Medicare claims, electronic health records) in AD/ADRD ePCTs and approaches to integrating them in real-world settings. J Am Geriatr Soc 68:S49-S54, 2020.
Affiliation(s)
- Julie P W Bynum
  - Department of Internal Medicine, Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, Michigan, USA
- David A Dorr
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Julie Lima
  - Center for Gerontology and Healthcare Research, School of Public Health, Brown University, Providence, Rhode Island, USA
- Ellen P McCarthy
  - Hinda and Arthur Marcus Institute for Aging Research, Hebrew SeniorLife, Boston, Massachusetts, USA
  - Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Ellen McCreedy
  - Center for Gerontology and Healthcare Research, School of Public Health, Brown University, Providence, Rhode Island, USA
- Richard Platt
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, Massachusetts, USA
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, Michigan, USA
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
15
Joo H, Burns M, Kalidaikurichi Lakshmanan SS, Hu Y, Vydiswaran VGV. Neural Machine Translation-Based Automated Current Procedural Terminology Classification System Using Procedure Text: Development and Validation Study. JMIR Form Res 2021; 5:e22461. [PMID: 34037526 PMCID: PMC8190648 DOI: 10.2196/22461] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/31/2020] [Revised: 03/02/2021] [Accepted: 04/19/2021] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Administrative costs for billing and insurance-related activities in the United States are substantial. One critical cause of the high overhead of administrative costs is medical billing errors. With advanced deep learning techniques, developing advanced models to predict hospital and professional billing codes has become feasible. These models can be used for administrative cost reduction and billing process improvements. OBJECTIVE In this study, we aim to develop an automated anesthesiology current procedural terminology (CPT) prediction system that translates manually entered surgical procedure text into standard forms using neural machine translation (NMT) techniques. The standard forms are calculated using similarity scores to predict the most appropriate CPT codes. Although this system aims to enhance medical billing coding accuracy to reduce administrative costs, we compare its performance with that of previously developed machine learning algorithms. METHODS We collected and analyzed all operative procedures performed at Michigan Medicine between January 2017 and June 2019 (2.5 years). The first 2 years of data were used to train and validate the existing models and compare the results from the NMT-based model. Data from 2019 (6-month follow-up period) were then used to measure the accuracy of the CPT code prediction. Three experimental settings were designed with different data types to evaluate the models. Experiment 1 used the surgical procedure text entered manually in the electronic health record. Experiment 2 used preprocessing of the procedure text. Experiment 3 used preprocessing of the combined procedure text and preoperative diagnoses. The NMT-based model was compared with the support vector machine (SVM) and long short-term memory (LSTM) models. 
RESULTS The NMT model yielded the highest top-1 accuracy in experiments 1 and 2 at 81.64% and 81.71% compared with the SVM model (81.19% and 81.27%, respectively) and the LSTM model (80.96% and 81.07%, respectively). The SVM model yielded the highest top-1 accuracy of 84.30% in experiment 3, followed by the LSTM model (83.70%) and the NMT model (82.80%). In experiment 3, the addition of preoperative diagnoses showed 3.7%, 3.2%, and 1.3% increases in the SVM, LSTM, and NMT models in top-1 accuracy over those in experiment 2, respectively. For top-3 accuracy, the SVM, LSTM, and NMT models achieved 95.64%, 95.72%, and 95.60% for experiment 1, 95.75%, 95.67%, and 95.69% for experiment 2, and 95.88%, 95.93%, and 95.06% for experiment 3, respectively. CONCLUSIONS This study demonstrates the feasibility of creating an automated anesthesiology CPT classification system based on NMT techniques using surgical procedure text and preoperative diagnosis. Our results show that the performance of the NMT-based CPT prediction system is equivalent to that of the SVM and LSTM prediction models. Importantly, we found that including preoperative diagnoses improved the accuracy of using the procedure text alone.
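The top-1 and top-3 accuracies reported in the abstract can be illustrated with a minimal sketch. It assumes each model returns a ranked list of candidate codes per procedure; the function name and the placeholder code labels are hypothetical and are not taken from the study.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label appears among the first k ranked predictions."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked outputs for four procedures (placeholder labels, not real CPT codes).
ranked = [
    ["code_a", "code_b", "code_c"],
    ["code_b", "code_a", "code_c"],
    ["code_c", "code_b", "code_a"],
    ["code_a", "code_c", "code_b"],
]
truth = ["code_a", "code_a", "code_a", "code_b"]

top1 = top_k_accuracy(ranked, truth, k=1)
top3 = top_k_accuracy(ranked, truth, k=3)
```

Top-3 accuracy can never be lower than top-1 accuracy for the same model, which is consistent with the roughly 95% top-3 versus 81% to 84% top-1 figures in the abstract.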
Affiliation(s)
- Hyeon Joo
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
  - Department of Anesthesiology, University of Michigan, Ann Arbor, MI, United States
- Michael Burns
  - Department of Anesthesiology, University of Michigan, Ann Arbor, MI, United States
- Yaokun Hu
  - Department of Anesthesiology, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
  - School of Information, University of Michigan, Ann Arbor, MI, United States
16
Lester CA, Ding Y, Li J, Jiang Y, Rowell B, Vydiswaran VGV. Human versus machine editing of electronic prescription directions. J Am Pharm Assoc (2003) 2021; 61:484-491.e1. [PMID: 33766549 DOI: 10.1016/j.japh.2021.02.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/20/2020] [Revised: 02/02/2021] [Accepted: 02/15/2021] [Indexed: 10/22/2022]
Abstract
BACKGROUND Pharmacy staff are responsible for editing poor-quality and difficult-to-read electronic prescription (e-prescription) directions. Machine translation (MT) models are capable of translating free text from 1 sequence into another. However, the quality of MTs of e-prescriptions into pharmacy label directions is unknown. OBJECTIVE To determine the types and frequencies of e-prescription direction component errors made by an MT model, pharmacy staff, and prescribers. METHODS A prospective evaluation was conducted on a random sample of 300 patient directions in a test set of e-prescriptions from a mail-order pharmacy. Each row included directions produced by (1) prescribers on e-prescriptions, (2) pharmacy staff on prescription labels, and (3) an open neural MT model. Annotators labeled direction sets for missing direction components, use of abbreviations and medical jargon, and incorrect information (e.g., changing the number of tablets to be taken). The longest common subsequence (LCS) compared the amount of pharmacy staff editing with and without MT. RESULTS Out of 279 direction sets labeled, the MT model directions contained no quality issues in 196 (70.3%) samples compared with 187 (67.0%) and 83 (29.8%) samples for pharmacy staff directions and prescriber directions, respectively. The MT model directions contained more incorrect components (n = 23). Median LCS was greater without MT (30.0 vs. 18.5, P < 0.01, Wilcoxon signed-rank test), indicating more editing was needed. CONCLUSION MT could be used to improve the quality of e-prescription directions; however, MT makes high-risk mistakes such as incorrectly predicting the tapering regimen for prednisone. The use of semiautomated MT, where pharmacy staff can review model predictions to detect and resolve quality issues, should be considered to improve safety and decrease total work time compared with current practice. 
MT has strengths and weaknesses for improving the editing process of the patient directions compared with pharmacy staff alone.
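The longest common subsequence (LCS) named in the abstract can be computed with a short dynamic program. This is a minimal sketch assuming whitespace-tokenized directions; the abstract does not say how the study tokenized text or converted LCS into an editing measure, and the function name and example directions are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence ending before both matching items.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping an item from a or from b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical directions, not taken from the study data.
prescriber = "1 tab po qd".split()
label = "take 1 tablet by mouth once daily".split()
shared_tokens = lcs_length(prescriber, label)
```

Here only the token `1` survives unchanged between the two versions, so a short LCS relative to the label length signals heavy rewriting of the prescriber's text.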
17
Vydiswaran VGV, Romero DM, Zhao X, Yu D, Gomez-Lopez I, Lu JX, Iott BE, Baylin A, Jansen EC, Clarke P, Berrocal VJ, Goodspeed R, Veinot TC. Uncovering the relationship between food-related discussion on Twitter and neighborhood characteristics. J Am Med Inform Assoc 2021; 27:254-264. [PMID: 31633756 PMCID: PMC7025333 DOI: 10.1093/jamia/ocz181] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/28/2019] [Revised: 09/11/2019] [Accepted: 09/27/2019] [Indexed: 12/20/2022] Open
Abstract
Objective Initiatives to reduce neighborhood-based health disparities require access to meaningful, timely, and local information regarding health behavior and its determinants. We examined the validity of Twitter as a source of information for neighborhood-level analysis of dietary choices and attitudes. Materials and Methods We analyzed the “healthiness” quotient and sentiment in food-related tweets at the census tract level, and associated them with neighborhood characteristics and health outcomes. We analyzed keywords driving the differences in food healthiness between the most and least-affluent tracts, and qualitatively analyzed contents of a random sample of tweets. Results Significant, albeit weak, correlations existed between healthiness and sentiment in food-related tweets and tract-level measures of affluence, disadvantage, race, age, U.S. density, and mortality from conditions associated with obesity. Analyses of keywords driving the differences in food healthiness revealed foods high in saturated fat (eg, pizza, bacon, fries) were mentioned more frequently in less-affluent tracts. Food-related discussion referred to activities (eating, drinking, cooking), locations where food was consumed, and positive (affection, cravings, enjoyment) and negative attitudes (dislike, personal struggles, complaints). Discussion Tweet-based healthiness scores largely correlated with offline phenomena in the expected directions. Social media offer less resource-intensive data collection methods than traditional surveys do. Twitter may assist in informing local health programs that focus on drivers of food consumption and could inform interventions focused on attitudes and the food environment. Conclusions Twitter provided weak but significant signals concerning food-related behavior and attitudes at the neighborhood level, suggesting its potential usefulness for informing local health disparity reduction efforts.
Affiliation(s)
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, Michigan, USA
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Daniel M Romero
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
  - Center for the Study of Complex Systems, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA
- Xinyan Zhao
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Deahan Yu
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Iris Gomez-Lopez
  - Institute for Social Research, University of Michigan, Ann Arbor, Michigan, USA
- Jin Xiu Lu
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Bradley E Iott
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Health Management and Policy, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Ana Baylin
  - Department of Nutritional Sciences, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Erica C Jansen
  - Department of Nutritional Sciences, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Philippa Clarke
  - Institute for Social Research, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Veronica J Berrocal
  - Department of Statistics, Donald Bren School of Information and Computer Science, University of California, Irvine, California, USA
- Robert Goodspeed
  - Urban and Regional Planning Program, Taubman College of Architecture and Urban Planning, University of Michigan, Ann Arbor, Michigan, USA
- Tiffany C Veinot
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Health Behavior and Health Education, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
18
Vydiswaran VGV, Strayhorn A, Zhao X, Robinson P, Agarwal M, Bagazinski E, Essiet M, Iott BE, Joo H, Ko P, Lee D, Lu JX, Liu J, Murali A, Sasagawa K, Wang T, Yuan N. Hybrid bag of approaches to characterize selection criteria for cohort identification. J Am Med Inform Assoc 2021; 26:1172-1180. [PMID: 31197354 DOI: 10.1093/jamia/ocz079] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/18/2019] [Revised: 03/23/2019] [Accepted: 05/01/2019] [Indexed: 12/14/2022] Open
Abstract
OBJECTIVE The 2018 National NLP Clinical Challenge (2018 n2c2) focused on the task of cohort selection for clinical trials, where participating systems were tasked with analyzing longitudinal patient records to determine if the patients met or did not meet any of the 13 selection criteria. This article describes our participation in this shared task. MATERIALS AND METHODS We followed a hybrid approach combining pattern-based, knowledge-intensive, and feature weighting techniques. After preprocessing the notes using publicly available natural language processing tools, we developed individual criterion-specific components that relied on collecting knowledge resources relevant for these criteria and pattern-based and weighting approaches to identify "met" and "not met" cases. RESULTS As part of the 2018 n2c2 challenge, 3 runs were submitted. The overall micro-averaged F1 on the training set was 0.9444. On the test set, the micro-averaged F1 for the 3 submitted runs were 0.9075, 0.9065, and 0.9056. The best run was placed second in the overall challenge and all 3 runs were statistically similar to the top-ranked system. A reimplemented system achieved the best overall F1 of 0.9111 on the test set. DISCUSSION We highlight the need for a focused resource-intensive effort to address the class imbalance in the cohort selection identification task. CONCLUSION Our hybrid approach was able to identify all selection criteria with high F1 performance on both training and test sets. Based on our participation in the 2018 n2c2 task, we conclude that there is merit in continuing a focused criterion-specific analysis and developing appropriate knowledge resources to build a quality cohort selection system.
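The micro-averaged F1 reported in the abstract pools counts across criteria before computing precision and recall. The sketch below is a simplified illustration that pools counts for a single positive class; the challenge's official scorer may be defined differently, and the function name and per-criterion counts are hypothetical.

```python
def micro_f1(per_criterion_counts):
    """Micro-averaged F1: sum TP/FP/FN over all criteria, then compute one F1."""
    tp = sum(c["tp"] for c in per_criterion_counts)
    fp = sum(c["fp"] for c in per_criterion_counts)
    fn = sum(c["fn"] for c in per_criterion_counts)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two criteria: one frequent and easy, one rare and hard.
counts = [
    {"tp": 8, "fp": 2, "fn": 0},
    {"tp": 1, "fp": 0, "fn": 3},
]
score = micro_f1(counts)
```

Because the counts are pooled, frequent criteria dominate the micro-average, so weak performance on a rare criterion can be masked. This relates to the class-imbalance concern raised in the abstract's discussion.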
Affiliation(s)
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, Michigan, USA; School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Asher Strayhorn
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, Michigan, USA
- Xinyan Zhao
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Phil Robinson
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Mahesh Agarwal
  - Department of Mathematics and Statistics, College of Arts, Sciences, and Letters, University of Michigan-Dearborn, Dearborn, Michigan, USA
- Erin Bagazinski
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Madia Essiet
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Bradley E Iott
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Hyeon Joo
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- PingJui Ko
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Dahee Lee
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Jin Xiu Lu
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Jinghui Liu
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Adharsh Murali
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Koki Sasagawa
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Tianshi Wang
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Nalingna Yuan
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA
|
19
|
Zheng Y, Jiang Y, Dorsch MP, Ding Y, Vydiswaran VGV, Lester CA. Work effort, readability and quality of pharmacy transcription of patient directions from electronic prescriptions: a retrospective observational cohort analysis. BMJ Qual Saf 2020; 30:311-319. [PMID: 32451350 PMCID: PMC7295863 DOI: 10.1136/bmjqs-2019-010405] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/24/2019] [Revised: 04/20/2020] [Accepted: 05/05/2020] [Indexed: 11/22/2022]
Abstract
Background Free-text directions generated by prescribers in electronic prescriptions can be difficult for patients to understand due to their variability, complexity and ambiguity. Pharmacy staff are responsible for transcribing these directions so that patients can take their medication as prescribed. However, little is known about the quality of these transcribed directions received by patients. Methods A retrospective observational analysis of 529 990 e-prescription directions processed at a mail-order pharmacy in the USA. We measured pharmacy staff editing of directions using string edit distance and execution time using the Keystroke-Level Model. Using the New Dale-Chall (NDC) readability formula, we calculated NDC cloze scores of the patient directions before and after transcription. We also evaluated the quality of directions (eg, included a dose, dose unit, frequency of administration) before and after transcription with a random sample of 966 patient directions. Results Pharmacy staff edited 83.8% of all e-prescription directions received with a median edit distance of 18 per e-prescription. We estimated a median of 6.64 s of transcribing each e-prescription. The median NDC score increased by 68.6% after transcription (26.12 vs 44.03, p<0.001), which indicated a significant readability improvement. In our sample, 51.4% of patient directions on e-prescriptions contained at least one pre-defined direction quality issue. Pharmacy staff corrected 79.5% of the quality issues. Conclusion Pharmacy staff put significant effort into transcribing e-prescription directions. Manual transcription removed the majority of quality issues; however, pharmacy staff still miss or introduce quality issues during their manual transcription processes. The development of tools and techniques such as a comprehensive set of structured direction components or machine learning–based natural language processing techniques may help produce clear directions.
Affiliation(s)
- Yifan Zheng
  - Department of Clinical Pharmacy, University of Michigan College of Pharmacy, Ann Arbor, Michigan, USA
- Yun Jiang
  - Department of Systems, Populations and Leadership, University of Michigan School of Nursing, Ann Arbor, Michigan, USA
- Michael P Dorsch
  - Department of Clinical Pharmacy, University of Michigan College of Pharmacy, Ann Arbor, Michigan, USA
- Yuting Ding
  - Department of Clinical Pharmacy, University of Michigan College of Pharmacy, Ann Arbor, Michigan, USA
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Corey A Lester
  - Department of Clinical Pharmacy, University of Michigan College of Pharmacy, Ann Arbor, Michigan, USA
|
20
|
Abstract
Background Online health forums have become increasingly popular over the past several years. They provide members with a platform to network with peers and share information, experiential advice, and support. Among the members of health forums, we define “peer experts” as a set of lay users who have gained expertise on the particular health topic through personal experience, and who demonstrate credibility in responding to questions from other members. This paper aims to motivate the need to identify peer experts in health forums and study their characteristics. Methods We analyze profiles and activity of members of a popular online health forum and characterize the interaction behavior of peer experts. We study the temporal patterns of comments posted by lay users and peer experts to uncover how peer expertise is developed. We further train a supervised classifier to identify peer experts based on their activity level, textual features, and temporal progression of posts. Result A support vector machine classifier with radial basis function kernel was found to be the most suitable model among those studied. Features capturing the key semantic word classes and higher mean user activity were found to be the most significant. Conclusion We define a new class of members of health forums called peer experts, and present preliminary, yet promising, approaches to distinguish peer experts from novice users. Identifying such peer expertise could potentially help improve the perceived reliability and trustworthiness of information in community health forums.
Affiliation(s)
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan, 300 N. Ingalls St, Ann Arbor, 48108, MI, USA
- Manoj Reddy
  - Department of Computer Science, University of California, Los Angeles, 404 Westwood Plaza, Los Angeles, 90095, CA, USA
|
21
|
Affiliation(s)
- Yaoyun Zhang
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Yanshan Wang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Hua Xu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
|
22
|
DeJonckheere M, Nichols LP, Vydiswaran VGV, Zhao X, Collins-Thompson K, Resnicow K, Chang T. Using Text Messaging, Social Media, and Interviews to Understand What Pregnant Youth Think About Weight Gain During Pregnancy. JMIR Form Res 2019; 3:e11397. [PMID: 30932869 PMCID: PMC6462892 DOI: 10.2196/11397] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/25/2018] [Revised: 11/30/2018] [Accepted: 01/27/2019] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND The majority of pregnant youth gain more weight than recommended by the National Academy of Medicine guidelines. Excess weight gain during pregnancy increases the risk of dangerous complications during delivery, including operative delivery and stillbirth, and contributes to the risk of long-term obesity in both mother and child. Little is known regarding youth's perceptions of and knowledge about weight gain during pregnancy. OBJECTIVE The aim of this study was to describe the feasibility and acceptability of 3 novel data collection and analysis strategies for use with youth (social media posts, text message surveys, and semistructured interviews) to explore their experiences during pregnancy. The mixed-methods analysis included natural language processing and thematic analysis. METHODS To demonstrate the feasibility and acceptability of this novel approach, we used descriptive statistics and thematic qualitative analysis to characterize participation and engagement in the study. RESULTS Recruitment of 54 pregnant women aged between 16 and 24 years occurred from April 2016 to September 2016. All participants completed at least 1 phase of the study. Semistructured interviews had the highest rate of completion, yet all 3 strategies were feasible and acceptable to pregnant youth. CONCLUSIONS This study has described a novel youth-centered strategy of triangulating 3 sources of mixed-methods data to gain a deeper understanding of a health behavior phenomenon among an at-risk population of youth.
Affiliation(s)
- Melissa DeJonckheere
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
- Lauren P Nichols
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States; School of Information, University of Michigan, Ann Arbor, MI, United States
- Xinyan Zhao
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- Kenneth Resnicow
  - School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Tammy Chang
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States; Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI, United States
|
23
|
Oram D, Tzilos Wernette G, Nichols LP, Vydiswaran VGV, Zhao X, Chang T. Substance Use Among Young Mothers: An Analysis of Facebook Posts. JMIR Pediatr Parent 2018; 1:e10261. [PMID: 31518312 PMCID: PMC6716430 DOI: 10.2196/10261] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/01/2018] [Revised: 10/08/2018] [Accepted: 10/22/2018] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Substance use among young pregnant women is a common and significant public health concern associated with a number of adverse outcomes for both mothers and infants. Social media posts by young women can provide valuable, real-world insight into their perceptions of substance use immediately before and during pregnancy. OBJECTIVE The aim of this study was to characterize the frequency and content of posts regarding substance use in the year before pregnancy and during pregnancy among young mothers. METHODS Facebook posts were mined from young pregnant women (age, 16-24 years) who consented from 2 Midwest primary care clinics that serve a predominantly low-income community. Natural language processing was used to identify posts related to substance use by keyword searching (eg, drunk, drugs, pot, and meth). Using mixed-methods techniques, 2 investigators iteratively coded and identified major themes around substance use from these mined Facebook posts. Outcome measures include the frequency of posts and major themes expressed regarding substance use before and during pregnancy. RESULTS Women in our sample (N=43) had a mean age of 21 (SD 2.3) years, and the largest subgroup (21/43, 49%) identified as non-Hispanic black; 26% (11/43) identified as non-Hispanic white; 16% (7/43) as Hispanic; and 9% (4/43) as non-Hispanic mixed race, Native American, or other. The largest subgroup (20/43, 47%) graduated high school without further education, while 30% (13/43) completed only some high school and 23% (10/43) completed at least some postsecondary education. Young women discussed substance use on social media before and during pregnancy, although compared with the year before pregnancy, the average frequency of substance-related posts during pregnancy decreased. Themes identified included craving alcohol or marijuana, social use of alcohol or marijuana, reasons for abstaining from substance use, and intoxication. 
CONCLUSIONS Facebook posts reveal that young pregnant women discuss the use of substances, predominantly alcohol and marijuana. Future work can explore clinical opportunities to prevent and treat substance use before and during pregnancy among young, at-risk mothers.
Affiliation(s)
- Daniel Oram
  - Department of Family Medicine, School of Medicine, University of Michigan, Ann Arbor, MI, United States
- Golfo Tzilos Wernette
  - Department of Family Medicine, School of Medicine, University of Michigan, Ann Arbor, MI, United States; Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI, United States
- Lauren P Nichols
  - Department of Family Medicine, School of Medicine, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
  - Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI, United States; Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States; School of Information, University of Michigan, Ann Arbor, MI, United States
- Xinyan Zhao
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- Tammy Chang
  - Department of Family Medicine, School of Medicine, University of Michigan, Ann Arbor, MI, United States; Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI, United States
|
24
|
Goodspeed R, Yan X, Hardy J, Vydiswaran VGV, Berrocal VJ, Clarke P, Romero DM, Gomez-Lopez IN, Veinot T. Comparing the Data Quality of Global Positioning System Devices and Mobile Phones for Assessing Relationships Between Place, Mobility, and Health: Field Study. JMIR Mhealth Uhealth 2018; 6:e168. [PMID: 30104185 PMCID: PMC6111146 DOI: 10.2196/mhealth.9771] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/04/2018] [Revised: 05/16/2018] [Accepted: 06/21/2018] [Indexed: 12/20/2022] Open
Abstract
Background Mobile devices are increasingly used to collect location-based information from individuals about their physical activities, dietary intake, environmental exposures, and mental well-being. Such research, which typically uses wearable devices or mobile phones to track location, benefits from the growing availability of fine-grained data regarding human mobility. However, little is known about the comparative geospatial accuracy of such devices. Objective In this study, we compared the data quality of location information collected from two mobile devices that determine location in different ways—a global positioning system (GPS) watch and a mobile phone with Google’s Location History feature enabled. Methods A total of 21 chronically ill participants carried both devices, which generated digital traces of locations, for 28 days. A mobile phone–based brief ecological momentary assessment (EMA) survey asked participants to manually report their location at 4 random times throughout each day. Participants also took part in qualitative interviews and completed surveys twice during the study period in which they reviewed recent mobile phone and watch trace data to compare the devices’ trace data with their memory of their activities on those days. Trace data from the devices were compared on the basis of (1) missing data days, (2) reasons for missing data, (3) distance between the route data collected for matching day and the associated EMA survey locations, and (4) activity space total area and density surfaces. Results The watch resulted in a much higher proportion of missing data days (P<.001), with missing data explained by technical differences between the devices as well as participant behaviors. The mobile phone was significantly more accurate in detecting home locations (P=.004) and marginally more accurate (P=.07) for all types of locations combined. 
The watch data resulted in a smaller activity space area and more accurately recorded outdoor travel and recreation. Conclusions The most suitable mobile device for location-based health research depends on the particular study objectives. Furthermore, data generated from mobile devices, such as GPS phones and smartwatches, require careful analysis to ensure quality and completeness. Studies that seek precise measurement of outdoor activity and travel, such as measuring outdoor physical activity or exposure to localized environmental hazards, would benefit from the use of GPS devices. Conversely, studies that aim to account for time within buildings at home or work, or those that document visits to particular places (such as supermarkets, medical facilities, or fast food restaurants), would benefit from the greater precision demonstrated by the mobile phone in recording indoor activities.
Affiliation(s)
- Robert Goodspeed
  - Urban and Regional Planning Program, Taubman College of Architecture and Urban Planning, University of Michigan, Ann Arbor, MI, United States
- Xiang Yan
  - Urban and Regional Planning Program, Taubman College of Architecture and Urban Planning, University of Michigan, Ann Arbor, MI, United States
- Jean Hardy
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
  - School of Information, University of Michigan, Ann Arbor, MI, United States; Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, MI, United States
- Veronica J Berrocal
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Philippa Clarke
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, United States; Institute for Social Research, University of Michigan, Ann Arbor, MI, United States
- Daniel M Romero
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- Iris N Gomez-Lopez
  - Institute for Social Research, University of Michigan, Ann Arbor, MI, United States
- Tiffany Veinot
  - School of Information, University of Michigan, Ann Arbor, MI, United States; Department of Health Behavior and Health Education, School of Public Health, University of Michigan, Ann Arbor, MI, United States
|
25
|
Guetterman TC, Chang T, DeJonckheere M, Basu T, Scruggs E, Vydiswaran VGV. Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study. J Med Internet Res 2018; 20:e231. [PMID: 29959110 PMCID: PMC6045788 DOI: 10.2196/jmir.9702] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 12/20/2017] [Revised: 05/14/2018] [Accepted: 05/15/2018] [Indexed: 11/18/2022] Open
Abstract
Background Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when analyzing text data. One potential method to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has been done to compare automatic coding using NLP techniques and qualitative coding, which is critical to establish the viability of NLP as a useful, rigorous analysis procedure. Objective The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods. Methods We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyze data generated through 2 text (short message service) message survey questions, one about prescription drugs and the other about police interactions, sent to youth aged 14-24 years. We randomly assigned a question to each of the 2 experienced qualitative analysis teams for independent coding and analysis before receiving NLP results. A third team separately conducted NLP analysis of the same 2 questions. We examined the results of our analyses to compare (1) the similarity of findings derived, (2) the quality of inferences generated, and (3) the time spent in analysis. Results The qualitative-only analysis for the drug question (n=58) yielded 4 major findings, whereas the NLP analysis yielded 3 findings that missed contextual elements. The qualitative and NLP-augmented analysis was the most comprehensive. 
For the police question (n=68), the qualitative-only analysis yielded 4 primary findings and the NLP-only analysis yielded 4 slightly different findings. Again, the augmented qualitative and NLP analysis was the most comprehensive and produced the highest quality inferences, increasing our depth of understanding (ie, details and frequencies). In terms of time, the NLP-only approach was quicker than the qualitative-only approach for the drug (120 vs 270 minutes) and police (40 vs 270 minutes) questions. An approach beginning with qualitative analysis followed by qualitative- or NLP-augmented analysis took longer time than that beginning with NLP for both drug (450 vs 240 minutes) and police (390 vs 220 minutes) questions. Conclusions NLP provides both a foundation to code qualitatively more quickly and a method to validate qualitative findings. NLP methods were able to identify major themes found with traditional qualitative analysis but were not useful in identifying nuances. Traditional qualitative text analysis added important details and context.
Affiliation(s)
- Timothy C Guetterman
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
- Tammy Chang
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States; Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI, United States
- Melissa DeJonckheere
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
- Tanmay Basu
  - Ramakrishna Mission Vivekananda Educational and Research Institute, Belur Math, West Bengal, India
- Elizabeth Scruggs
  - Department of Internal Medicine-Pediatrics, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, MI, United States; School of Information, University of Michigan, Ann Arbor, MI, United States
|
26
|
DeJonckheere M, Nichols LP, Moniz MH, Sonneville KR, Vydiswaran VGV, Zhao X, Guetterman TC, Chang T. MyVoice National Text Message Survey of Youth Aged 14 to 24 Years: Study Protocol. JMIR Res Protoc 2017; 6:e247. [PMID: 29229587 PMCID: PMC5742661 DOI: 10.2196/resprot.8502] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 07/18/2017] [Revised: 10/10/2017] [Accepted: 10/29/2017] [Indexed: 11/23/2022] Open
Abstract
Background There has been little progress in adolescent health outcomes in recent decades. Researchers and youth-serving organizations struggle to accurately elicit youth voice and translate youth perspectives into health care policy. Objective Our aim is to describe the protocol of the MyVoice Project, a longitudinal mixed methods study designed to engage youth, particularly those not typically included in research. Text messaging surveys are collected, analyzed, and disseminated in real time to leverage youth perspectives to impact policy. Methods Youth aged 14 to 24 years are recruited to receive weekly text message surveys on a variety of policy and health topics. The research team, including academic researchers, methodologists, and youth, develop questions through an iterative writing and piloting process. Question topics are elicited from community organizations, researchers, and policy makers to inform salient policies. A youth-centered interactive platform has been developed that automatically sends confidential weekly surveys and incentives to participants. Parental consent is not required because the survey is of minimal risk to participants. Recruitment occurs online (eg, Facebook, Instagram, university health research website) and in person at community events. Weekly surveys collect both quantitative and qualitative data. Quantitative data are analyzed using descriptive statistics. Qualitative data are quickly analyzed using natural language processing and traditional qualitative methods. Mixed methods integration and analysis supports a more in-depth understanding of the research questions. Results We are currently recruiting and enrolling participants through in-person and online strategies. Question development, weekly data collection, data analysis, and dissemination are in progress. Conclusions MyVoice quickly ascertains the thoughts and opinions of youth in real time using a widespread, readily available technology—text messaging. 
Results are disseminated to researchers, policy makers, and youth-serving organizations through a variety of methods. Policy makers and organizations also share their priority areas with the research team to develop additional question sets to inform important policy decisions. Youth-serving organizations can use results to make decisions to promote youth well-being.
Affiliation(s)
- Melissa DeJonckheere
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
- Lauren P Nichols
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States
- Michelle H Moniz
  - Department of Obstetrics & Gynecology, University of Michigan, Ann Arbor, MI, United States; Institute for Healthcare Policy & Innovation, University of Michigan, Ann Arbor, MI, United States
- Kendrin R Sonneville
  - Department of Nutritional Sciences, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States; School of Information, University of Michigan, Ann Arbor, MI, United States
- Xinyan Zhao
  - School of Information, University of Michigan, Ann Arbor, MI, United States
- Timothy C Guetterman
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States; Institute for Healthcare Policy & Innovation, University of Michigan, Ann Arbor, MI, United States
- Tammy Chang
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States; Institute for Healthcare Policy & Innovation, University of Michigan, Ann Arbor, MI, United States
|
27
|
Gomez-Lopez IN, Clarke P, Hill AB, Romero DM, Goodspeed R, Berrocal VJ, Vinod Vydiswaran VG, Veinot TC. Using Social Media to Identify Sources of Healthy Food in Urban Neighborhoods. J Urban Health 2017; 94:429-436. [PMID: 28455606 PMCID: PMC5481219 DOI: 10.1007/s11524-017-0154-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 10/19/2022]
Abstract
An established body of research has used secondary data sources (such as proprietary business databases) to demonstrate the importance of the neighborhood food environment for multiple health outcomes. However, documenting food availability using secondary sources in low-income urban neighborhoods can be particularly challenging since small businesses play a crucial role in food availability. These small businesses are typically underrepresented in national databases, which rely on secondary sources to develop data for marketing purposes. Using social media and other crowdsourced data to account for these smaller businesses holds promise, but the quality of these data remains unknown. This paper compares the quality of full-line grocery store information from Yelp, a crowdsourced content service, to a "ground truth" data set (Detroit Food Map) and a commercially-available dataset (Reference USA) for the greater Detroit area. Results suggest that Yelp is more accurate than Reference USA in identifying healthy food stores in urban areas. Researchers investigating the relationship between the nutrition environment and health may consider Yelp as a reliable and valid source for identifying sources of healthy food in urban environments.
Affiliation(s)
- Philippa Clarke
  - Institute for Social Research and Department of Epidemiology, University of Michigan, 426 Thompson Street, Ann Arbor, MI, 48104, USA
- Alex B Hill
  - Detroit Food Map Initiative, Detroit, MI, USA
- Daniel M Romero
  - School of Information, University of Michigan, Ann Arbor, MI, USA
- Robert Goodspeed
  - Taubman College of Architecture and Urban Planning, University of Michigan, Ann Arbor, MI, USA
- V G Vinod Vydiswaran
  - School of Information, University of Michigan, Ann Arbor, MI, USA; Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, USA
- Tiffany C Veinot
  - School of Information, University of Michigan, Ann Arbor, MI, USA
|
28
|
Hanauer DA, Wu DTY, Yang L, Mei Q, Murkowski-Steffy KB, Vydiswaran VGV, Zheng K. Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine. J Biomed Inform 2017; 67:1-10. [PMID: 28131722 DOI: 10.1016/j.jbi.2017.01.013] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 08/26/2016] [Revised: 12/21/2016] [Accepted: 01/23/2017] [Indexed: 02/01/2023]
Abstract
OBJECTIVE The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). MATERIALS AND METHODS The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. RESULTS The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. DISCUSSION AND CONCLUSION Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge.
Affiliation(s)
- David A Hanauer
- Department of Pediatrics, University of Michigan Medical School, 5312 CC, SPC 5940, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA; School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA.
- Danny T Y Wu
- School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA; Department of Pediatrics, University of Michigan Medical School, 5312 CC, SPC 5940, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA.
- Lei Yang
- School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA.
- Qiaozhu Mei
- School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA; Department of Electrical Engineering and Computer Science, University of Michigan, 2260 Hayward Street, Ann Arbor, MI 48109, USA.
- Katherine B Murkowski-Steffy
- Department of Health Management and Policy, School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA.
- V G Vinod Vydiswaran
- Department of Learning Health Sciences, University of Michigan Medical School, 1111 East Catherine Street, Ann Arbor, MI 48109, USA; School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA.
- Kai Zheng
- Department of Health Management and Policy, School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA; School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA.
29
|
Zheng K, Vydiswaran VGV, Liu Y, Wang Y, Stubbs A, Uzuner Ö, Gururaj AE, Bayer S, Aberdeen J, Rumshisky A, Pakhomov S, Liu H, Xu H. Ease of adoption of clinical natural language processing software: An evaluation of five systems. J Biomed Inform 2015; 58 Suppl:S189-S196. [PMID: 26210361 PMCID: PMC4974203 DOI: 10.1016/j.jbi.2015.07.008] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 02/16/2015] [Revised: 06/09/2015] [Accepted: 07/06/2015] [Indexed: 12/19/2022]
Abstract
OBJECTIVE In recognition of potential barriers that may inhibit the widespread adoption of biomedical software, the 2014 i2b2 Challenge introduced a special track, Track 3 - Software Usability Assessment, in order to develop a better understanding of the adoption issues that might be associated with the state-of-the-art clinical NLP systems. This paper reports the ease of adoption assessment methods we developed for this track, and the results of evaluating five clinical NLP system submissions. MATERIALS AND METHODS A team of human evaluators performed a series of scripted adoptability test tasks with each of the participating systems. The evaluation team consisted of four "expert evaluators" with training in computer science, and eight "end user evaluators" with mixed backgrounds in medicine, nursing, pharmacy, and health informatics. We assessed how easy it is to adopt the submitted systems along the following three dimensions: communication effectiveness (i.e., how effective a system is in communicating its designed objectives to intended audience), effort required to install, and effort required to use. We used a formal software usability testing tool, TURF, to record the evaluators' interactions with the systems and 'think-aloud' data revealing their thought processes when installing and using the systems and when resolving unexpected issues. RESULTS Overall, the ease of adoption ratings that the five systems received are unsatisfactory. Installation of some of the systems proved to be rather difficult, and some systems failed to adequately communicate their designed objectives to intended adopters. Further, the average ratings provided by the end user evaluators on ease of use and ease of interpreting output are -0.35 and -0.53, respectively, indicating that this group of users generally deemed the systems extremely difficult to work with. While the ratings provided by the expert evaluators are higher, 0.6 and 0.45, respectively, these ratings are still low, indicating that they also experienced considerable struggles. DISCUSSION The results of the Track 3 evaluation show that the adoptability of the five participating clinical NLP systems has a great margin for improvement. Remedy strategies suggested by the evaluators included (1) more detailed and operating-system-specific use instructions; (2) provision of more pertinent onscreen feedback for easier diagnosis of problems; (3) including screen walk-throughs in use instructions so users know what to expect and what might have gone wrong; (4) avoiding jargon and acronyms in materials intended for end users; and (5) packaging prerequisites required within software distributions so that prospective adopters of the software do not have to obtain each of the third-party components on their own.
Affiliation(s)
- Kai Zheng
- School of Public Health Department of Health Management and Policy, University of Michigan, Ann Arbor, MI, USA; School of Information, University of Michigan, Ann Arbor, MI, USA.
- V G Vinod Vydiswaran
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA
- Yang Liu
- School of Information, University of Michigan, Ann Arbor, MI, USA
- Yue Wang
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Amber Stubbs
- School of Library and Information Science, Simmons College, Boston, MA, USA
- Özlem Uzuner
- Department of Information Studies, University at Albany, SUNY, Albany, NY, USA
- Anupama E Gururaj
- The University of Texas School of Biomedical Informatics at Houston, Houston, TX, USA
- Anna Rumshisky
- Department of Computer Science, University of Massachusetts, Lowell, MA, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Hua Xu
- The University of Texas School of Biomedical Informatics at Houston, Houston, TX, USA.
30
|
Wu DTY, Hanauer DA, Mei Q, Clark PM, An LC, Proulx J, Zeng QT, Vydiswaran VGV, Collins-Thompson K, Zheng K. Assessing the readability of ClinicalTrials.gov. J Am Med Inform Assoc 2015; 23:269-75. [PMID: 26269536 DOI: 10.1093/jamia/ocv062] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 12/04/2014] [Accepted: 05/04/2015] [Indexed: 01/22/2023] Open
Abstract
OBJECTIVE ClinicalTrials.gov serves critical functions of disseminating trial information to the public and helping the trials recruit participants. This study assessed the readability of trial descriptions at ClinicalTrials.gov using multiple quantitative measures. MATERIALS AND METHODS The analysis included all 165,988 trials registered at ClinicalTrials.gov as of April 30, 2014. To obtain benchmarks, the authors also analyzed 2 other medical corpora: (1) all 955 Health Topics articles from MedlinePlus and (2) a random sample of 100,000 clinician notes retrieved from an electronic health records system intended for conveying internal communication among medical professionals. The authors characterized each of the corpora using 4 surface metrics, and then applied 5 different scoring algorithms to assess their readability. The authors hypothesized that clinician notes would be most difficult to read, followed by trial descriptions and MedlinePlus Health Topics articles. RESULTS Trial descriptions have the longest average sentence length (26.1 words) across all corpora; 65% of their words used are not covered by a basic medical English dictionary. In comparison, average sentence length of MedlinePlus Health Topics articles is 61% shorter, vocabulary size is 95% smaller, and dictionary coverage is 46% higher. All 5 scoring algorithms consistently rated ClinicalTrials.gov trial descriptions the most difficult corpus to read, even harder than clinician notes. On average, it requires 18 years of education to properly understand these trial descriptions according to the results generated by the readability assessment algorithms. DISCUSSION AND CONCLUSION Trial descriptions at ClinicalTrials.gov are extremely difficult to read. Significant work is warranted to improve their readability in order to achieve ClinicalTrials.gov's goal of facilitating information dissemination and subject recruitment.
Affiliation(s)
- Danny T Y Wu
- School of Information, University of Michigan, Ann Arbor, MI, USA
- David A Hanauer
- School of Information, University of Michigan, Ann Arbor, MI, USA; Department of Pediatrics, University of Michigan, Ann Arbor, MI, USA
- Qiaozhu Mei
- School of Information, University of Michigan, Ann Arbor, MI, USA; Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Patricia M Clark
- School of Nursing, University of Michigan, Ann Arbor, MI, USA; Center for Health Communication Research, University of Michigan, Ann Arbor, MI, USA
- Lawrence C An
- Center for Health Communication Research, University of Michigan, Ann Arbor, MI, USA; Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Joshua Proulx
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Qing T Zeng
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Kevyn Collins-Thompson
- School of Information, University of Michigan, Ann Arbor, MI, USA; Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Kai Zheng
- School of Information, University of Michigan, Ann Arbor, MI, USA; School of Public Health Department of Health Management and Policy, University of Michigan, Ann Arbor, MI, USA
31
|
Affiliation(s)
- V. G. Vinod Vydiswaran
- School of Information; University of Michigan, Ann Arbor; 105 S. State Street Ann Arbor MI 48109
- ChengXiang Zhai
- Department of Computer Science; University of Illinois, Urbana-Champaign; 201 N. Goodwin Avenue, MC-258 Urbana IL 61801
- Dan Roth
- Department of Computer Science; University of Illinois, Urbana-Champaign; 201 N. Goodwin Avenue, MC-258 Urbana IL 61801
- Peter Pirolli
- Palo Alto Research Center; 3333 Coyote Hill Road Palo Alto CA 94304
32
|
Vydiswaran VGV, Mei Q, Hanauer DA, Zheng K. Mining consumer health vocabulary from community-generated text. AMIA Annu Symp Proc 2014; 2014:1150-1159. [PMID: 25954426 PMCID: PMC4419967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Community-generated text corpora can be a valuable resource for extracting consumer health vocabulary (CHV) and linking it to professional terminologies and alternative variants. In this research, we propose a pattern-based text-mining approach to identify pairs of CHV and professional terms from Wikipedia, a large text corpus created and maintained by the community. A novel measure, leveraging the ratio of frequency of occurrence, was used to differentiate consumer terms from professional terms. We empirically evaluated the applicability of this approach using a large data sample consisting of MedLine abstracts and all posts from an online health forum, MedHelp. The results show that the proposed approach is able to identify synonymous pairs and label the terms as either consumer or professional terms with high accuracy. We conclude that the proposed approach provides great potential to produce a high quality CHV to improve the performance of computational applications in processing consumer-generated health text.
Affiliation(s)
- Qiaozhu Mei
- School of Information, University of Michigan, Ann Arbor, MI; Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI
- David A Hanauer
- Department of Pediatrics, University of Michigan, Ann Arbor, MI; School of Information, University of Michigan, Ann Arbor, MI
- Kai Zheng
- School of Public Health Department of Health Management and Policy, University of Michigan, Ann Arbor, MI; School of Information, University of Michigan, Ann Arbor, MI