1
Lin AY, Arabandi S, Beale T, Duncan WD, Hicks A, Hogan WR, Jensen M, Koppel R, Martínez-Costa C, Nytrø Ø, Obeid JS, de Oliveira JP, Ruttenberg A, Seppälä S, Smith B, Soergel D, Zheng J, Schulz S. Improving the Quality and Utility of Electronic Health Record Data through Ontologies. Standards (Basel) 2023; 3:316-340. [PMID: 37873508 PMCID: PMC10591519 DOI: 10.3390/standards3030023]
Abstract
The translational research community in general, and the Clinical and Translational Science Awards (CTSA) community in particular, share the vision of repurposing electronic health records (EHRs) for research that will improve the quality of clinical practice. Many members of these communities are also aware that EHR data are often poorly structured, biased, and unusable outside their original context. This creates obstacles to continuity of care, utility, quality improvement, and translational research. Analogous limitations to sharing objective data in other areas of the natural sciences have been successfully overcome by developing and using common ontologies. This White Paper presents the authors' rationale for using ontologies with computable semantics to improve clinical data quality and EHR usability. It is written for researchers who have a stake in clinical and translational science and who advocate the use of information technology in medicine, but who are at the same time concerned by current major shortfalls. The White Paper outlines pitfalls, opportunities, and solutions, and recommends increased investment in the research and development of ontologies with computable semantics for a new generation of EHRs.
Affiliation(s)
- Asiyah Yu Lin
- National Institutes of Health, Bethesda, MD 20892, USA
- William D. Duncan
- College of Dentistry, University of Florida, Gainesville, FL 32610, USA
- Amanda Hicks
- The Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723, USA
- William R. Hogan
- Data Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Ross Koppel
- Department of Medical Informatics, Jacobs School of Medicine, University at Buffalo, Buffalo, NY 14260, USA
- Department of Medical Informatics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Catalina Martínez-Costa
- Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, 30100 Murcia, Spain
- Øystein Nytrø
- Department of Computer Science, UiT The Arctic University of Norway, 9037 Tromsø, Norway
- Department of Computer Science, Norwegian University of Science and Technology, 7491 Trondheim, Norway
- Jihad S. Obeid
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- Alan Ruttenberg
- School of Dental Medicine, University at Buffalo, Buffalo, NY 14260, USA
- Selja Seppälä
- Department of Business Information Systems, University College Cork, T12 K8AF Cork, Ireland
- Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, NY 14260, USA
- Dagobert Soergel
- Department of Philosophy, University at Buffalo, Buffalo, NY 14260, USA
- Jie Zheng
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48104, USA
- Stefan Schulz
- Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, 8036 Graz, Austria
- Averbis GmbH, Salzstrasse 15, 79098 Freiburg im Breisgau, Germany
2
van Mens HJ, Martens SS, Paiman EH, Mertens AC, Nienhuis R, de Keizer NF, Cornet R. Diagnosis clarification by generalization to patient-friendly terms and definitions: Validation study. J Biomed Inform 2022; 129:104071. [DOI: 10.1016/j.jbi.2022.104071] [Received: 12/02/2021] [Revised: 03/12/2022] [Accepted: 04/05/2022]
3
Newman-Griffis D, Divita G, Desmet B, Zirikly A, Rosé CP, Fosler-Lussier E. Ambiguity in medical concept normalization: An analysis of types and coverage in electronic health record datasets. J Am Med Inform Assoc 2021; 28:516-532. [PMID: 33319905 DOI: 10.1093/jamia/ocaa269] [Received: 02/11/2020] [Revised: 09/13/2020] [Accepted: 11/17/2020]
Abstract
OBJECTIVES Normalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity (words or phrases that may refer to different concepts) has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambiguity in clinical text. This study characterizes the distribution and distinct types of ambiguity exhibited by benchmark clinical concept normalization datasets, in order to identify directions for advancing medical concept normalization research. MATERIALS AND METHODS We identified ambiguous strings in datasets derived from the 2 available clinical corpora for concept normalization and categorized the distinct types of ambiguity they exhibited. We then compared observed string ambiguity in the datasets with potential ambiguity in the Unified Medical Language System (UMLS) to assess how representative the available datasets are of ambiguity in clinical language. RESULTS We found that <15% of strings were ambiguous within the datasets, while over 50% were ambiguous in the UMLS, indicating only partial coverage of clinical ambiguity. The percentage of strings in common between any pair of datasets ranged from 2% to only 36%; of these shared strings, 40% were annotated with different sets of concepts, severely limiting generalization. Finally, we observed 12 distinct types of ambiguity, distributed unequally across the available datasets, reflecting diverse linguistic and medical phenomena. DISCUSSION Existing datasets are not sufficient to cover the diversity of clinical concept ambiguity, limiting both training and evaluation of normalization methods for clinical text. Additionally, the UMLS offers important semantic information for building and evaluating normalization methods.
CONCLUSIONS Our findings identify 3 opportunities for concept normalization research, including a need for ambiguity-specific clinical datasets and leveraging the rich semantics of the UMLS in new methods and evaluation measures for normalization.
Affiliation(s)
- Denis Newman-Griffis
- Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA
- Guy Divita
- Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Bart Desmet
- Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Ayah Zirikly
- Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Carolyn P Rosé
- Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Eric Fosler-Lussier
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA
4
Ibrahim M, Gauch S, Salman O, Alqahtani M. An automated method to enrich consumer health vocabularies using GloVe word embeddings and an auxiliary lexical resource. PeerJ Comput Sci 2021; 7:e668. [PMID: 34458573 PMCID: PMC8371999 DOI: 10.7717/peerj-cs.668] [Received: 05/20/2021] [Accepted: 07/19/2021]
Abstract
BACKGROUND Clear language makes communication easier between any two parties. A layperson may have difficulty communicating with a professional because they do not understand the specialized terms common to the domain. In healthcare, laypeople are rarely knowledgeable about medical terminology, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map lay medical terms to professional medical terms and vice versa. OBJECTIVE Many of these vocabularies are built manually or semi-automatically, requiring large investments of time and human effort, which slows their growth. In this paper, we present an automatic method to enrich lay vocabularies that can be applied to vocabularies in any domain. METHODS Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Representation (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies. Our approach further improves the consumer health vocabularies by incorporating synonyms and hyponyms from the WordNet ontology. The basic GloVe approach and our novel algorithms incorporating WordNet were evaluated using two lay datasets from the National Library of Medicine (NLM): the Open-Access Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary. RESULTS The results show that GloVe was able to find new lay terms with an F-score of 48.44%. Our enhanced GloVe approach outperformed basic GloVe with an average F-score of 61%, a relative improvement of 25%, and this improvement was statistically significant on both ground-truth datasets (P < 0.001). CONCLUSIONS This paper presents an automatic approach to enrich consumer health vocabularies using GloVe word embeddings and an auxiliary lexical source, WordNet.
Our approach was evaluated using healthcare text downloaded from MedHelp.org, a healthcare social media platform, against two standard lay vocabularies, OAC CHV and MedlinePlus. We used the WordNet ontology to expand the healthcare corpus by including synonyms, hyponyms, and hypernyms for each lay term occurrence in the corpus. Given a seed term selected from a concept in the ontology, we measured our algorithms' ability to automatically extract synonyms for that term that appeared in the ground-truth concept. We found that enhanced GloVe outperformed GloVe with a relative improvement of 25% in the F-score.
5
Li P, Xu L, Tang T, Wu X, Huang C. Users' Willingness to Share Health Information in a Social Question-and-Answer Community: Cross-sectional Survey in China. JMIR Med Inform 2021; 9:e26265. [PMID: 33783364 PMCID: PMC8075348 DOI: 10.2196/26265] [Received: 12/04/2020] [Revised: 02/10/2021] [Accepted: 03/07/2021]
Abstract
BACKGROUND Social question-and-answer communities play an increasingly important role in the dissemination of health information. To improve public health literacy, it is important to identify the factors that influence users' willingness to share health information. OBJECTIVE This study explored the factors influencing health information sharing by users of social question-and-answer communities, to inform the construction of a high-quality health information sharing community. METHODS A cross-sectional study was conducted through snowball sampling of 185 Zhihu users in China. Structural equation analysis was used to verify the interactions between variables in the model and the strength of their influence. Hierarchical regression was also used to test the mediating effect in the model. RESULTS Altruism (β=.264, P<.001), intrinsic reward (β=.260, P=.03), self-efficacy (β=.468, P<.001), and community influence (β=.277, P=.003) had a positive effect on users' willingness to share health information (WSHI). By contrast, extrinsic reward (β=-.351, P<.001) had a negative effect. Self-efficacy also had a mediating effect (β=.147, 29.15%, 0.147/0.505) between community influence and WSHI. CONCLUSIONS The findings suggest that users' WSHI is influenced by many factors, including altruism, self-efficacy, community influence, and intrinsic reward. Improving the social atmosphere of the platform is an effective method of encouraging users to share health information.
Affiliation(s)
- PengFei Li
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Lin Xu
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- TingTing Tang
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- The Children's Hospital of Chongqing Medical University, Chongqing, China
- Xiaoqian Wu
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Cheng Huang
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Medical Data Science Academy, Chongqing Medical University, Chongqing, China
6
Sarker A, DeRoos A, Perrone J. Mining social media for prescription medication abuse monitoring: a review and proposal for a data-centric framework. J Am Med Inform Assoc 2020; 27:315-329. [PMID: 31584645 PMCID: PMC7025330 DOI: 10.1093/jamia/ocz162] [Received: 07/09/2019] [Revised: 08/14/2019]
Abstract
Objective Prescription medication (PM) misuse and abuse is a major health problem globally, and a number of recent studies have explored social media as a resource for monitoring nonmedical PM use. Our objectives are to present a methodological review of social media–based PM abuse or misuse monitoring studies, and to propose a potentially generalizable, data-centric processing pipeline for the curation of data from this resource. Materials and Methods We identified studies involving social media, PMs, and misuse or abuse (inclusion criteria) from Medline, Embase, Scopus, Web of Science, and Google Scholar. We categorized studies based on multiple characteristics including, but not limited to, data size; social media source(s); medications studied; and primary objectives, methods, and findings. Results A total of 39 studies met our inclusion criteria, with 31 (∼79.5%) published since 2015. Twitter has been the most popular resource, with Reddit and Instagram gaining popularity recently. Early studies focused mostly on manual, qualitative analyses, with a growing trend toward the use of data-centric methods involving natural language processing and machine learning. Discussion There is a paucity of standardized, data-centric frameworks for curating social media data for task-specific analyses and near real-time surveillance of nonmedical PM use. Many existing studies do not quantify human agreement for manual annotation tasks or take into account the presence of noise in data. Conclusion The development of reproducible and standardized data-centric frameworks that build on current state-of-the-art methods in data and text mining may enable effective utilization of social media data for understanding and monitoring nonmedical PM use.
Affiliation(s)
- Abeed Sarker
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia, USA
- Annika DeRoos
- College of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jeanmarie Perrone
- Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
7
Fodeh SJ, Al-Garadi M, Elsankary O, Perrone J, Becker W, Sarker A. Utilizing a multi-class classification approach to detect therapeutic and recreational misuse of opioids on Twitter. Comput Biol Med 2020; 129:104132. [PMID: 33290931 DOI: 10.1016/j.compbiomed.2020.104132] [Received: 08/20/2020] [Revised: 11/10/2020] [Accepted: 11/16/2020]
Abstract
BACKGROUND Opioid misuse (OM) is a major health problem in the United States, and can lead to addiction and fatal overdose. We sought to employ natural language processing (NLP) and machine learning to categorize Twitter chatter based on the motive of OM. MATERIALS AND METHODS We collected data from Twitter using opioid-related keywords, and manually annotated 6988 tweets into three classes (No-OM, Pain-related-OM, and Recreational-OM), with the No-OM class representing tweets indicating no use or misuse, and the Pain-related-OM and Recreational-OM classes representing misuse for pain or for recreation/addiction, respectively. We trained and evaluated multi-class classifiers, and performed term-level k-means clustering to assess whether there were terms closely associated with the three classes. RESULTS On a held-out test set of 1677 tweets, a transformer-based classifier (XLNet) achieved the best performance, with an F1-score of 0.71 for the Pain-misuse class and 0.79 for the Recreational-misuse class. Macro- and micro-averaged F1-scores over all classes were 0.82 and 0.92, respectively. Content analysis using clustering revealed distinct clusters of terms associated with each class. DISCUSSION While some past studies have attempted to automatically detect opioid misuse, none have further characterized the motive for misuse. Our multi-class classification approach using XLNet showed promising performance, including in detecting the subtle differences between pain-related and recreation-related misuse. The distinct clustering of class-specific keywords may help conduct targeted data collection, overcoming under-representation of minority classes. CONCLUSION Machine learning can help identify pain-related and recreation-related OM content on Twitter, potentially enabling the study of the characteristics of individuals exhibiting such behavior.
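The abstract reports per-class F1 alongside macro- and micro-averaged F1 (0.82 vs 0.92). As a minimal illustrative sketch, not the authors' evaluation code, the following shows how the two averages are computed from per-class confusion counts and why the micro average can exceed the macro average when a large majority class is classified well. The helper names (`f1`, `macro_micro_f1`) and all counts are hypothetical.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(counts: Dict[str, Counts]) -> Tuple[float, float]:
    """Return (macro-F1, micro-F1) from hypothetical per-class confusion counts."""
    # Macro: compute F1 per class, then take the unweighted mean (every class counts equally).
    per_class = [f1(tp, fp, fn) for tp, fp, fn in counts.values()]
    macro = sum(per_class) / len(per_class)
    # Micro: pool TP/FP/FN across classes first, so frequent classes dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    return macro, micro


# Hypothetical counts (not from the study): one large class and two smaller ones.
example = {
    "No-OM": (900, 50, 50),
    "Pain-related-OM": (30, 20, 20),
    "Recreational-OM": (60, 20, 20),
}
macro, micro = macro_micro_f1(example)
print(f"macro-F1={macro:.3f}, micro-F1={micro:.3f}")
```

With these made-up counts, macro-F1 is about 0.766 and micro-F1 about 0.917: pooling lets the well-classified majority class dominate the micro average, which is one way a micro-averaged score can exceed the macro-averaged one.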
Affiliation(s)
- Samah Jamal Fodeh
- Department of Emergency Medicine, Yale School of Medicine, Yale University, New Haven, CT 06510, USA
- VA Connecticut Healthcare System, West Haven, CT 06516, USA
- Mohammed Al-Garadi
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322, USA
- Osama Elsankary
- Frank Netter M.D. School of Medicine, Quinnipiac University, North Haven, CT 06473, USA
- Jeanmarie Perrone
- Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- William Becker
- VA Connecticut Healthcare System, West Haven, CT 06516, USA
- Abeed Sarker
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322, USA
8
An Automatic Approach to Extending the Consumer Health Vocabulary. Journal of Data and Information Science 2020. [DOI: 10.2478/jdis-2021-0003]
Abstract
Purpose
Given the ubiquitous presence of the internet in our lives, many individuals turn to the web for medical information. A challenge here is that many laypersons (as “consumers”) do not use the professional terms found in medical nomenclature when describing their conditions and searching the internet. The Consumer Health Vocabulary (CHV) ontology, initially developed in 2007, aimed to bridge this gap, although it has seen few updates over the last decade. The purpose of this research is to implement a means of automatically creating a hierarchical consumer health vocabulary. The overall aim is to improve consumers’ ability to search for medical conditions and symptoms with an enhanced CHV, and to improve the search capabilities of our search and indexing tool HIVE (Helping Interdisciplinary Vocabulary Engineering).
Design/methodology/approach
The research design uses ontological fusion, an approach for automatically extracting the Medical Subject Headings (MeSH) ontology and integrating it into CHV, and for further converting CHV from a flat mapping to a hierarchical ontology. The additional relationships and parent terms from MeSH also allow us to uncover relationships between existing terms in the CHV ontology. The research design also included improving the search capabilities of HIVE by identifying alternate relationships and consolidating them into a single entry.
Findings
The key findings are an improved CHV with a hierarchical structure that enables consumers to search through the ontology and uncover more relationships.
Research limitations
There are some cases where the improved search results in HIVE return terms that are related but not completely synonymous. We present an example and discuss the implications of this result.
Practical implications
This research makes available an updated and richer CHV ontology using the HIVE tool. Consumers may use this tool to search consumer terminology for medical conditions and symptoms. The HIVE tool will return results about the medical term linked with the consumer term as well as the hierarchy of other medical terms connected to the term.
Originality/value
This is the first attempt in over a decade to improve and enhance the CHV ontology with current terminology, and the first research effort to convert CHV's original flat ontology structure to a hierarchical structure. This research also enhances the HIVE infrastructure and provides consumers with a simple, efficient mechanism for searching the CHV ontology and obtaining meaningful data.
9
Wu DTY, Xin C, Bindhu S, Xu C, Sachdeva J, Brown JL, Jung H. Clinician Perspectives and Design Implications in Using Patient-Generated Health Data to Improve Mental Health Practices: Mixed Methods Study. JMIR Form Res 2020; 4:e18123. [PMID: 32763884 PMCID: PMC7442947 DOI: 10.2196/18123] [Received: 02/04/2020] [Revised: 05/25/2020] [Accepted: 06/15/2020]
Abstract
Background Patient-generated health data (PGHD) have largely been collected through mobile health (mHealth) apps and wearable devices. PGHD can be especially helpful in mental health, as patients’ illness history and symptom narratives are vital to developing diagnoses and treatment plans. However, the extent to which clinicians use mental health–related PGHD is unknown. Objective A mixed methods study was conducted to understand clinicians’ perspectives on PGHD and current mental health apps. This approach uses information gathered from semistructured interviews, workflow analysis, and user-written mental health app reviews to answer the following research questions: (1) What is the current workflow of mental health practice, and how are PGHD integrated into this workflow? (2) What are clinicians’ perspectives on PGHD, and how do they choose mobile apps for their patients? (3) What are the features of current mobile apps in terms of interpreting and sharing PGHD? Methods The study consists of semistructured interviews with 12 psychiatrists and clinical psychologists from a large academic hospital. These interviews were thematically and qualitatively analyzed for common themes and workflow elements. User-posted reviews of 56 sleep and mood tracking apps were analyzed to understand app features in comparison with the information gathered from interviews. Results The results showed that PGHD have been part of the workflow, but their integration and use are not optimized. Mental health clinicians supported the use of PGHD but had concerns regarding data reliability and accuracy. They also identified challenges in selecting suitable apps for their patients. The app review revealed that mHealth apps had limited features to support personalization and collaborative care as well as data interpretation and sharing.
Conclusions This study investigated clinicians’ perspectives on PGHD use and explored existing app features using app review data in the mental health setting. Three design guidelines were generated: (1) improve data interpretation and sharing mechanisms, (2) consider clinical workflow and electronic health record integration, and (3) support personalized and collaborative care. More research is needed to demonstrate best practices for PGHD use and to evaluate their effectiveness in improving patient outcomes.
Affiliation(s)
- Danny T Y Wu
- Department of Biomedical Informatics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States
- Chen Xin
- Department of Biomedical Informatics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States
- School of Design, College of Design, Architecture, Art, and Planning, University of Cincinnati, Cincinnati, OH, United States
- Shwetha Bindhu
- Department of Biomedical Informatics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States
- Medical Sciences Baccalaureate Program, College of Medicine, University of Cincinnati, Cincinnati, OH, United States
- Catherine Xu
- Department of Biomedical Informatics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States
- Medical Sciences Baccalaureate Program, College of Medicine, University of Cincinnati, Cincinnati, OH, United States
- Jyoti Sachdeva
- Department of Psychiatry and Behavioral Neuroscience, College of Medicine, University of Cincinnati, Cincinnati, OH, United States
- Jennifer L Brown
- Department of Psychiatry and Behavioral Neuroscience, College of Medicine, University of Cincinnati, Cincinnati, OH, United States
- Heekyoung Jung
- School of Design, College of Design, Architecture, Art, and Planning, University of Cincinnati, Cincinnati, OH, United States
10
Yu B, He Z, Xing A, Lustria MLA. An Informatics Framework to Assess Consumer Health Language Complexity Differences: Proof-of-Concept Study. J Med Internet Res 2020; 22:e16795. [PMID: 32436849 PMCID: PMC7273233 DOI: 10.2196/16795] [Received: 10/31/2019] [Revised: 01/21/2020] [Accepted: 02/21/2020]
Abstract
Background The language gap between health consumers and health professionals has long been recognized as the main hindrance to effective health information comprehension. Although providing health information access in consumer health language (CHL) is widely accepted as the solution to the problem, health consumers have been found to have varying health language preferences and proficiencies. To simplify health documents for heterogeneous consumer groups, it is important to quantify how CHL complexity differs among consumer groups. Objective This study aimed to propose an informatics framework (consumer health language complexity [CHELC]) to assess the complexity differences of CHL using syntax-level, text-level, term-level, and semantic-level complexity metrics. Specifically, we identified 8 language complexity metrics validated in previous literature and combined them into a 4-faceted framework. Through a rank-based algorithm, we developed unifying scores (CHELC scores [CHELCS]) to quantify syntax-level, text-level, term-level, semantic-level, and overall CHL complexity. We applied CHELCS to compare the posts of each individual on online health forums designed for (1) the general public, (2) deaf and hearing-impaired people, and (3) people with autism spectrum disorder (ASD). Methods We examined posts with more than 4 sentences of each user from 3 health forums to understand CHL complexity differences among these groups: 12,560 posts from 3756 users in Yahoo! Answers, 25,545 posts from 1623 users in AllDeaf, and 26,484 posts from 2751 users in Wrong Planet. We calculated CHELCS for each user and compared the scores of the 3 user groups (ie, deaf and hearing-impaired people, people with ASD, and the public) through 2-sample Kolmogorov-Smirnov tests and analysis of covariance tests.
Results The results suggest that users in the public forum used more complex CHL, particularly more diverse semantics and more complex health terms compared with users in the ASD and deaf and hearing-impaired user forums. However, between the latter 2 groups, people with ASD used more complex words, and deaf and hearing-impaired users used more complex syntax. Conclusions Our results show that the users in 3 online forums had significantly different CHL complexities in different facets. The proposed framework and detailed measurements help to quantify these CHL complexity differences comprehensively. The results emphasize the importance of tailoring health-related content for different consumer groups with varying CHL complexities.
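The group comparison above relies on 2-sample Kolmogorov-Smirnov tests. A minimal sketch of the test statistic alone (no p-value) is shown below, assuming two lists of per-user scores; the function name `ks_two_sample` and the sample values are hypothetical and are not the study's data or code. In practice, `scipy.stats.ks_2samp` computes the same statistic together with a p-value.

```python
from bisect import bisect_right
from typing import Sequence


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions that only change at
    # observed values, so evaluating at every pooled observation finds the supremum.
    for x in xs + ys:
        fa = bisect_right(xs, x) / len(xs)  # fraction of sample a that is <= x
        fb = bisect_right(ys, x) / len(ys)  # fraction of sample b that is <= x
        d = max(d, abs(fa - fb))
    return d


# Hypothetical per-user complexity scores for two forums (illustrative only).
forum_a = [0.42, 0.55, 0.61, 0.70, 0.74]
forum_b = [0.30, 0.35, 0.48, 0.52, 0.66]
print(ks_two_sample(forum_a, forum_b))
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap); because it compares whole distributions rather than means, it can detect differences in spread or shape between user groups.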
Affiliation(s)
- Biyang Yu
- Florida State University, School of Information, Tallahassee, FL, United States
- Zhe He
- Florida State University, School of Information, Tallahassee, FL, United States
- Aiwen Xing
- Florida State University, Department of Statistics, Tallahassee, FL, United States
- Mia Liza A Lustria
- Florida State University, School of Information, Tallahassee, FL, United States
11
Khaleghi T, Murat A, Arslanturk S, Davies E. Automated Surgical Term Clustering: A Text Mining Approach for Unstructured Textual Surgery Descriptions. IEEE J Biomed Health Inform 2019; 24:2107-2118. [PMID: 31796420 DOI: 10.1109/jbhi.2019.2956973]
Abstract
High health care costs and the persistent need for quality improvement in care delivery are increasingly motivating novel predictive studies in health care informatics. Surgical services affect both operating theatre costs and revenues and play a critical role in care quality. The efficiency of such units relies heavily on effective operational planning and inventory management. A key ingredient of such planning activities is the structured and unstructured data available prior to the surgery day from electronic health records and other information systems. Because structured data alone are not sufficient, unstructured data, such as the text of procedure descriptions and notes, provide additional information. To effectively utilize textual information through text mining, textual features should be easily identifiable, i.e., free of typographical errors and ad hoc abbreviations. While numerous spelling correction and abbreviation identification tools exist, they are not suitable for surgical text, as they require a dictionary and cannot accommodate ad hoc words such as abbreviations. This study proposes a novel preprocessing framework for surgical text data that detects misspellings and abbreviations prior to the application of any text mining and predictive modeling. The proposed approach helps extract the most salient text features from the unstructured principal procedure and additional notes by effectively reducing the dimension of the raw feature set. The transformed (text) feature set thus improves subsequent prediction tasks in surgery units. We test and validate the proposed approach using datasets from multiple hospitals' surgical departments and benchmark feature sets.
12
Zhang Z, Lu Y, Kou Y, Wu DTY, Huh-Yoo J, He Z. Understanding Patient Information Needs About Their Clinical Laboratory Results: A Study of Social Q&A Site. Stud Health Technol Inform 2019; 264:1403-1407. [PMID: 31438157 DOI: 10.3233/shti190458] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022]
Abstract
Clinical data, such as laboratory test results, are increasingly being made available to patients through patient portals. However, patients often have difficulty understanding and acting upon the clinical data presented in portals. As a result, many turn to online resources to fill their knowledge gaps and obtain actionable advice. In this work, we present a content analysis of the questions posted on a major social Q&A site to characterize lay people's general information needs concerning laboratory test results and to inform the design of patient portals that support patients' understanding of clinical data. We identified 15 information needs related to laboratory test results and clustered them under four themes: understanding the results of lab tests, interpreting the doctor's diagnosis, learning about lab tests, and consulting about the next steps. We draw on our findings to discuss design opportunities for supporting the understanding of laboratory results.
Affiliation(s)
- Zhan Zhang
- Department of Information Technology, Pace University, New York, NY, USA
- Yu Lu
- Department of Information Technology, Pace University, New York, NY, USA
- Yubo Kou
- School of Information, Florida State University, Tallahassee, Florida, USA
- Danny T Y Wu
- Department of Biomedical Informatics, University of Cincinnati, Cincinnati, OH, USA
- Jina Huh-Yoo
- Department of Media and Information, Michigan State University, East Lansing, MI, USA
- Zhe He
- School of Information, Florida State University, Tallahassee, Florida, USA
13
Rizvi RF, Wang Y, Nguyen T, Vasilakes J, Bian J, He Z, Zhang R. Analyzing Social Media Data to Understand Consumer Information Needs on Dietary Supplements. Stud Health Technol Inform 2019; 264:323-327. [PMID: 31437938 PMCID: PMC6792048 DOI: 10.3233/shti190236] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/18/2023]
Abstract
Despite the high consumption of dietary supplements (DS), few reliable, relevant, and comprehensive online resources exist to satisfy information seekers. This study aims to understand consumer information needs on DS using topic modeling and to evaluate the accuracy of identifying topics from social media. We retrieved 16,095 unique questions posted on Yahoo! Answers relating to 438 unique DS ingredients in the "Alternative medicine" subsection of the "Health" section. We applied an unsupervised topic modeling method, Correlation Explanation (CorEx), to uncover the topics in which consumers are most interested. We manually reviewed the keywords of all 200 topics generated by CorEx and assigned them to 38 health-related categories, corresponding to 12 higher-level groups. We found high accuracy (90-100%) in identifying questions that correctly align with the selected topics. The results could guide the development of a more comprehensive and structured DS resource based on consumers' information needs.
Affiliation(s)
- Rubina F. Rizvi
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, USA
- Yefeng Wang
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Thao Nguyen
- Data Science, University of Minnesota, Minneapolis, MN, USA
- Jake Vasilakes
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, USA
- Jiang Bian
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Zhe He
- School of Information, Florida State University, Tallahassee, FL, USA
- Rui Zhang
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, USA
14
Gu G, Zhang X, Zhu X, Jian Z, Chen K, Wen D, Gao L, Zhang S, Wang F, Ma H, Lei J. Development of a Consumer Health Vocabulary by Mining Health Forum Texts Based on Word Embedding: Semiautomatic Approach. JMIR Med Inform 2019; 7:e12704. [PMID: 31124461 PMCID: PMC6552449 DOI: 10.2196/12704] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/06/2018] [Revised: 03/19/2019] [Accepted: 04/05/2019] [Indexed: 12/31/2022]
Abstract
Background The vocabulary gap between consumers and professionals in the medical domain hinders information seeking and communication. Consumer health vocabularies have been developed to aid such informatics applications. This purpose is best served if the vocabulary evolves with consumers' language. Objective Our objective is to develop a method for identifying and adding new terms to consumer health vocabularies so that they can keep up with constantly evolving medical knowledge and language use. Methods In this paper, we propose a consumer health term–finding framework based on a distributed word vector space model. We first learned word vectors from a large-scale text corpus and then adopted a supervised method with existing consumer health vocabularies for learning vector representations of words, which provides additional supervised fine tuning after unsupervised word embedding learning. With a fine-tuned word vector space, we identified pairs of professional terms and their consumer variants by their semantic distance in the vector space. A subsequent manual review of the extracted and labeled pairs of entities was conducted to validate the results generated by the proposed approach. The results were evaluated using mean reciprocal rank (MRR). Results Manual evaluation showed that it is feasible to identify alternative medical concepts by using professional or consumer concepts as queries in the word vector space without fine tuning, but the results are more promising in the final fine-tuned word vector space. The MRR values indicated that, on average, a professional or consumer concept is about the 14th closest term to its counterpart in the word vector space without fine tuning and about the 8th closest in the final fine-tuned word vector space. Furthermore, the results demonstrate that our method can collect abbreviations and common typos frequently used by consumers.
Conclusions By integrating a large amount of text information and existing consumer health vocabularies, our method outperformed several baseline ranking methods and is effective for generating a list of candidate terms for human review during consumer health vocabulary development.
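Because the results above are reported as mean reciprocal rank (MRR), a minimal Python sketch of the metric may help: each query contributes the reciprocal of the 1-based rank at which its counterpart appears among the query's nearest neighbours, and these reciprocals are averaged. The function name and the ranks are hypothetical illustrations, not the authors' code or data. Since 1/MRR equals the harmonic mean of the ranks, an MRR can be read as a "typical" rank position, which is how the abstract's "about 14th closest" statements should be understood.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries; ranks are 1-based (1 = nearest)."""
    if not ranks:
        raise ValueError("need at least one rank")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks must be 1-based positive integers")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of each term's counterpart in the vector space.
ranks = [1, 4, 10, 20]
mrr = mean_reciprocal_rank(ranks)
# 1 / MRR is the harmonic mean of the ranks, i.e. a typical rank position.
print(f"MRR = {mrr:.3f}, 1/MRR = {1 / mrr:.1f}")
```

A single very good hit (rank 1) dominates the average, which is why MRR rewards methods that place the correct counterpart at the top of the list.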
Affiliation(s)
- Gen Gu
- Synyi Research, Shanghai, China
- Xingting Zhang
- Center for Medical Informatics, Peking University, Beijing, China
- Zhe Jian
- Harbin Medical University, Harbin, China
- Dong Wen
- Center for Medical Informatics, Peking University, Beijing, China
- Li Gao
- School of Stomatology, Peking University, Beijing, China
- Shaodian Zhang
- Synyi Research, Shanghai, China
- APEX Data & Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China
- Fei Wang
- Synyi Research, Shanghai, China
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, United States
- Jianbo Lei
- Center for Medical Informatics, Peking University, Beijing, China
- School of Medical Informatics and Engineering, Southwest Medical University, Luzhou city, Sichuan Province, China
15
Denecke K, Gabarron E, Grainger R, Konstantinidis ST, Lau A, Rivera-Romero O, Miron-Shatz T, Merolli M. Artificial Intelligence for Participatory Health: Applications, Impact, and Future Implications. Yearb Med Inform 2019; 28:165-173. [PMID: 31022749 PMCID: PMC6697496 DOI: 10.1055/s-0039-1677902] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/14/2022]
Abstract
Objective: Artificial intelligence (AI) provides people and professionals working in the field of participatory health informatics an opportunity to derive robust insights from a variety of online sources. The objective of this paper is to identify the current state of the art and the application areas of AI in the context of participatory health.
Methods: A search was conducted across seven databases (PubMed, Embase, CINAHL, PsycINFO, ACM Digital Library, IEEE Xplore, and Scopus), covering articles published since 2013. Additionally, clinical trials involving AI in participatory health contexts registered at clinicaltrials.gov were collected and analyzed.
Results: Twenty-two articles and 12 trials were selected for review. The most common application of AI in participatory health was the secondary analysis of social media data: self-reported data including patient experiences with healthcare facilities, reports of adverse drug reactions, safety and efficacy concerns about over-the-counter medications, and other perspectives on medications. Other application areas included determining which online forum threads required moderator assistance, identifying users who were likely to drop out from a forum, extracting terms used in an online forum to learn its vocabulary, highlighting contextual information that is missing from online questions and answers, and paraphrasing technical medical terms for consumers.
Conclusions: While AI for supporting participatory health is still in its infancy, there are a number of important research priorities that should be considered for the advancement of the field. Further research evaluating the impact of AI in participatory health informatics on the psychosocial wellbeing of individuals would help facilitate the wider acceptance of AI into the healthcare ecosystem.
Affiliation(s)
- Elia Gabarron
- Norwegian Centre for E-health Research, University Hospital of North Norway, Norway
- Annie Lau
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Australia
- Talya Miron-Shatz
- Ono Academic College, Israel, and Winton Centre for Risk and Evidence Communication, Cambridge University, England
- Mark Merolli
- Swinburne University of Technology, and University of Melbourne, Australia
16
He Z, Keloth VK, Chen Y, Geller J. Extended Analysis of Topological-Pattern-Based Ontology Enrichment. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2019; 2018:1641-1648. [PMID: 30854243 DOI: 10.1109/bibm.2018.8621564] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
Maintenance of biomedical ontologies is difficult. We have previously developed a topological-pattern-based method to deal with the problem of identifying concepts in a reference ontology that could be of interest for insertion into a target ontology. Assuming that both ontologies are parts of the Unified Medical Language System (UMLS), the method suggests approximate locations where the target ontology could be extended with new concepts from the reference ontology. However, the final decision about each concept has to be made by a human expert. In this paper, we describe the universe of cross-ontology topological patterns in quantitative terms. We then present a theoretical analysis of the number of potential placements of reference concepts in a path in a target ontology, allowing for new cross-ontology synonyms. This provides a rough estimate of what expert resources need to be allocated for the task. One insight in previous work on this topic was the large percentage of cases where importing concepts was impossible, due to a configuration called "alternative classification." In this paper, we confirm this observation. Our target ontology is the National Cancer Institute thesaurus (NCIt). However, the methods can be applied to other pairs of ontologies with hierarchical relationships from the UMLS.
Affiliation(s)
- Zhe He
- School of Information, Florida State University, Tallahassee, Florida, USA
- Yan Chen
- Department of Computer Information Systems, BMCC, CUNY, New York, NY, USA
- James Geller
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
17
Chen Z, He Z, Liu X, Bian J. Evaluating semantic relations in neural word embeddings with biomedical and general domain knowledge bases. BMC Med Inform Decis Mak 2018; 18:65. [PMID: 30066651 PMCID: PMC6069806 DOI: 10.1186/s12911-018-0630-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/10/2022]
Abstract
BACKGROUND In the past few years, neural word embeddings have been widely used in text mining. However, the vector representations of word embeddings mostly act as a black box in downstream applications using them, thereby limiting their interpretability. Even though word embeddings are able to capture semantic regularities in free text documents, it is not clear how different kinds of semantic relations are represented by word embeddings and how semantically-related terms can be retrieved from word embeddings. METHODS To improve the transparency of word embeddings and the interpretability of the applications using them, in this study, we propose a novel approach for evaluating the semantic relations in word embeddings using external knowledge bases: Wikipedia, WordNet and Unified Medical Language System (UMLS). We trained multiple word embeddings using health-related articles in Wikipedia and then evaluated their performance in the analogy and semantic relation term retrieval tasks. We also assessed if the evaluation results depend on the domain of the textual corpora by comparing the embeddings of health-related Wikipedia articles with those of general Wikipedia articles. RESULTS Regarding the retrieval of semantic relations, we were able to retrieve diverse semantic relations in the nearest neighbors of a given word. Meanwhile, the two popular word embedding approaches, Word2vec and GloVe, obtained comparable results on both the analogy retrieval task and the semantic relation retrieval task, while dependency-based word embeddings had much worse performance in both tasks. We also found that the word embeddings trained with health-related Wikipedia articles obtained better performance in the health-related relation retrieval tasks than those trained with general Wikipedia articles. CONCLUSION It is evident from this study that word embeddings can group terms with diverse semantic relations together. 
The domain of the training corpus does have an impact on the semantic relations represented by word embeddings. We thus recommend using domain-specific corpora to train word embeddings for domain-specific text mining tasks.
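Retrieving semantically related terms from "the nearest neighbors of a given word" amounts to ranking vocabulary words by cosine similarity to the query word's vector. The Python sketch below assumes a plain dict of hand-made 3-dimensional vectors; the words, values, and function names are hypothetical, whereas the study itself used Word2vec, GloVe, and dependency-based embeddings trained on Wikipedia articles.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(query: str, embeddings: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k words most similar to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [(w, cosine(q, v)) for w, v in embeddings.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors (hypothetical; not trained on any corpus).
embeddings = {
    "hypertension": [0.9, 0.1, 0.0],
    "high_blood_pressure": [0.85, 0.15, 0.05],
    "stroke": [0.6, 0.5, 0.1],
    "aspirin": [0.1, 0.2, 0.9],
}
print(nearest_neighbours("hypertension", embeddings, k=2))
```

In this toy space the consumer synonym ranks first and a related condition second; with real embeddings the neighbour list typically mixes several relation types (synonyms, related disorders, treatments), which is the diversity the study set out to evaluate against external knowledge bases.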
Affiliation(s)
- Zhiwei Chen
- Department of Computer Science, Florida State University, Tallahassee, FL, USA
- Zhe He
- School of Information, Florida State University, 142 Collegiate Loop, Tallahassee, FL, 32306 USA
- Xiuwen Liu
- Department of Computer Science, Florida State University, Tallahassee, FL, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
18
Abstract
OBJECTIVE To describe the increasing professional use of social media within oncology health care practice. DATA SOURCES Peer-reviewed and lay publications. CONCLUSION Social media has changed the communication landscape over the last 15 years. Now an integral part of worldwide culture, social media can be used by oncology health care professionals to listen, learn, engage, and co-create to advance cancer care. IMPLICATIONS FOR NURSING PRACTICE Nurses must be aware of the professional uses of social media, how to use the media, and where to find evidence supporting health care social media efforts within cancer care.
19
Gu H, He Z, Wei D, Elhanan G, Chen Y. Validating UMLS Semantic Type Assignments Using SNOMED CT Semantic Tags. Methods Inf Med 2018; 57:43-53. [PMID: 29621830 DOI: 10.3414/me17-01-0120] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
BACKGROUND The UMLS assigns semantic types to all its integrated concepts. The semantic types are widely used in various natural language processing tasks in the biomedical domain, such as named entity recognition, semantic disambiguation, and semantic annotation. Due to the size of the UMLS, erroneous semantic type assignments are hard to detect. It is imperative to devise automated techniques to identify errors and inconsistencies in semantic type assignments. OBJECTIVES To design a methodology that performs programmatic checks to detect semantic type assignment errors for UMLS concepts with one or more SNOMED CT terms, and to evaluate concepts in a selected set of SNOMED CT hierarchies to verify our hypothesis that UMLS semantic type assignment errors may exist in concepts residing in semantically inconsistent groups. METHODS Our methodology is a four-stage process: 1) partitioning concepts in a SNOMED CT hierarchy into semantically uniform groups based on their assigned semantic tags; 2) partitioning concepts in each group from 1) into disjoint sub-groups based on their semantic type assignments; 3) mapping all SNOMED CT semantic tags into one or more semantic types in the UMLS; 4) identifying semantically inconsistent groups that have inconsistent assignments between semantic tags and semantic types according to the mapping from 3), and providing concepts in such groups to domain experts for review. RESULTS We applied our method to the UMLS 2013AA release. Concepts of the semantically inconsistent groups in the PHYSICAL FORCE and RECORD ARTIFACT hierarchies have error rates of 33% and 62.5%, respectively, far higher than the error rates of 0.6% and 1% in the semantically consistent groups of the two hierarchies. CONCLUSION Concepts in semantically inconsistent groups are more likely to contain semantic type assignment errors. Our methodology can make auditing more efficient by focusing auditing resources on concepts of semantically inconsistent groups.
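The four-stage auditing process described in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stage-3 mapping is supplied by hand, the concept identifiers are placeholders rather than real UMLS CUIs, and a sub-group is treated as inconsistent when any of its semantic types lies outside the types mapped from its semantic tag.

```python
from collections import defaultdict


def inconsistent_groups(concepts, tag_to_types):
    """Flag (semantic tag, semantic type set) sub-groups that disagree with a mapping.

    concepts: iterable of (concept_id, semantic_tag, set_of_semantic_types).
    tag_to_types: dict from semantic tag to the semantic types it is assumed
    to correspond to (stage 3; hand-written here, hypothetical).
    """
    # Stages 1-2: group concepts by semantic tag, then split each tag group
    # into disjoint sub-groups by the exact set of assigned semantic types.
    groups = defaultdict(list)
    for concept_id, tag, types in concepts:
        groups[(tag, frozenset(types))].append(concept_id)

    # Stage 4: flag a sub-group if any of its semantic types falls outside
    # the types mapped from its tag (assumed consistency criterion).
    flagged = {}
    for (tag, types), ids in groups.items():
        allowed = tag_to_types.get(tag, set())
        if not types <= allowed:
            flagged[(tag, types)] = sorted(ids)
    return flagged


# Hypothetical mapping and concepts (placeholder identifiers, not real CUIs).
tag_to_types = {
    "physical force": {"Natural Phenomenon or Process"},
    "record artifact": {"Intellectual Product"},
}
concepts = [
    ("c1", "physical force", {"Natural Phenomenon or Process"}),
    ("c2", "physical force", {"Natural Phenomenon or Process"}),
    ("c3", "physical force", {"Manufactured Object"}),
    ("c4", "record artifact", {"Intellectual Product"}),
]

# Concepts in flagged sub-groups would be handed to domain experts for review.
for (tag, types), ids in inconsistent_groups(concepts, tag_to_types).items():
    print(tag, sorted(types), ids)
```

Only the flagged sub-groups need expert review, which mirrors the abstract's point that limiting auditing to semantically inconsistent groups concentrates effort where error rates are highest.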