1
|
Tao J, Zhou L, Hickey K. Making sense of the black‐boxes: Toward interpretable text classification using deep learning models. J Assoc Inf Sci Technol 2022. [DOI: 10.1002/asi.24642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Affiliation(s)
- Jie Tao
  - Department of Analytics, Dolan School of Business, Fairfield University, Fairfield, Connecticut, USA
- Lina Zhou
  - Department of Business Information Systems and Operations Management, Belk College of Business, The University of North Carolina at Charlotte, Charlotte, North Carolina, USA
- Kevin Hickey
  - Data Science Department, Worcester Polytechnic Institute, Worcester, Massachusetts, USA
|
2
|
Perry A, Lamont-Mills A, du Preez J, du Plessis C. "I Want to Be Stepping in More" - Professional Online Forum Moderators' Experiences of Supporting Individuals in a Suicide Crisis. Front Psychiatry 2022; 13:863509. [PMID: 35774095 PMCID: PMC9238438 DOI: 10.3389/fpsyt.2022.863509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Accepted: 05/20/2022] [Indexed: 11/16/2022]
Abstract
INTRODUCTION Individuals experiencing suicidal crises increasingly turn to online mental health forums for support. Support can come from peers but also from online moderators, many of whom are trained health professionals. Much is known about users' forum experiences; however, the experiences of professional moderators who work to keep users safe have been overlooked. The beneficial nature of online forums cannot be fully realized until there is a clearer understanding of both parties' participation. This study explored the experiences of professional online forum moderators engaged in suicide prevention. MATERIALS AND METHODS A purposive sample of professionally qualified moderators was recruited from three online mental health organizations. In-depth semi-structured, video-recorded interviews were conducted with 15 moderators (3 male, 12 female) to explore their experiences and perceptions of working in online suicide prevention spaces. Data were analyzed using inductive thematic analysis. RESULTS Five themes were identified related to the experiences and challenges for moderators. These were the sense of the unknown, the scope of the role, limitations of the written word, volume of tasks, and balancing individual vs. community needs. DISCUSSION Findings indicate that the professionally qualified moderator role is complex and multifaceted, with organizations failing to recognize these aspects. Organizations restrict moderators from using their full therapeutic skill set, limiting them to only identifying and re-directing at-risk users to crisis services. The benefits of moderated online forums could be enhanced by allowing moderators to use more of their skills. To facilitate this, in-situ research is needed that examines how moderators use their skills to identify at-risk users.
Affiliation(s)
- Amanda Perry
  - School of Psychology and Wellbeing, University of Southern Queensland, Toowoomba, QLD, Australia; Laidlaw College, School of Social Practice, Auckland, New Zealand
- Andrea Lamont-Mills
  - School of Psychology and Wellbeing, University of Southern Queensland, Ipswich, QLD, Australia; Centre for Health, Institute of Resilient Regions, University of Southern Queensland, Springfield, QLD, Australia
- Jan du Preez
  - School of Psychology and Wellbeing, University of Southern Queensland, Toowoomba, QLD, Australia
- Carol du Plessis
  - School of Psychology and Wellbeing, University of Southern Queensland, Ipswich, QLD, Australia
|
3
|
Ryu H, Pratt W. Microaggression clues from social media: revealing and counteracting the suppression of women's health care. J Am Med Inform Assoc 2021; 29:257-270. [PMID: 34741511 DOI: 10.1093/jamia/ocab208] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/07/2021] [Revised: 09/03/2021] [Accepted: 09/16/2021] [Indexed: 11/12/2022]
Abstract
OBJECTIVE The purpose of this study was to demonstrate how analyzing social media posts can uncover microaggressions and generate new cultural insights. We explore why Korean women hesitate to seek recommended gynecological care and how microaggressions visible in social media reveal insights for counteracting such harmful messaging. MATERIALS AND METHODS We scraped the posts and responses on social media related to unmarried women's uncomfortableness or unpleasantness in receiving gynecological care. We conducted content analyses of the posts and responses with the microaggression framework to identify both the types of microaggressions occurring within and outside the clinic as well as the responsible perpetrators. With an open-coding and subsequent deductive coding approach, we further investigated the socio-cultural context for receiving gynecological care as an unmarried woman in South Korea. RESULTS Our analysis uncovered that mothers, male partners, and superficially supportive social media responders contribute to pre- and post-visit microaggressions toward unmarried women seeking gynecological care whereas healthcare providers contribute to only mid-visit microaggressions. We also exposed how social media was not only revealing but also reinforcing the suppression of women's health care. DISCUSSION Mid-visit microaggressions are currently addressed by cultural competence education, but pre- and post-visit microaggressions are overlooked. We uncover the gaps in current practices of informatics and public health methods and suggest ways to counteract online and offline microaggressions. CONCLUSIONS Social media provides valuable information about the cultural context of health care and should be used as a source of insights for targeted interventions to improve health care, in this case for unmarried Korean women.
Affiliation(s)
- Hyeyoung Ryu
  - Information School, University of Washington, Seattle, Washington, USA
- Wanda Pratt
  - Information School, University of Washington, Seattle, Washington, USA
|
4
|
Li X, Yuan W, Peng D, Mei Q, Wang Y. When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification. BMC Med Inform Decis Mak 2021; 21:377. [PMID: 35382811 PMCID: PMC8981604 DOI: 10.1186/s12911-022-01829-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Accepted: 03/22/2022] [Indexed: 11/12/2022]
Abstract
Background Natural language processing (NLP) tasks in the health domain often deal with limited amount of labeled data due to high annotation costs and naturally rare observations. To compensate for the lack of training data, health NLP researchers often have to leverage knowledge and resources external to a task at hand. Recently, pretrained large-scale language models such as the Bidirectional Encoder Representations from Transformers (BERT) have been proven to be a powerful way of learning rich linguistic knowledge from massive unlabeled text and transferring that knowledge to downstream tasks. However, previous downstream tasks often used training data at such a large scale that is unlikely to obtain in the health domain. In this work, we aim to study whether BERT can still benefit downstream tasks when training data are relatively small in the context of health NLP. Method We conducted a learning curve analysis to study the behavior of BERT and baseline models as training data size increases. We observed the classification performance of these models on two disease diagnosis data sets, where some diseases are naturally rare and have very limited observations (fewer than 2 out of 10,000). The baselines included commonly used text classification models such as sparse and dense bag-of-words models, long short-term memory networks, and their variants that leveraged external knowledge. To obtain learning curves, we incremented the amount of training examples per disease from small to large, and measured the classification performance in macro-averaged F1 score. Results On the task of classifying all diseases, the learning curves of BERT were consistently above all baselines, significantly outperforming them across the spectrum of training data sizes. But under extreme situations where only one or two training documents per disease were available, BERT was outperformed by linear classifiers with carefully engineered bag-of-words features. Conclusion As long as the amount of training documents is not extremely few, fine-tuning a pretrained BERT model is a highly effective approach to health NLP tasks like disease classification. However, in extreme cases where each class has only one or two training documents and no more will be available, simple linear models using bag-of-words features shall be considered.
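Note on the metric: the macro-averaged F1 score named in the abstract above gives every class the same weight, which matters when some diseases are rare. The following is a minimal sketch of that computation, not the authors' code; the function name `macro_f1` and the toy labels are assumptions made for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: one common class, one rare class that is always missed.
y_true = ["flu", "flu", "flu", "rare"]
y_pred = ["flu", "flu", "flu", "flu"]
print(macro_f1(y_true, y_pred))
```

In this toy case plain accuracy is 0.75, while the macro average is pulled down by the missed rare class, which illustrates why a macro-averaged score is a natural choice when rare classes are of interest.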
|
5
|
Gulati S. Decoding the global trend of “vaccine tourism” through public sentiments and emotions: does it get a nod on Twitter? GLOBAL KNOWLEDGE, MEMORY AND COMMUNICATION 2021. [DOI: 10.1108/gkmc-06-2021-0106] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022]
Abstract
Purpose
This paper aims to fill the major research gap prevalent in the tourism literature on the new form of tourism branching out from COVID-19. While there are newspaper reports mentioning the government’s reaction to vaccine tourism, there is no study or report that tries to understand what the global masses feel about it; thus, a preliminary investigation of the social sentiment and emotion accruing around vaccine tourism on Twitter is carried out.
Design/methodology/approach
This exploratory study serves as a preliminary investigation of the social sentiment and emotion accruing around vaccine tourism on Twitter and tries to categorise them into the eight basic emotions of Plutchik's (1994) “wheel of emotions”: joy, disgust, fear, anger, anticipation, sadness, trust and surprise. The results are presented using data visualisation techniques. The study makes use of the R programming language and the extensive packages available through RStudio.
Findings
A total of 12,258 emotions were captured. Vaccine tourism received the most positive sentiment (28.14%), almost double the negative sentiment (14.05%). The most prevalent emotion is “trust” (12.74%), followed by “fear” (8.97%); the least prevalent is “surprise” (4.32%). Most tweets had positive polarity (55.52%), again surpassing negative polarity (33.7%), and neutral polarity was the least common (10.67%).
Research limitations/implications
It can be said that people bear positive emotions regarding vaccine tourism, such as “trust” and “joy”, which also corresponds to a positive sentiment score when testing polarity. However, concerns about the high prices of the packages, people's fear of stepping out, and uncertainty over whether the right precautionary measures are being taken still put vaccine tourism under doubt, with a fourth of the population having negative and neutral sentiments each. This is reflected in “fear” being the second highest emotion among users. There are mixed emotions about vaccine tourism, but positive emotions dominate the results.
Practical implications
The study examines the global reaction on social media to the vaccine tourism trend in order to give marketers food for thought. It can be said that Asians can be the target group.
Originality/value
To the best of the authors’ knowledge, there is no study that addresses the new trend of “Vaccine Tourism” or attempts to understand the emotions and sentiments of people globally.
|
6
|
He L, Yin T, Hu Z, Chen Y, Hanauer DA, Zheng K. Developing a standardized protocol for computational sentiment analysis research using health-related social media data. J Am Med Inform Assoc 2021; 28:1125-1134. [PMID: 33355353 DOI: 10.1093/jamia/ocaa298] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/01/2020] [Accepted: 12/04/2020] [Indexed: 12/18/2022]
Abstract
OBJECTIVE Sentiment analysis is a popular tool for analyzing health-related social media content. However, existing studies exhibit numerous methodological issues and inconsistencies with respect to research design and results reporting, which could lead to biased data, imprecise or incorrect conclusions, or incomparable results across studies. This article reports a systematic analysis of the literature with respect to such issues. The objective was to develop a standardized protocol for improving the research validity and comparability of results in future relevant studies. MATERIALS AND METHODS We developed the Protocol of Analysis of senTiment in Health (PATH) based on a systematic review that analyzed common research design choices and how such choices were made, or reported, among eligible studies published 2010-2019. RESULTS Of 409 articles screened, 89 met the inclusion criteria. A total of 16 distinctive research design choices were identified, 9 of which have significant methodological or reporting inconsistencies among the articles reviewed, ranging from how relevance of study data was determined to how the sentiment analysis tool selected was validated. Based on this result, we developed the PATH protocol that encompasses all these distinctive design choices and highlights the ones for which careful consideration and detailed reporting are particularly warranted. CONCLUSIONS A substantial degree of methodological and reporting inconsistencies exist in the extant literature that applied sentiment analysis to analyzing health-related social media data. The PATH protocol developed through this research may contribute to mitigating such issues in future relevant studies.
Affiliation(s)
- Lu He
  - Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, USA
- Tingjue Yin
  - Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, USA
- Zhaoxian Hu
  - Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, USA
- Yunan Chen
  - Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, USA
- David A Hanauer
  - Department of Learning Health Sciences, School of Medicine, University of Michigan, Ann Arbor, Michigan, USA; Department of Pediatrics, School of Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Kai Zheng
  - Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, USA; Department of Emergency Medicine, School of Medicine, University of California, Irvine, Irvine, California, USA
|
7
|
Yang Z, Xu W, Chen R. A deep learning-based multi-turn conversation modeling for diagnostic Q&A document recommendation. Inf Process Manag 2021. [DOI: 10.1016/j.ipm.2020.102485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
8
|
Lee YJ, Kamen C, Margolies L, Boehmer U. Online health community experiences of sexual minority women with cancer. J Am Med Inform Assoc 2021; 26:759-766. [PMID: 31361002 DOI: 10.1093/jamia/ocz103] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/08/2019] [Revised: 05/07/2019] [Accepted: 05/28/2019] [Indexed: 11/12/2022]
Abstract
OBJECTIVE The study sought to explore online health communities (OHCs) for sexual minority women (SMW) with cancer by conducting computational text analysis on posts. MATERIALS AND METHODS Eight moderated OHCs were hosted by the National LGBT Cancer Network from 2013 to 2015. Forty-six SMW wrote a total of 885 posts across the OHCs, which were analyzed using Linguistic Inquiry and Word Count and latent Dirichlet allocation. Pearson correlation was calculated between Linguistic Inquiry and Word Count word categories and participant engagement in the OHCs. Latent Dirichlet allocation was used to derive main topics. RESULTS Participants (average age 46 years; 89% white/non-Hispanic) who used more sadness, female-reference, drives, and religion-related words were more likely to post in the OHCs. Ten topics emerged: coping, holidays and vacation, cancer diagnosis and treatment, structure of day-to-day life, self-care, loved ones, physical recovery, support systems, body image, and symptom management. Coping was the most common topic; symptom management was the least common topic. DISCUSSION Highly engaged SMW in the OHCs connected to others via their shared female gender identity. Topics discussed in these OHCs were similar to OHCs for heterosexual women, and sexual identity was not a dominant topic. The presence of OHC moderators may have driven participation. Formal comparisons between sexual minority and heterosexual women's OHCs are needed. CONCLUSIONS Our findings contribute to a better understanding of the experiences of SMW cancer survivors and can inform the development of tailored OHC-based interventions for SMW who are survivors of cancer.
Affiliation(s)
- Young Ji Lee
  - Department of Health and Community Systems, School of Nursing, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Charles Kamen
  - Department of Surgery, University of Rochester Medical Center, Rochester, New York, USA
- Liz Margolies
  - National LGBT Cancer Network, New York City, New York, USA
- Ulrike Boehmer
  - Department of Community Health Sciences, Boston University, Boston, Massachusetts, USA
|
9
|
Ferraro G, Loo Gee B, Ji S, Salvador-Carulla L. Lightme: analysing language in internet support groups for mental health. Health Inf Sci Syst 2020; 8:34. [PMID: 33088490 DOI: 10.1007/s13755-020-00115-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2019] [Accepted: 07/24/2020] [Indexed: 10/23/2022]
Abstract
Background Assisting moderators to triage harmful posts in Internet Support Groups is relevant to ensuring their safe use. Automated text classification methods that analyse the language expressed in posts of online forums are a promising solution. Methods Natural Language Processing and Machine Learning technologies were used to build a triage post classifier using a dataset from the Reachout.com mental health forum for young people. Results Compared with the state of the art, a solution mainly based on features from lexical resources achieved the best classification performance for the crisis posts (52%), which is the most severe class. Six salient linguistic characteristics were found when analysing the crisis posts: (1) posts expressing hopelessness, (2) short posts expressing concise negative emotional responses, (3) long posts expressing variations of emotions, (4) posts expressing dissatisfaction with available health services, (5) posts utilising storytelling, and (6) posts expressing users seeking advice from peers during a crisis. Conclusion It is possible to build a competitive triage classifier using features derived only from the textual content of the post. Further research is needed to translate our quantitative and qualitative findings into features, as this may improve overall performance.
Affiliation(s)
- Gabriela Ferraro
  - Commonwealth Scientific and Industrial Research Organization & Australian National University, GPO Box 1700, Canberra, ACT 2601, Australia
- Brendan Loo Gee
  - Australasian Institute of Digital Health & Research School of Population Health, Centre for Mental Health Research, Australian National University, Canberra, Australia
- Shenjia Ji
  - College of Engineering and Computer Science, Australian National University, Canberra, Australia
- Luis Salvador-Carulla
  - Research School of Population Health, Centre for Mental Health Research, Australian National University, Canberra, Australia
|
10
|
Lee YJ, Park A, Roberge M, Donovan H. What Can Social Media Tell Us About Patient Symptoms: A Text-Mining Approach to Online Ovarian Cancer Forum. Cancer Nurs 2020; 45:E27-E35. [PMID: 32649337 DOI: 10.1097/ncc.0000000000000860] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
Abstract
BACKGROUND Ovarian cancer (OvCa) patients suffer from symptoms that severely affect quality of life. To optimally manage these symptoms, their symptom experiences must be better understood. Social media have emerged as a data source to understand these experiences. OBJECTIVE The objective of this study was to use topic modeling (ie, latent Dirichlet allocation [LDA]) to understand the symptom experience of OvCa patients through analysis of online forum posts from OvCa patients and their caregivers. INTERVENTIONS/METHODS Ovarian cancer patient/caregiver posts (n = 50 626) were collected from an online OvCa forum. We developed a symptom dictionary to identify symptoms described therein, selected the top 5 most frequently discussed symptoms, extracted posts that mentioned at least one of those symptoms, and conducted LDA on those extracted posts. RESULTS Pain, nausea, anxiety, fatigue, and skin rash were the top 5 most frequently discussed symptoms (n = 4536, 1296, 967, 878, and 657, respectively). Using LDA, we identified 11 topic categories, which differed across symptoms. For example, chemotherapy-related adverse effects likely reflected fatigue, nausea, and rash; social and spiritual support likely reflected anxiety; and diagnosis and treatment often reflected pain. CONCLUSION The frequency of a symptom discussed on a social media platform may not include all symptom experience and their severity. Indeed, users, who are experiencing different symptoms, mentioned different topics on the forum. Subsequent studies should consider the influence of additional factors (eg, cancer stage) from discussions. IMPLICATIONS FOR PRACTICE Social media have the potential to prioritize and answer the questions about clinical care that are frequently asked by cancer patients and their caregivers.
Affiliation(s)
- Young Ji Lee
- Author Affiliations: School of Nursing and (Drs Lee and Donovan and Ms Roberge); Department of Biomedical Informatics (Dr Lee), University of Pittsburgh, Pennsylvania; College of Computing and Information Science, University of North Carolina, Charlotte (Dr Park); Department of Obstetrics, Gynecology and Reproductive Science, University of Pittsburgh, Pennsylvania (Dr Donovan)
|
11
|
Jelodar H, Wang Y, Rabbani M, Xiao G, Zhao R. A Collaborative Framework Based for Semantic Patients-Behavior Analysis and Highlight Topics Discovery of Alcoholic Beverages in Online Healthcare Forums. J Med Syst 2020; 44:101. [PMID: 32266484 DOI: 10.1007/s10916-020-01547-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/19/2019] [Accepted: 02/17/2020] [Indexed: 11/28/2022]
Abstract
Medical data in online groups and social media contain valuable information provided by both healthcare professionals and patients. Patients can talk freely and share their personal experiences. These resources are a valuable opportunity for health professionals, who can access patients' opinions as well as discussions between patients. Processing health community data and extracting knowledge from it remain a significant technical challenge. There are many online groups and forums in which users can discuss healthcare issues. We can therefore examine these text documents to discover knowledge and evaluate patients' behavior based on their opinions and discussions. For example, there are many question-and-answer groups on Twitter and Facebook. In this paper, we present a semantic framework based on a topic model (LDA) and random forest (RF) to predict and retrieve latent topics of healthcare text documents from an online forum. We extract our healthcare records (patient questions) from the patient.info website as a real dataset. Experiments on our dataset show that social media forums could help detect significant patient safety problems in healthcare.
Affiliation(s)
- Hamed Jelodar
  - School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, 210094, China
- Yongli Wang
  - School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, 210094, China
- Mahdi Rabbani
  - School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, 210094, China
- Gang Xiao
  - Science and Technology on Complex Systems Simulation Laboratory, Beijing, 100101, China
- Ruxin Zhao
  - School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, 210094, China
|
12
|
Rivas R, Sadah SA, Guo Y, Hristidis V. Classification of Health-Related Social Media Posts: Evaluation of Post Content-Classifier Models and Analysis of User Demographics. JMIR Public Health Surveill 2020; 6:e14952. [PMID: 32234706 PMCID: PMC7160708 DOI: 10.2196/14952] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/05/2019] [Revised: 08/06/2019] [Accepted: 01/27/2020] [Indexed: 11/23/2022]
Abstract
Background The increasing volume of health-related social media activity, where users connect, collaborate, and engage, has increased the significance of analyzing how people use health-related social media. Objective The aim of this study was to classify the content (eg, posts that share experiences and seek support) of users who write health-related social media posts and study the effect of user demographics on post content. Methods We analyzed two different types of health-related social media: (1) health-related online forums—WebMD and DailyStrength—and (2) general online social networks—Twitter and Google+. We identified several categories of post content and built classifiers to automatically detect these categories. These classifiers were used to study the distribution of categories for various demographic groups. Results We achieved an accuracy of at least 84% and a balanced accuracy of at least 0.81 for half of the post content categories in our experiments. In addition, 70.04% (4741/6769) of posts by male WebMD users asked for advice, and male users’ WebMD posts were more likely to ask for medical advice than female users’ posts. The majority of posts on DailyStrength shared experiences, regardless of the gender, age group, or location of their authors. Furthermore, health-related posts on Twitter and Google+ were used to share experiences less frequently than posts on WebMD and DailyStrength. Conclusions We studied and analyzed the content of health-related social media posts. Our results can guide health advocates and researchers to better target patient populations based on the application type. Given a research question or an outreach goal, our results can be used to choose the best online forums to answer the question or disseminate a message.
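Note on the metric: the abstract above reports both accuracy and balanced accuracy. Balanced accuracy is the unweighted mean of per-class recall, so a classifier cannot score well by favoring the majority category. The following is a minimal sketch, not the authors' code; the function name `balanced_accuracy` and the toy labels are assumptions made for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    labels = sorted(set(y_true))
    recalls = []
    for label in labels:
        total = sum(1 for t in y_true if t == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 2 posts in the category of interest (1), 8 outside it (0),
# and a classifier that always predicts 0.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, balanced_accuracy(y_true, y_pred))
```

Here plain accuracy is 0.8 although no post in the category of interest is found, while balanced accuracy drops to 0.5. This is a common reason to report the two figures side by side.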
Affiliation(s)
- Ryan Rivas
  - Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, United States
- Shouq A Sadah
  - Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, United States
- Yuhang Guo
  - Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, United States
- Vagelis Hristidis
  - Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, United States
|
13
|
Pre- and post-launch emotions in new product development: Insights from twitter analytics of three products. INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT 2020. [DOI: 10.1016/j.ijinfomgt.2019.05.015] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Indexed: 11/18/2022]
|
14
|
Zunic A, Corcoran P, Spasic I. Sentiment Analysis in Health and Well-Being: Systematic Review. JMIR Med Inform 2020; 8:e16023. [PMID: 32012057 PMCID: PMC7013658 DOI: 10.2196/16023] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 08/27/2019] [Revised: 10/26/2019] [Accepted: 10/27/2019] [Indexed: 12/22/2022]
Abstract
Background Sentiment analysis (SA) is a subfield of natural language processing whose aim is to automatically classify the sentiment expressed in a free text. It has found practical applications across a wide range of societal contexts including marketing, economy, and politics. This review focuses specifically on applications related to health, which is defined as “a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity.” Objective This study aimed to establish the state of the art in SA related to health and well-being by conducting a systematic review of the recent literature. To capture the perspective of those individuals whose health and well-being are affected, we focused specifically on spontaneously generated content and not necessarily that of health care professionals. Methods Our methodology is based on the guidelines for performing systematic reviews. In January 2019, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified a total of 86 relevant studies and extracted data about the datasets analyzed, discourse topics, data creators, downstream applications, algorithms used, and their evaluation. Results The majority of data were collected from social networking and Web-based retailing platforms. The primary purpose of online conversations is to exchange information and provide social support online. These communities tend to form around health conditions with high severity and chronicity rates. Different treatments and services discussed include medications, vaccination, surgery, orthodontic services, individual physicians, and health care services in general. We identified 5 roles with respect to health and well-being among the authors of the types of spontaneously generated narratives considered in this review: a sufferer, an addict, a patient, a carer, and a suicide victim. 
Out of 86 studies considered, only 4 reported the demographic characteristics. A wide range of methods were used to perform SA. Most common choices included support vector machines, naïve Bayesian learning, decision trees, logistic regression, and adaptive boosting. In contrast with general trends in SA research, only 1 study used deep learning. The performance lags behind the state of the art achieved in other domains when measured by F-score, which was found to be below 60% on average. In the context of SA, the domain of health and well-being was found to be resource poor: few domain-specific corpora and lexica are shared publicly for research purposes. Conclusions SA results in the area of health and well-being lag behind those in other domains. It is yet unclear if this is because of the intrinsic differences between the domains and their respective sublanguages, the size of training datasets, the lack of domain-specific sentiment lexica, or the choice of algorithms.
Affiliation(s)
- Anastazia Zunic
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Padraig Corcoran
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Irena Spasic
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
|
15
|
Li S, Yu CH, Wang Y, Babu Y. Exploring adverse drug reactions of diabetes medicine using social media analytics and interactive visualizations. International Journal of Information Management 2019. [DOI: 10.1016/j.ijinfomgt.2018.12.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/24/2022]
|
16
|
Lau AYS, Staccini P. Artificial Intelligence in Health: New Opportunities, Challenges, and Practical Implications. Yearb Med Inform 2019; 28:174-178. [PMID: 31419829 PMCID: PMC6697520 DOI: 10.1055/s-0039-1677935] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/22/2022]
Abstract
Objectives: To summarise the state of the art during the year 2018 in consumer health informatics and education, with a special emphasis on the special topic of the International Medical Informatics Association (IMIA) Yearbook for 2019: “Artificial intelligence in health: new opportunities, challenges, and practical implications”.
Methods: We conducted a systematic search of articles published in PubMed using a predefined set of queries that identified 99 potential articles for review. These articles were screened according to topic relevance and 14 were selected for consideration as best paper candidates. The 14 papers were then presented to a panel of international experts for full paper review and scoring. Three papers that received the highest score were discussed in a consensus meeting and were agreed upon as best papers on artificial intelligence in health for patients and consumers in the year 2018.
Results: Only a small number of 2018 papers reported Artificial Intelligence (AI) research for patients and consumers. No studies were found on AI applications designed specifically for patients or consumers, nor were there studies that elicited patient and consumer input on AI. Currently, the most common use of AI for patients and consumers lies in secondary analysis of social media data (e.g., online discussion forums). In particular, the three best papers shared a common methodology of using data-driven algorithms (such as text mining, topic modelling, Latent Dirichlet allocation modelling), combined with insight-led approaches (e.g., visualisation, qualitative analysis and manual review), to uncover patient and consumer experiences of health and illness in online communities.
Conclusions: While discussion remains active on how AI could 'revolutionise' healthcare delivery, there is a lack of direction and evidence on how AI could actually benefit patients and consumers. Perhaps instead of primarily focusing on data and algorithms, researchers should engage with patients and consumers early in the AI research agenda to ensure we are indeed asking the right questions, and that important use cases and critical contexts are identified together with patients and consumers. Without a clear understanding on why patients and consumers need AI in the first place, or how AI could support individuals with their healthcare needs, it is difficult to imagine the kinds of AI applications that would have meaningful and sustainable impact on individual daily lives.
Affiliation(s)
- Annie Y S Lau
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Australia
- Pascal Staccini
- IRIS Department, URE RETINES, Faculté de Médecine, Université Côte d'Azur, France
|
17
|
Abstract
BACKGROUND Content published on social media has an impact on individuals and on their decision making. Knowing the sentiment toward diabetes is fundamental to understanding the impact that such information could have on people affected by this health condition and their family members. The objective of this study is to analyze the sentiment expressed in messages on diabetes posted on Twitter. METHOD Tweets including one of the terms "diabetes," "t1d," and/or "t2d" were extracted for one week using the Twitter standard API. Only the text message and the number of followers of the users were extracted. The sentiment analysis was performed by using SentiStrength. RESULTS A total of 67 421 tweets were automatically extracted, of which 3.7% specifically referred to T1D and 6.8% specifically mentioned T2D. One or more emojis were included in 7.0% of the posts. Tweets specifically mentioning T2D that did not include emojis were significantly more negative than the tweets that included emojis (-2.22 vs -1.48, P < .001). Tweets on T1D that included emojis were both significantly more positive and less negative than tweets without emojis (1.71 vs 1.49 and -1.31 vs -1.50, respectively; P < .005). The number of followers had a negative association with positive sentiment strength (r = -.023, P < .001) and a positive association with negative sentiment (r = .016, P < .001). CONCLUSION The use of sentiment analysis techniques on social media could increase our knowledge of how social media impact people with diabetes and their families and could help to improve public health strategies.
Affiliation(s)
- Elia Gabarron
- Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway
- Enrique Dorronzoro
- Department of Electronic Technology, Universidad de Sevilla, Sevilla, Spain
- Rolf Wynn
- Department of Clinical Medicine, Faculty of Health Sciences, UiT—The Arctic University of Norway, Tromsø, Norway
- Division of Mental Health and Addictions, University Hospital of North Norway, Tromsø, Norway
|
18
|
Milne DN, McCabe KL, Calvo RA. Improving Moderator Responsiveness in Online Peer Support Through Automated Triage. J Med Internet Res 2019; 21:e11410. [PMID: 31025945 PMCID: PMC6658385 DOI: 10.2196/11410] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/12/2018] [Revised: 11/22/2018] [Accepted: 12/09/2018] [Indexed: 11/13/2022]
Abstract
BACKGROUND Online peer support forums require oversight to ensure they remain safe and therapeutic. As online communities grow, they place a greater burden on their human moderators, which increases the likelihood that people at risk may be overlooked. This study evaluated the potential for machine learning to assist online peer support by directing moderators' attention where it is most needed. OBJECTIVE This study aimed to evaluate the accuracy of an automated triage system and the extent to which it influences moderator behavior. METHODS A machine learning classifier was trained to prioritize forum messages as green, amber, red, or crisis depending on how urgently they require attention from a moderator. This was then launched as a set of widgets injected into a popular online peer support forum hosted by ReachOut.com, an Australian Web-based youth mental health service that aims to intervene early in the onset of mental health problems in young people. The accuracy of the system was evaluated using a holdout test set of manually prioritized messages. The impact on moderator behavior was measured as response ratio and response latency, that is, the proportion of messages that receive at least one reply from a moderator and how long it took for these replies to be made. These measures were compared across 3 periods: before launch, after an informal launch, and after a formal launch accompanied by training. RESULTS The algorithm achieved 84% f-measure in identifying content that required a moderator response. Between prelaunch and post-training periods, response ratios increased by 0.9, 4.4, and 10.5 percentage points for messages labelled as crisis, red, and green, respectively, but decreased by 5.0 percentage points for amber messages. Logistic regression indicated that the triage system was a significant contributor to response ratios for green, amber, and red messages, but not for crisis messages. 
Response latency was significantly reduced (P<.001), between the same periods, by factors of 80%, 80%, 77%, and 12% for crisis, red, amber, and green messages, respectively. Regression analysis indicated that the triage system made a significant and unique contribution to reducing the time taken to respond to green, amber, and red messages, but not to crisis messages, after accounting for moderator and community activity. CONCLUSIONS The triage system was generally accurate, and moderators were largely in agreement with how messages were prioritized. It had a modest effect on response ratios, primarily because moderators were already more likely to respond to high priority content before the introduction of triage. However, it significantly and substantially reduced the time taken for moderators to respond to prioritized content. Further evaluations are needed to assess the impact of mistakes made by the triage algorithm and how changes to moderator responsiveness impact the well-being of forum members.
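The two behavioral measures defined in this abstract, response ratio and response latency, could be computed along the following lines. This is only a minimal sketch: the `Message` record, its field names, and the sample values are hypothetical and are not taken from the study's data or code. Summarizing latency by the median is also an assumption, since the abstract does not state which summary statistic was used.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical forum message record (field names are illustrative)."""
    label: str                           # "green", "amber", "red", or "crisis"
    posted_at: float                     # posting time, in hours
    first_mod_reply_at: Optional[float]  # time of first moderator reply, or None


def response_ratio(messages: List[Message], label: str) -> float:
    """Proportion of messages with this label that received a moderator reply."""
    group = [m for m in messages if m.label == label]
    if not group:
        raise ValueError(f"no messages labelled {label!r}")
    replied = [m for m in group if m.first_mod_reply_at is not None]
    return len(replied) / len(group)


def median_latency(messages: List[Message], label: str) -> float:
    """Median hours from posting to first moderator reply (replied messages only)."""
    delays = [
        m.first_mod_reply_at - m.posted_at
        for m in messages
        if m.label == label and m.first_mod_reply_at is not None
    ]
    if not delays:
        raise ValueError(f"no replied messages labelled {label!r}")
    return median(delays)


# Made-up sample: three "red" messages (one never answered) and one "green".
sample = [
    Message("red", 0.0, 2.0),
    Message("red", 1.0, 5.0),
    Message("red", 3.0, None),
    Message("green", 0.0, 10.0),
]
print(response_ratio(sample, "red"))
print(median_latency(sample, "red"))
```

Comparing these two numbers per label across the pre-launch and post-training periods would give the kind of before/after contrast the abstract reports.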
Affiliation(s)
- David N Milne
- School of Information, Systems and Modelling, Faculty of Engineering and Information Technology, University of Technology, Sydney, Sydney, Australia
- School of Electrical and Information Engineering, University of Sydney, Sydney, Australia
- Kathryn L McCabe
- School of Electrical and Information Engineering, University of Sydney, Sydney, Australia
- Department of Psychiatry and Behavioral Sciences, University of California (Davis), Davis, CA, United States
- Medical Investigation of Neurodevelopmental Disorders Institute, University of California (Davis), Davis, CA, United States
- Rafael A Calvo
- School of Electrical and Information Engineering, University of Sydney, Sydney, Australia
- Dyson School of Design Engineering, Imperial College, London, United Kingdom
|
19
|
Denecke K, Gabarron E, Grainger R, Konstantinidis ST, Lau A, Rivera-Romero O, Miron-Shatz T, Merolli M. Artificial Intelligence for Participatory Health: Applications, Impact, and Future Implications. Yearb Med Inform 2019; 28:165-173. [PMID: 31022749 PMCID: PMC6697496 DOI: 10.1055/s-0039-1677902] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/14/2022]
Abstract
Objective: Artificial intelligence (AI) provides people and professionals working in the field of participatory health informatics an opportunity to derive robust insights from a variety of online sources. The objective of this paper is to identify current state of the art and application areas of AI in the context of participatory health.
Methods: A search was conducted across seven databases (PubMed, Embase, CINAHL, PsychInfo, ACM Digital Library, IEEExplore, and SCOPUS), covering articles published since 2013. Additionally, clinical trials involving AI in participatory health contexts registered at clinicaltrials.gov were collected and analyzed.
Results: Twenty-two articles and 12 trials were selected for review. The most common application of AI in participatory health was the secondary analysis of social media data: self-reported data including patient experiences with healthcare facilities, reports of adverse drug reactions, safety and efficacy concerns about over-the-counter medications, and other perspectives on medications. Other application areas included determining which online forum threads required moderator assistance, identifying users who were likely to drop out from a forum, extracting terms used in an online forum to learn its vocabulary, highlighting contextual information that is missing from online questions and answers, and paraphrasing technical medical terms for consumers.
Conclusions: While AI for supporting participatory health is still in its infancy, there are a number of important research priorities that should be considered for the advancement of the field. Further research evaluating the impact of AI in participatory health informatics on the psychosocial wellbeing of individuals would help in facilitating the wider acceptance of AI into the healthcare ecosystem.
Affiliation(s)
- Elia Gabarron
- Norwegian Centre for E-health Research, University Hospital of North Norway, Norway
- Annie Lau
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Australia
- Talya Miron-Shatz
- Ono Academic College, Israel, and Winton Centre for Risk and Evidence Communication, Cambridge University, England
- Mark Merolli
- Swinburne University of Technology, and University of Melbourne, Australia
|
20
|
Chen AT, Swaminathan A, Kearns WR, Alberts NM, Law EF, Palermo TM. Understanding User Experience: Exploring Participants' Messages With a Web-Based Behavioral Health Intervention for Adolescents With Chronic Pain. J Med Internet Res 2019; 21:e11756. [PMID: 30985288 PMCID: PMC6487347 DOI: 10.2196/11756] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/31/2018] [Revised: 02/05/2019] [Accepted: 02/10/2019] [Indexed: 12/19/2022]
Abstract
BACKGROUND Delivery of behavioral health interventions on the internet offers many benefits, including accessibility, cost-effectiveness, convenience, and anonymity. In recent years, an increased number of internet interventions have been developed, targeting a range of conditions and behaviors, including depression, pain, anxiety, sleep disturbance, and eating disorders. Human support (coaching) is a common component of internet interventions that is intended to boost engagement; however, little is known about how participants interact with coaches and how this may relate to their experience with the intervention. By examining the data that participants produce during an intervention, we can characterize their interaction patterns and refine treatments to address different needs. OBJECTIVE In this study, we employed text mining and visual analytics techniques to analyze messages exchanged between coaches and participants in an internet-delivered pain management intervention for adolescents with chronic pain and their parents. METHODS We explored the main themes in coaches' and participants' messages using an automated textual analysis method, topic modeling. We then clustered participants' messages to identify subgroups of participants with similar engagement patterns. RESULTS First, we performed topic modeling on coaches' messages. The themes in coaches' messages fell into 3 categories: Treatment Content, Administrative and Technical, and Rapport Building. Next, we employed topic modeling to identify topics from participants' message histories. Similar to the coaches' topics, these were subsumed under 3 high-level categories: Health Management and Treatment Content, Questions and Concerns, and Activities and Interests. Finally, the cluster analysis identified 4 clusters, each with a distinguishing characteristic: Assignment-Focused, Short Message Histories, Pain-Focused, and Activity-Focused. 
The name of each cluster exemplifies the main engagement patterns of that cluster. CONCLUSIONS In this secondary data analysis, we demonstrated how automated text analysis techniques could be used to identify messages of interest, such as questions and concerns from users. In addition, we demonstrated how cluster analysis could be used to identify subgroups of individuals who share communication and engagement patterns, and in turn facilitate personalization of interventions for different subgroups of patients. This work makes 2 key methodological contributions. First, this study is innovative in its use of topic modeling to provide a rich characterization of the textual content produced by coaches and participants in an internet-delivered behavioral health intervention. Second, to our knowledge, this is the first example of the use of a visual analysis method to cluster participants and identify similar patterns of behavior based on intervention message content.
Affiliation(s)
- Annie T Chen
- Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
- Aarti Swaminathan
- Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
- William R Kearns
- Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
- Nicole M Alberts
- Department of Psychology, St Jude Children's Research Hospital, Memphis, TN, United States
- Emily F Law
- Department of Anesthesiology and Pain Medicine, School of Medicine, University of Washington, Seattle, WA, United States
- Center for Child Health, Behavior and Development, Seattle Children's Research Institute, Seattle, WA, United States
- Tonya M Palermo
- Department of Anesthesiology and Pain Medicine, School of Medicine, University of Washington, Seattle, WA, United States
- Center for Child Health, Behavior and Development, Seattle Children's Research Institute, Seattle, WA, United States
|
21
|
The language of information need: Differentiating conscious and formalized information needs. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2018.09.005] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/17/2022]
|
22
|
Kornfield R, Sarma PK, Shah DV, McTavish F, Landucci G, Pe-Romashko K, Gustafson DH. Detecting Recovery Problems Just in Time: Application of Automated Linguistic Analysis and Supervised Machine Learning to an Online Substance Abuse Forum. J Med Internet Res 2018; 20:e10136. [PMID: 29895517 PMCID: PMC6019846 DOI: 10.2196/10136] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/15/2018] [Revised: 04/04/2018] [Accepted: 04/05/2018] [Indexed: 12/13/2022]
Abstract
BACKGROUND Online discussion forums allow those in addiction recovery to seek help through text-based messages, including when facing triggers to drink or use drugs. Trained staff (or "moderators") may participate within these forums to offer guidance and support when participants are struggling but must expend considerable effort to continually review new content. Demands on moderators limit the scalability of evidence-based digital health interventions. OBJECTIVE Automated identification of recovery problems could allow moderators to engage in more timely and efficient ways with participants who are struggling. This paper aimed to investigate whether computational linguistics and supervised machine learning can be applied to successfully flag, in real time, those discussion forum messages that moderators find most concerning. METHODS Training data came from a trial of a mobile phone-based health intervention for individuals in recovery from alcohol use disorder, with human coders labeling discussion forum messages according to whether or not authors mentioned problems in their recovery process. Linguistic features of these messages were extracted via several computational techniques: (1) a Bag-of-Words approach, (2) the dictionary-based Linguistic Inquiry and Word Count program, and (3) a hybrid approach combining the most important features from both Bag-of-Words and Linguistic Inquiry and Word Count. These features were applied within binary classifiers leveraging several methods of supervised machine learning: support vector machines, decision trees, and boosted decision trees. Classifiers were evaluated in data from a later deployment of the recovery support intervention. 
RESULTS To distinguish recovery problem disclosures, the Bag-of-Words approach relied on domain-specific language, including words explicitly linked to substance use and mental health ("drink," "relapse," "depression," and so on), whereas the Linguistic Inquiry and Word Count approach relied on language characteristics such as tone, affect, insight, and presence of quantifiers and time references, as well as pronouns. A boosted decision tree classifier, utilizing features from both Bag-of-Words and Linguistic Inquiry and Word Count performed best in identifying problems disclosed within the discussion forum, achieving 88% sensitivity and 82% specificity in a separate cohort of patients in recovery. CONCLUSIONS Differences in language use can distinguish messages disclosing recovery problems from other message types. Incorporating machine learning models based on language use allows real-time flagging of concerning content such that trained staff may engage more efficiently and focus their attention on time-sensitive issues.
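The Bag-of-Words feature extraction named in this abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the tokenizer, the function names, and the example messages are hypothetical, and the study's actual preprocessing and feature selection are not described in the abstract.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(messages: List[str]) -> List[str]:
    """Sorted list of every distinct token seen across the messages."""
    return sorted({token for message in messages for token in tokenize(message)})


def bag_of_words(text: str, vocabulary: List[str]) -> List[int]:
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Made-up forum messages, used only to show the shape of the features.
messages = [
    "I had a drink last night",
    "Feeling good today, no drink",
]
vocabulary = build_vocabulary(messages)
vectors = [bag_of_words(message, vocabulary) for message in messages]
print(vocabulary)
print(vectors)
```

Each message becomes a fixed-length count vector, which is the form of input that classifiers such as support vector machines or boosted decision trees expect.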
Affiliation(s)
- Rachel Kornfield
- School of Journalism and Mass Communication, University of Wisconsin-Madison, Madison, WI, United States
- Prathusha K Sarma
- Department of Electrical & Computer Engineering, University of Wisconsin-Madison, Madison, WI, United States
- Dhavan V Shah
- School of Journalism and Mass Communication, University of Wisconsin-Madison, Madison, WI, United States
- Fiona McTavish
- Center for Health Enhancement System Studies, University of Wisconsin-Madison, Madison, WI, United States
- Gina Landucci
- Center for Health Enhancement System Studies, University of Wisconsin-Madison, Madison, WI, United States
- Klaren Pe-Romashko
- Center for Health Enhancement System Studies, University of Wisconsin-Madison, Madison, WI, United States
- David H Gustafson
- Center for Health Enhancement System Studies, University of Wisconsin-Madison, Madison, WI, United States
|
23
|
Ruthven I, Buchanan S, Jardine C. Isolated, overwhelmed, and worried: Young first-time mothers asking for information and support online. J Assoc Inf Sci Technol 2018. [DOI: 10.1002/asi.24037] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 11/08/2022]
Affiliation(s)
- Ian Ruthven
- Department of Computer and Information Sciences; University of Strathclyde; UK
- Steven Buchanan
- Department of Computer and Information Sciences; University of Strathclyde; UK
- Cara Jardine
- Department of Computer and Information Sciences; University of Strathclyde; UK
|
24
|
Zhang J, Marmor R, Huh J. Towards Supporting Patient Decision-making In Online Diabetes Communities. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2018; 2017:1893-1902. [PMID: 29854261 PMCID: PMC5977569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
As of 2014, 29.1 million people in the US have diabetes. Patients with diabetes have evolving information needs around complex lifestyle and medical decisions. As their conditions progress, patients need to sporadically make decisions by understanding alternatives and comparing options. These moments along the decision-making process present a valuable opportunity to support their information needs. An increasing number of patients visit online diabetes communities to fulfill their information needs. To understand how patients attempt to fulfill the information needs around decision-making in online communities, we reviewed 801 posts from an online diabetes community and included 79 posts for in-depth content analysis. The findings revealed motivations for posters' inquiries related to decision-making, including changes in disease state, increased self-awareness, and conflicting information received. Medication and food were among the most popular topics discussed as part of their decision-making inquiries. Additionally, we present insights for automatically identifying those decision-making inquiries to efficiently support information needs presented in online health communities.
Affiliation(s)
- Jing Zhang
- University of California San Diego, San Diego, CA
- Jina Huh
- University of California San Diego, San Diego, CA
|
25
|
Zhang S, Bantum EO, Owen J, Bakken S, Elhadad N. Online cancer communities as informatics intervention for social support: conceptualization, characterization, and impact. J Am Med Inform Assoc 2017; 24:451-459. [PMID: 27402140 PMCID: PMC5565989 DOI: 10.1093/jamia/ocw093] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 11/06/2015] [Accepted: 05/11/2016] [Indexed: 11/13/2022]
Abstract
Objectives: The Internet and social media are revolutionizing how social support is exchanged and perceived, making online health communities (OHCs) one of the most exciting research areas in health informatics. This paper aims to provide a framework for organizing research of OHCs and help identify questions to explore for future informatics research. Based on the framework, we conceptualize OHCs from a social support standpoint and identify variables of interest in characterizing community members. For the sake of this tutorial, we focus our review on online cancer communities. Target audience: The primary target audience is informaticists interested in understanding ways to characterize OHCs, their members, and the impact of participation, and in creating tools to facilitate outcome research of OHCs. OHC designers and moderators are also among the target audience for this tutorial. Scope: The tutorial provides an informatics point of view of online cancer communities, with social support as their leading element. We conceptualize OHCs according to 3 major variables: type of support, source of support, and setting in which the support is exchanged. We summarize current research and synthesize the findings for 2 primary research questions on online cancer communities: (1) the impact of using online social support on an individual's health, and (2) the characteristics of the community, its members, and their interactions. We discuss ways in which future research in informatics in social support and OHCs can ultimately benefit patients.
Affiliation(s)
- Shaodian Zhang
- Department of Biomedical Informatics, Columbia University, New York, USA
- Erin O'Carroll Bantum
- Cancer Prevention and Control Program, University of Hawaii Cancer Center, Honolulu, HI, USA
- Jason Owen
- Veterans Administration Palo Alto Health Care System, Menlo Park, California, USA
- Suzanne Bakken
- Department of Biomedical Informatics, Columbia University, New York, USA
- Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, New York, USA
|
26
|
Guerra-Reyes L, Christie VM, Prabhakar A, Harris AL, Siek KA. Postpartum Health Information Seeking Using Mobile Phones: Experiences of Low-Income Mothers. Matern Child Health J 2017; 20:13-21. [PMID: 27639571 PMCID: PMC5118389 DOI: 10.1007/s10995-016-2185-8] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Indexed: 12/01/2022]
Abstract
Objectives To assess low-income mothers' perceptions of their postpartum information needs; describe their information seeking behavior; explore their use of mobile technology to address those needs; and to contribute to the sparse literature on postpartum health and wellness. Methods Exploratory community-based qualitative approach. Interviewees were recruited among clients of community partners and had children aged 48 months and under. A survey assessing demographics was used to identify low-income mothers. 10 low-income mothers were recruited from survey participants to complete in-depth interviews regarding postpartum information needs, information seeking, and technology use. Interviews were transcribed verbatim and coded by three researchers independently. Narratives were analyzed along predetermined (etic) and emergent (emic) categories. Results Establishing breastfeeding and solving breastfeeding problems were central postpartum concerns leading to information seeking. Interviewees reported almost exclusive use of mobile phones to access the Internet. Mobile applications were widely used during pregnancy, but were not valuable postpartum. Face-to-face information from medical professionals was found to be repetitive. Online information seeking was mediated by default mobile phone search engines, and occurred over short, fragmented time periods. College graduates reported searching for authoritative knowledge sources; non-graduates preferred forums. Conclusions for Practice Low-income postpartum women rely on their smartphones to find online infant care and self-care health information. Websites replace pregnancy-related mobile applications and complement face-to-face information. Changes in searching behavior and multitasking mean information must be easily accessible and readily understood. 
Knowledge of page-rank systems and use of current and emergent social media will allow health-related organizations to better engage with low-income mothers online and promote evidence-based information.
Collapse
Affiliation(s)
- Lucia Guerra-Reyes
- Department of Applied Health Science, School of Public Health, Indiana University Bloomington, 1025 E 7th Street, Suite 116, Bloomington, IN, 47405, USA.
| | - Vanessa M Christie
- Department of Epidemiology, School of Public Health, Indiana University Bloomington, 1025 E 7th Street, Suite C028, Bloomington, IN, 47405, USA
| | - Annu Prabhakar
- Department of Informatics, School of Informatics and Computing, Indiana University Bloomington, 919 E. 10th Street, Bloomington, IN, 47408-3912, USA
| | - Asia L Harris
- Department of Applied Health Science, School of Public Health, Indiana University Bloomington, 1025 E 7th Street, Suite 116, Bloomington, IN, 47405, USA
| | - Katie A Siek
- Department of Informatics, School of Informatics and Computing, Indiana University Bloomington, 919 E. 10th Street, Bloomington, IN, 47408-3912, USA
|
27
|
Rathore AK, Kar AK, Ilavarasan PV. Social Media Analytics: Literature Review and Directions for Future Research. DECISION ANALYSIS 2017. [DOI: 10.1287/deca.2017.0355] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Indexed: 11/20/2022]
Affiliation(s)
- Ashish K. Rathore
- Department of Management Studies, Indian Institute of Technology Delhi, New Delhi, Delhi 110016 India
| | - Arpan K. Kar
- Department of Management Studies, Indian Institute of Technology Delhi, New Delhi, Delhi 110016 India
| | - P. Vigneswara Ilavarasan
- Department of Management Studies, Indian Institute of Technology Delhi, New Delhi, Delhi 110016 India
|
28
|
Fronzetti Colladon A, Vagaggini F. Robustness and stability of enterprise intranet social networks: The impact of moderators. Inf Process Manag 2017. [DOI: 10.1016/j.ipm.2017.07.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 11/30/2022]
|
29
|
VanDam C, Kanthawala S, Pratt W, Chai J, Huh J. Detecting clinically related content in online patient posts. J Biomed Inform 2017; 75:96-106. [PMID: 28986329 PMCID: PMC5685920 DOI: 10.1016/j.jbi.2017.09.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/06/2017] [Revised: 09/14/2017] [Accepted: 09/30/2017] [Indexed: 10/18/2022]
Abstract
Patients with chronic health conditions use online health communities to seek support and information to help manage their condition. For clinically related topics, patients can benefit from getting opinions from clinical experts, and many are concerned about misinformation and biased information being spread online. However, the large volume of community posts makes it challenging for moderators and clinical experts, if there are any, to provide necessary information. Automatically identifying forum posts that need validated clinical resources can help online health communities efficiently manage content exchange. This automation can also help patients in need of clinical expertise get proper help. We present our results on testing text classification models that efficiently and accurately identify community posts containing clinical topics. We annotated 1817 posts comprising 4966 sentences from an existing online diabetes community. We found that our classifier performed the best (F-measure: 0.83, Precision: 0.79, Recall: 0.86) when using the Naïve Bayes algorithm with unigrams, bigrams, trigrams, and MetaMap Semantic Types. Training took 5 s. The classification process took a fraction of 1 s. We applied our classifier to another online diabetes community, and the results were: F-measure: 0.63, Precision: 0.57, Recall: 0.71. Our results show that our model can feasibly scale to other forums for identifying posts containing clinical topics once common errors are properly addressed.
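As a rough illustration of the kind of classifier this abstract describes, the sketch below trains a multinomial Naïve Bayes model on word unigrams, bigrams, and trigrams. It is a minimal sketch under stated assumptions, not the authors' implementation: the MetaMap semantic-type features are omitted, add-one smoothing is assumed, and the posts, labels, and names (`ngrams`, `NaiveBayes`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, max_n=3):
    """Return all word n-grams of length 1..max_n from a lowercased text."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed here)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for feat in ngrams(text):
                if feat in self.vocab:               # skip n-grams unseen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy posts, not data from the study.
train_texts = [
    "my blood sugar spikes after taking insulin",
    "what dose of metformin is safe",
    "thanks everyone for the warm welcome",
    "happy birthday to our forum friend",
]
train_labels = ["clinical", "clinical", "non-clinical", "non-clinical"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("is this insulin dose safe"))
```

With only four training posts the prediction is driven entirely by which class has seen the overlapping words; a realistic evaluation would need an annotated corpus and held-out data.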
Affiliation(s)
| | | | - Wanda Pratt
- University of Washington, Seattle, United States.
| | - Joyce Chai
- Michigan State University, United States.
| | - Jina Huh
- University of California San Diego, United States.
|
30
|
Taylor J, Pagliari C. Mining social media data: How are research sponsors and researchers addressing the ethical challenges? RESEARCH ETHICS REVIEW 2017. [DOI: 10.1177/1747016117738559] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022]
Abstract
Background: Data representing people’s behaviour, attitudes, feelings and relationships are increasingly being harvested from social media platforms and re-used for research purposes. This can be ethically problematic, even where such data exist in the public domain. We set out to explore how the academic community is addressing these challenges by analysing a national corpus of research ethics guidelines and published studies in one interdisciplinary research area. Methods: Ethics guidelines published by Research Councils UK (RCUK), its seven-member councils and guidelines cited within these were reviewed. Guidelines referring to social media were classified according to published typologies of social media research uses and ethical considerations for social media mining. Using health research as an exemplar, PubMed was searched to identify studies using social media data, which were assessed according to their coverage of ethical considerations and guidelines. Results: Of the 13 guidelines published or recommended by RCUK, only those from the Economic and Social Research Council, the British Psychological Society, the International Association of Internet Researchers and the National Institute for Health Research explicitly mentioned the use of social media. Regarding data re-use, all four mentioned privacy issues but varied with respect to other ethical considerations. The PubMed search revealed 156 health-related studies involving social media data, only 50 of which mentioned ethical concepts, in most cases simply stating that they had obtained ethical approval or that no consent was required. Of the nine studies originating from UK institutions, only two referred to RCUK ethics guidelines or guidelines cited within these. Conclusions: Our findings point to a deficit in ethical guidance for research involving data extracted from social media. 
Given the growth of studies using these new forms of data, there is a pressing need to raise awareness of their ethical challenges and provide actionable recommendations for ethical research practice.
Affiliation(s)
- Joanna Taylor
- Usher Institute of Population Health Sciences and Informatics, The University of Edinburgh, UK
- Ernst and Young Ltd, Switzerland
| | - Claudia Pagliari
- Usher Institute of Population Health Sciences and Informatics, The University of Edinburgh, UK
|
31
|
Tapi Nzali MD, Bringay S, Lavergne C, Mollevi C, Opitz T. What Patients Can Tell Us: Topic Analysis for Social Media on Breast Cancer. JMIR Med Inform 2017; 5:e23. [PMID: 28760725 PMCID: PMC5556259 DOI: 10.2196/medinform.7779] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 04/05/2017] [Revised: 06/16/2017] [Accepted: 06/17/2017] [Indexed: 11/13/2022] Open
Abstract
Background Social media dedicated to health are increasingly used by patients and health professionals. They are rich textual resources with content generated through free exchange between patients. We are proposing a method to tackle the problem of retrieving clinically relevant information from such social media in order to analyze the quality of life of patients with breast cancer. Objective Our aim was to detect the different topics discussed by patients on social media and to relate them to functional and symptomatic dimensions assessed in the internationally standardized self-administered questionnaires used in cancer clinical trials (European Organization for Research and Treatment of Cancer [EORTC] Quality of Life Questionnaire Core 30 [QLQ-C30] and breast cancer module [QLQ-BR23]). Methods First, we applied a classic text mining technique, latent Dirichlet allocation (LDA), to detect the different topics discussed on social media dealing with breast cancer. We applied the LDA model to 2 datasets composed of messages extracted from public Facebook groups and from a public health forum (cancerdusein.org, a French breast cancer forum) with relevant preprocessing. Second, we applied a customized Jaccard coefficient to automatically compute similarity distance between the topics detected with LDA and the questions in the self-administered questionnaires used to study quality of life. Results Among the 23 topics present in the self-administered questionnaires, 22 matched with the topics discussed by patients on social media. Interestingly, these topics corresponded to 95% (22/23) of the forum and 86% (20/23) of the Facebook group topics. These figures underline that topics related to quality of life are an important concern for patients. However, 5 social media topics had no corresponding topic in the questionnaires, which do not cover all of the patients’ concerns. 
Of these 5 topics, 2 could potentially be used in the questionnaires, and these 2 topics corresponded to a total of 3.10% (523/16,868) of topics in the cancerdusein.org corpus and 4.30% (3014/70,092) of the Facebook corpus. Conclusions We found a good correspondence between detected topics on social media and topics covered by the self-administered questionnaires, which substantiates the sound construction of such questionnaires. We detected new emerging topics from social media that can be used to complete current self-administered questionnaires. Moreover, we confirmed that social media mining is an important source of information for complementary analysis of quality of life.
Affiliation(s)
- Mike Donald Tapi Nzali
- Institut Montpelliérain Alexander Grothendieck (IMAG), Department of Mathematics, Montpellier University, Montpellier, France.,Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Department of Computer Science, Montpellier University, Montpellier, France
| | - Sandra Bringay
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Department of Computer Science, Montpellier University, Montpellier, France.,Paul Valery University, Montpellier, France
| | - Christian Lavergne
- Institut Montpelliérain Alexander Grothendieck (IMAG), Department of Mathematics, Montpellier University, Montpellier, France.,Paul Valery University, Montpellier, France
| | - Caroline Mollevi
- Biometrics Unit, Institut du Cancer Montpellier (ICM), Montpellier, France
| | - Thomas Opitz
- BioSP Unit, Institut National de la Recherche Agronomique (INRA), Avignon, France
|
32
|
Cronin RM, Fabbri D, Denny JC, Rosenbloom ST, Jackson GP. A comparison of rule-based and machine learning approaches for classifying patient portal messages. Int J Med Inform 2017; 105:110-120. [PMID: 28750904 DOI: 10.1016/j.ijmedinf.2017.06.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 02/15/2017] [Revised: 06/13/2017] [Accepted: 06/20/2017] [Indexed: 12/28/2022]
Abstract
OBJECTIVE Secure messaging through patient portals is an increasingly popular way that consumers interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care. MATERIALS AND METHODS We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag of words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracies in identifying individual communication types within portal messages with area under the receiver-operator curve (AUC). Portal messages often contain more than one type of communication. To predict all communication types within single messages, we used the Jaccard Index. We extracted the variables of importance for the random forest classifiers. RESULTS The best performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). The best performing classification approach of classifiers for individual communication subtypes was random forests for Logistical-Contact Information (AUC: 0.963). 
The Jaccard Indices by approach were: basic classifier, Jaccard Index: 0.674; Naïve Bayes, Jaccard Index: 0.799; random forests, Jaccard Index: 0.859; and logistic regression, Jaccard Index: 0.861. For medical communications, the most predictive variables were NLP concepts (e.g., Temporal_Concept, which maps to 'morning', 'evening' and Idea_or_Concept which maps to 'appointment' and 'refill'). For logistical communications, the most predictive variables contained similar numbers of NLP variables and words (e.g., Telephone mapping to 'phone', 'insurance'). For social and informational communications, the most predictive variables were words (e.g., social: 'thanks', 'much', informational: 'question', 'mean'). CONCLUSIONS This study applies automated classification methods to the content of patient portal messages and evaluates the application of NLP techniques on consumer communications in patient portal messages. We demonstrated that random forest and logistic regression approaches accurately classified the content of portal messages, although the best approach to classification varied by communication type. Words were the most predictive variables for classification of most communication types, although NLP variables were most predictive for medical communication types. As adoption of patient portals increases, automated techniques could assist in understanding and managing growing volumes of messages. Further work is needed to improve classification performance to potentially support message triage and answering.
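The Jaccard Index used above for multi-label prediction can be sketched as follows. This is an illustrative sketch only: the abstract does not say how per-message scores were aggregated, so averaging over messages is an assumption, and the labels and function names are hypothetical.

```python
def jaccard_index(predicted, actual):
    """Size of the intersection over size of the union of two label sets.

    Two empty sets are treated as a perfect match (1.0) by convention here.
    """
    predicted, actual = set(predicted), set(actual)
    union = predicted | actual
    if not union:
        return 1.0
    return len(predicted & actual) / len(union)


def mean_jaccard(predictions, gold):
    """Average the per-message Jaccard index (aggregation method assumed)."""
    scores = [jaccard_index(p, g) for p, g in zip(predictions, gold)]
    return sum(scores) / len(scores)


# Hypothetical communication-type labels for three portal messages.
gold = [{"medical", "logistical"}, {"social"}, {"informational", "medical"}]
predictions = [{"medical"}, {"social"}, {"informational", "logistical"}]

print(round(mean_jaccard(predictions, gold), 3))
```

Unlike per-label AUC, this score penalizes both missed and spurious labels within a single message, which is why it suits messages that carry more than one communication type.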
Affiliation(s)
- Robert M Cronin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA.
| | - Daniel Fabbri
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Computer Science, Vanderbilt University, Nashville, TN, USA
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - S Trent Rosenbloom
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Gretchen Purcell Jackson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
|
33
|
A tale of two countries: International comparison of online doctor reviews between China and the United States. Int J Med Inform 2017; 99:37-44. [PMID: 28118920 DOI: 10.1016/j.ijmedinf.2016.12.007] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 09/02/2016] [Revised: 12/16/2016] [Accepted: 12/17/2016] [Indexed: 11/21/2022]
Abstract
BACKGROUND Worldwide, patients have posted millions of online reviews for their doctors. The rich textual information in the online reviews holds the potential to generate insights into how patients' experiences with their doctors differ across nations and how we should use them to improve health services. OBJECTIVE We apply customized text mining techniques to compare online doctor reviews from China and the United States, in order to measure the systematic differences in patient reviews between the two countries, and assess the potential insights that can be derived from this large volume of online text data. METHODS We compare the textual reviews of obstetrics and gynecology (OBGYN) doctors from the two most popular online doctor rating websites in the U.S. and China, respectively: RateMDs.com and Haodf.com. We apply a customized text mining technique, Latent Dirichlet Allocation (LDA) topic modeling, to identify the major topics in positive and negative reviews of those two countries. We then compare their similarities and differences. RESULTS Among the positive reviews, both Chinese and American patients talked about medical treatment, bedside manner, and appreciation/recommendation, but Chinese patients commented more about medical treatment while American patients focused more on recommendation. Also, reviews about bedside manner from Chinese patients were more related to doctors, while on the American side they were more about staff. This reflects the difference between the two countries' health systems. Further, among the negative reviews, both countries' patients talked about medical treatment, bedside manner, and logistics. However, Chinese patients focused more on the registration process, while American patients' reviews related more to the staff, wait time, and insurance, which further shows the differences between the two nations' health systems. 
CONCLUSIONS Online doctor reviews contain valuable information that can generate insights on the similarities and differences of patient experience across nations. They are useful assets to assist healthcare consumers, providers, and administrators in moving toward a patient-centered care. In this age of big data, online doctor reviews can be a valuable source for international perspectives on healthcare systems.
|
34
|
Lim S, Tucker CS, Kumara S. An unsupervised machine learning model for discovering latent infectious diseases using social media data. J Biomed Inform 2016; 66:82-94. [PMID: 28034788 DOI: 10.1016/j.jbi.2016.12.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 08/04/2016] [Revised: 12/03/2016] [Accepted: 12/14/2016] [Indexed: 10/20/2022]
Abstract
INTRODUCTION The authors of this work propose an unsupervised machine learning model that has the ability to identify real-world latent infectious diseases by mining social media data. In this study, a latent infectious disease is defined as a communicable disease that has not yet been formalized by national public health institutes and explicitly communicated to the general public. Most existing approaches to modeling infectious-disease-related knowledge discovery through social media networks are top-down approaches that are based on already known information, such as the names of diseases and their symptoms. In existing top-down approaches, necessary but unknown information, such as disease names and symptoms, is mostly unidentified in social media data until national public health institutes have formalized that disease. Most of the formalizing processes for latent infectious diseases are time consuming. Therefore, this study presents a bottom-up approach for latent infectious disease discovery in a given location without prior information, such as disease names and related symptoms. METHODS Social media messages with user and temporal information are extracted during the data preprocessing stage. An unsupervised sentiment analysis model is then presented. Users' expressions about symptoms, body parts, and pain locations are also identified from social media data. Then, symptom weighting vectors for each individual and time period are created, based on their sentiment and social media expressions. Finally, latent-infectious-disease-related information is retrieved from individuals' symptom weighting vectors. DATASETS AND RESULTS Twitter data from August 2012 to May 2013 are used to validate this study. Real electronic medical records for 104 individuals, who were diagnosed with influenza in the same period, are used to serve as ground truth validation. 
The results are promising, with the highest precision, recall, and F1 score values of 0.773, 0.680, and 0.724, respectively. CONCLUSION This work uses individuals' social media messages to identify latent infectious diseases, without prior information, quicker than when the disease(s) is formalized by national public health institutes. In particular, the unsupervised machine learning model using user, textual, and temporal information in social media data, along with sentiment analysis, identifies latent infectious diseases in a given location.
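The reported figures can be sanity-checked with the standard definition of F1 as the harmonic mean of precision and recall. The sketch below assumes that standard definition and that the three reported values come from the same configuration, neither of which the abstract states explicitly; `f1_score` is a hypothetical helper name.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
print(round(f1_score(0.773, 0.680), 3))
```

Under those assumptions the computed value rounds to the reported F1 of 0.724.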
Affiliation(s)
- Sunghoon Lim
- Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
| | - Conrad S Tucker
- School of Engineering Design, Technology, and Professional Programs, The Pennsylvania State University, University Park, PA 16802, USA; Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA.
| | - Soundar Kumara
- Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
|
35
|
Bui N, Yen J, Honavar V. Temporal Causality Analysis of Sentiment Change in a Cancer Survivor Network. IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS 2016; 3:75-87. [PMID: 29399599 PMCID: PMC5796429 DOI: 10.1109/tcss.2016.2591880] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 06/07/2023]
Abstract
Online health communities constitute a useful source of information and social support for patients. American Cancer Society's Cancer Survivor Network (CSN), a 173,000-member community, is the largest online network for cancer patients, survivors, and caregivers. A discussion thread in CSN is often initiated by a cancer survivor seeking support from other members of CSN. Discussion threads are multi-party conversations that often provide a source of social support, e.g., by bringing about a change of sentiment from negative to positive on the part of the thread originator. While previous studies regarding cancer survivors have shown that members of an online health community derive benefits from their participation in such communities, causal accounts of the factors that contribute to the observed benefits have been lacking. We introduce a novel framework to examine the temporal causality of sentiment dynamics in the CSN. We construct a Probabilistic Computation Tree Logic representation and a corresponding probabilistic Kripke structure to represent and reason about the changes in sentiments of posts in a thread over time. We use a sentiment classifier trained using machine learning on a set of posts manually tagged with sentiment labels to classify posts as expressing either positive or negative sentiment. We analyze the probabilistic Kripke structure to identify the prima facie causes of sentiment change on the part of the thread originators in the CSN forum and their significance. We find that the sentiment of replies appears to causally influence the sentiment of the thread originator. Our experiments also show that the conclusions are robust with respect to the choice of (i) the classification threshold of the sentiment classifier and (ii) the specific sentiment classifier used. 
We also extend the basic framework for temporal causality analysis to incorporate the uncertainty in the states of the probabilistic Kripke structure resulting from the use of an imperfect state transducer (in our case, the sentiment classifier). Our analysis of temporal causality of CSN sentiment dynamics offers new insights that the designers, managers and moderators of an online community such as CSN can utilize to facilitate and enhance the interactions so as to better meet the social support needs of the CSN participants. The proposed methodology for analysis of temporal causality has broad applicability in a variety of settings where the dynamics of the underlying system can be modeled in terms of state variables that change in response to internal or external inputs.
Affiliation(s)
- Ngot Bui
- PhD Candidate in the College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802 USA
| | - John Yen
- Professor in the College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802 USA
| | - Vasant Honavar
- Edward Frymoyer Endowed Professor in the College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802 USA
|
36
|
Kanthawala S, Vermeesch A, Given B, Huh J. Answers to Health Questions: Internet Search Results Versus Online Health Community Responses. J Med Internet Res 2016; 18:e95. [PMID: 27125622 PMCID: PMC4865652 DOI: 10.2196/jmir.5369] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 11/30/2015] [Revised: 02/19/2016] [Accepted: 03/18/2016] [Indexed: 12/04/2022] Open
Abstract
Background About 6 million people search for health information on the Internet each day in the United States. Both patients and caregivers search for information about prescribed courses of treatments, unanswered questions after a visit to their providers, or diet and exercise regimens. Past literature has indicated potential challenges around quality in health information available on the Internet. However, diverse information exists on the Internet, ranging from government-initiated webpages to personal blog pages. Yet we do not fully understand the strengths and weaknesses of different types of information available on the Internet. Objective The objective of this research was to investigate the strengths and challenges of various types of health information available online and to suggest what information sources best fit various question types. Methods We collected questions posted to and the responses they received from an online diabetes community and classified them according to Rothwell’s classification of question types (fact, policy, or value questions). We selected 60 questions (20 each of fact, policy, and value) and the replies the questions received from the community. We then searched for responses to the same questions using a search engine and recorded the results. Results Community responses answered more questions than did search results overall. Search results were most effective in answering value questions and least effective in answering policy questions. Community responses answered questions across question types at an equivalent rate, but most answered policy questions and the least answered fact questions. Value questions were most answered by community responses, but some of these answers provided by the community were incorrect. Fact question search results were the most clinically valid. Conclusions The Internet is a prevalent source of health information for people. 
The information quality people encounter online can have a large impact on them. We present what kinds of questions people ask online and the advantages and disadvantages of various information sources in getting answers to those questions. This study contributes to addressing people’s online health information needs.
Affiliation(s)
- Shaheen Kanthawala
- Department of Media and Information, Michigan State University, East Lansing, MI, United States.
|
37
|
Kwon BC, Kim SH, Lee S, Choo J, Huh J, Yi JS. VisOHC: Designing Visual Analytics for Online Health Communities. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:71-80. [PMID: 26529688 PMCID: PMC4638132 DOI: 10.1109/tvcg.2015.2467555] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 06/05/2023]
Abstract
Through online health communities (OHCs), patients and caregivers exchange their illness experiences and strategies for overcoming the illness, and provide emotional support. To facilitate healthy and lively conversations in these communities, their members should be continuously monitored and nurtured by OHC administrators. The main challenge of OHC administrators' tasks lies in understanding the diverse dimensions of conversation threads that lead to productive discussions in their communities. In this paper, we present a design study in which three domain expert groups participated (an OHC researcher and two OHC administrators), conducted to arrive at a visual analytic solution. Through our design study, we characterized the domain goals of OHC administrators and derived tasks to achieve these goals. As a result of this study, we propose a system called VisOHC, which visualizes individual OHC conversation threads as collapsed boxes, a visual metaphor of conversation threads. In addition, we augmented the posters' reply authorship network with marks and/or beams to show conversation dynamics within threads. We also developed unique measures tailored to the characteristics of OHCs, which can be encoded for thread visualizations at the users' requests. Our observation of the two administrators while using VisOHC showed that it supports their tasks and reveals interesting insights into online health communities. Finally, we share our methodological lessons on probing visual designs together with domain experts by allowing them to freely encode measurements into visual variables.
|
38
|
A research framework for pharmacovigilance in health social media: Identification and evaluation of patient adverse drug event reports. J Biomed Inform 2015; 58:268-279. [PMID: 26518315 DOI: 10.1016/j.jbi.2015.10.011] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 01/06/2015] [Revised: 10/20/2015] [Accepted: 10/21/2015] [Indexed: 11/23/2022]
Abstract
Social media offer insights into patients' medical problems such as drug side effects and treatment failures. Patient reports of adverse drug events from social media have great potential to improve the current practice of pharmacovigilance. However, extracting patient adverse drug event reports from social media continues to be an important challenge for health informatics research. In this study, we develop a research framework with advanced natural language processing techniques for integrated and high-performance extraction of patient-reported adverse drug events. The framework consists of medical entity extraction for recognizing patient discussions of drugs and events; adverse drug event extraction with a statistical learning method based on a shortest dependency path kernel, plus semantic filtering with information from medical knowledge bases; and report source classification to tease out noise. To evaluate the proposed framework, a series of experiments were conducted on a test bed encompassing about postings from major diabetes and heart disease forums in the United States. The results reveal that each component of the framework significantly contributes to its overall effectiveness. Our framework significantly outperforms prior work.
|
39
|
Park A, Hartzler AL, Huh J, McDonald DW, Pratt W. Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text. J Med Internet Res 2015; 17:e212. [PMID: 26323337 PMCID: PMC4642409 DOI: 10.2196/jmir.4612] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 05/05/2015] [Revised: 07/14/2015] [Accepted: 07/24/2015] [Indexed: 12/03/2022] Open
Abstract
Background: The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. Beyond having been built for different types of text, existing NLP tools face other challenges, including constantly changing technologies, source vocabularies, and characteristics of text. These continuously evolving challenges call for low-cost, systematic assessment. However, the primarily accepted evaluation method in NLP, manual annotation, requires tremendous effort and time. Objective: The primary objective of this study is to explore an alternative approach: using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools can make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap. Methods: Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we first manually reviewed MetaMap’s commonly occurring failures, grouped the inaccurate mappings into failure types, and then identified causes of the failures through iterative rounds of manual review using open coding, and (2) to automatically detect these failure types, we then explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures.
Results: From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed term failures, and (3) word ambiguity failures. Within these three failure types, we discovered 12 causes of inaccurate mappings of concepts. Our automated methods flagged almost half of MetaMap’s 383,572 mappings as problematic. Word sense ambiguity failure was the most widely occurring, comprising 82.22% of failures. Boundary failure was the second most frequent, amounting to 15.90% of failures, while missed term failures were the least common, making up 1.88% of failures. The automated failure detection achieved precision, recall, accuracy, and F1 score of 83.00%, 92.57%, 88.17%, and 87.52%, respectively. Conclusions: We illustrate the challenges of processing patient-generated online health community text and characterize failures of NLP tools on this patient-generated health text, demonstrating the feasibility of our low-cost approach to automatically detect those failures. Our approach shows the potential for scalable and effective solutions to automatically assess the constantly evolving NLP tools and source vocabularies to process patient-generated text.
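As a quick consistency check on the figures in the abstract above, the reported F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only; the helper name `f1_score` is an assumption introduced here and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (83.00% and 92.57%).
reported_precision = 0.8300
reported_recall = 0.9257

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 87.52%"
```

The recomputed value agrees with the 87.52% stated in the abstract.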
Affiliation(s)
- Albert Park
- Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States.
|
40
|
McRoy S, Jones S, Kurmally A. Toward automated classification of consumers' cancer-related questions with a new taxonomy of expected answer types. Health Informatics J 2015; 22:523-35. [PMID: 25759063 DOI: 10.1177/1460458215571643] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
This article examines methods for automated question classification applied to cancer-related questions that people have asked on the web. This work is part of a broader effort to provide automated question answering for health education. We created a new corpus of consumer-health questions related to cancer and a new taxonomy for those questions. We then compared the effectiveness of different statistical methods for developing classifiers, including weighted classification and resampling. Basic methods for building classifiers were limited by the high variability in the natural distribution of questions, and the typical refinement approaches of feature selection and merging categories achieved only small improvements in classifier accuracy. The best performance was achieved using weighted classification and resampling methods, the latter yielding an accuracy of F1 = 0.963. Thus, it would appear that statistical classifiers can be trained on natural data, but only if natural distributions of classes are smoothed. Such classifiers would be useful for automated question answering, for enriching web-based content, or for assisting clinical professionals in answering questions.
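The abstract above does not say which resampling scheme was used, so the following is only a minimal sketch of one common way to smooth a skewed class distribution: random oversampling of the smaller classes. The function name `oversample`, the toy questions, and the category labels are all hypothetical and are not drawn from the paper's corpus or taxonomy.

```python
import random
from collections import Counter


def oversample(examples, labels, seed=0):
    """Randomly duplicate minority-class examples until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        # Draw (with replacement) the extra copies needed to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy, made-up data: four questions in one category, one in each of two others.
questions = ["q1", "q2", "q3", "q4", "q5", "q6"]
categories = ["definition", "definition", "definition", "definition", "treatment", "cost"]

balanced_x, balanced_y = oversample(questions, categories)
print(Counter(balanced_y))  # every category now has 4 examples
```

Oversampling is applied to the training split only; evaluating on duplicated examples would inflate the measured scores.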
|
41
|
Yang M, Kiang M, Shang W. Filtering big data from social media--Building an early warning system for adverse drug reactions. J Biomed Inform 2015; 54:230-40. [PMID: 25688695 DOI: 10.1016/j.jbi.2015.01.011] [Citation(s) in RCA: 72] [Impact Index Per Article: 8.0] [Received: 08/08/2014] [Revised: 12/09/2014] [Accepted: 01/27/2015] [Indexed: 10/24/2022]
Abstract
OBJECTIVES Adverse drug reactions (ADRs) are believed to be a leading cause of death in the world. Pharmacovigilance systems are aimed at early detection of ADRs. With the popularity of social media, Web forums and discussion boards have become important places for consumers to share their drug use experience and, as a result, may provide useful information on drugs and their adverse reactions. In this study, we propose an automated mechanism for filtering ADR-related posts using text classification methods. In real-life settings, ADR-related messages are highly distributed in social media, while non-ADR-related messages are unspecific and topically diverse. It is expensive to manually label a large number of ADR-related messages (positive examples) and non-ADR-related messages (negative examples) to train classification systems. To mitigate this challenge, we examine the use of a partially supervised learning classification method to automate the process. METHODS We propose a novel pharmacovigilance system leveraging a Latent Dirichlet Allocation modeling module and a partially supervised classification approach. We select drugs with more than 500 threads of discussion, and collect all the original posts and comments on these drugs as the text corpus using an automatic Web spidering program. Various classifiers were trained by varying the number of positive examples and the number of topics. The trained classifiers were applied to 3000 posts published over 60 days. Top-ranked posts from each classifier were pooled, and the resulting set of 300 posts was reviewed by a domain expert to evaluate the classifiers. RESULTS Compared to the alternative approaches using supervised learning methods and three general-purpose partially supervised learning methods, our approach performs significantly better in terms of precision, recall, and the F measure (the harmonic mean of precision and recall), based on a computational experiment using online discussion threads from Medhelp.
CONCLUSIONS Our design provides satisfactory performance in identifying ADR-related posts for post-marketing drug surveillance. The overall design of our system also points to a potentially fruitful direction for building other early warning systems that need to filter big data from social media networks.
Affiliation(s)
- Ming Yang
- Department of Information Management, School of Information, Central University of Finance and Economics, Beijing 100081, China.
- Melody Kiang
- Department of Information Systems, California State University, Long Beach, CA 90840, United States.
- Wei Shang
- Academy of Mathematics and Systems Science, Beijing 100190, China.
|
42
|
Grabar N, Dumonet L. Automatic Computing of Global Emotional Polarity in French Health Forum Messages. Artif Intell Med 2015. [DOI: 10.1007/978-3-319-19551-3_32] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
|
43
|
Huh J, Pratt W. Weaving Clinical Expertise in Online Health Communities. PROCEEDINGS OF THE SIGCHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS. CHI CONFERENCE 2014; 2014:1355-1364. [PMID: 26413582 DOI: 10.1145/2556288.2557293] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 10/25/2022]
Abstract
Many patients visit online health communities to receive support. In face-to-face support groups, health professionals facilitate the exchange of experience among peer-patients while adding their clinical expertise when necessary. However, the large scale of online health communities makes such involvement by health professional moderators challenging. To address this challenge of delivering clinical expertise to where patients need it, we explore the idea of semi-automatically providing clinical expertise in online health communities. We interviewed 14 clinicians, showing them example peer-patient conversation threads. From the interviews, we examined the ideal practice of clinicians providing expertise to patients. The clinicians continuously assessed when peer-patients were providing appropriate support, what kinds of clinical help they could give online, and when to defer to patients' healthcare providers. The findings inform requirements for building a semi-automated system that delivers clinical expertise in online health communities.
Affiliation(s)
- Jina Huh
- Telecommunication, Information Studies, and Media, Michigan State University
- Wanda Pratt
- The Information School, DUB, Biomedical and Health Informatics, University of Washington
|
44
|
Bui DDA, Zeng-Treitler Q. Learning regular expressions for clinical text classification. J Am Med Inform Assoc 2014; 21:850-7. [PMID: 24578357 DOI: 10.1136/amiajnl-2013-002411] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVES Natural language processing (NLP) applications typically use regular expressions that have been developed manually by human experts. Our goal is to automate both the creation and utilization of regular expressions in text classification. METHODS We designed a novel regular expression discovery (RED) algorithm and implemented two text classifiers based on RED. The RED+ALIGN classifier combines RED with an alignment algorithm, and RED+SVM combines RED with a support vector machine (SVM) classifier. Two clinical datasets were used for testing and evaluation: the SMOKE dataset, containing 1091 text snippets describing smoking status; and the PAIN dataset, containing 702 snippets describing pain status. We performed 10-fold cross-validation to calculate accuracy, precision, recall, and F-measure metrics. In the evaluation, an SVM classifier was trained as the control. RESULTS The two RED classifiers achieved 80.9-83.0% in overall accuracy on the two datasets, which is 1.3-3% higher than SVM's accuracy (p<0.001). Similarly, small but consistent improvements have been observed in precision, recall, and F-measure when RED classifiers are compared with SVM alone. More significantly, RED+ALIGN correctly classified many instances that were misclassified by the SVM classifier (8.1-10.3% of the total instances and 43.8-53.0% of SVM's misclassifications). CONCLUSIONS Machine-generated regular expressions can be effectively used in clinical text classification. The regular expression-based classifier can be combined with other classifiers, like SVM, to improve classification performance.
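The 10-fold cross-validation mentioned in the abstract above can be sketched as a partition of item indices into ten held-out folds. This is a generic illustration under stated assumptions, not the authors' evaluation code: the function name `k_fold_indices` is hypothetical, folds are taken contiguously without shuffling or stratification, and only the dataset size (1091 snippets in the SMOKE dataset) comes from the abstract.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    When n_items is not divisible by k, the first (n_items % k) folds
    receive one extra item each.
    """
    base, remainder = divmod(n_items, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < remainder else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


# Fold sizes for a dataset of 1091 items, the size reported for SMOKE.
fold_sizes = [len(test) for _, test in k_fold_indices(1091, k=10)]
print(fold_sizes)  # one fold of 110 items followed by nine folds of 109
```

In practice the items would usually be shuffled (and often stratified by class) before splitting, so that each fold reflects the overall label distribution.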
Affiliation(s)
- Duy Duc An Bui
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA; VA Salt Lake City Health Care System, Salt Lake City, Utah, USA
- Qing Zeng-Treitler
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA; VA Salt Lake City Health Care System, Salt Lake City, Utah, USA
|
45
|
Rodríguez-González A, Mayer MA, Fernández-Breis JT. Biomedical information through the implementation of social media environments. J Biomed Inform 2013. [DOI: 10.1016/j.jbi.2013.10.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
|