1
Arquembourg J, Glaser P, Roblot F, Metzler I, Gallant-Dewavrin M, Nanguem HF, Mebarki A, Voillot P, Schück S. Discussions of Antibiotic Resistance on Social Media Platforms: Text Mining and Mixed Methods Content Analysis Study. JMIR Form Res 2025; 9:e37160. [PMID: 40289322 PMCID: PMC12047853 DOI: 10.2196/37160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 09/26/2022] [Accepted: 09/27/2022] [Indexed: 11/13/2022]
Abstract
Background With the increasing popularity of web 2.0 apps, social media has made it possible for individuals to post messages on antibiotic ineffectiveness. In such online conversations, patients discuss their quality of life (QoL). Social media have become key tools for finding and disseminating medical information. Objective To identify the main themes of discussion, the difficulties encountered by patients with respect to antibiotic ineffectiveness, and the impact on their QoL (physical, psychological, social, or financial). Methods A noninterventional retrospective study was carried out by collecting French-language social media posts written by internet users mentioning their experience with antibiotics and the impact of antibiotic ineffectiveness on their QoL. Messages posted between January 2014 and July 2020 were extracted from French-speaking publicly available online forums. Results A total of 3773 messages were included in the analysis corpus after extraction and filtering. These messages were posted by 2335 individual web users, most of them women around 35 years of age. Inefficacy of treatment options and the lack of information regarding the use of antibiotics were among the most discussed topics. QoL was discussed in 63% of the 3773 messages posted. The most commonly reported impact was physical (78%): patients discussed the persistence of symptoms and adverse effects. The second most common impact was psychological (65%), characterized by feelings of anxiety or despair about the situation. Conclusions This social media analysis allowed us to identify a strong impact of the perceived ineffectiveness of antibiotic therapy on patients' daily life, particularly in terms of physical and psychological consequences. These results provide health care experts with information directly generated by patients regarding their own experiences. Social media studies constitute a complementary source of evidence that could be used to optimize messages to the public about appropriate use of antibiotics.
Affiliation(s)
- Philippe Glaser
  - Unité EERA Institut Pasteur, CNRS UMR3525, Université de Paris, Paris, France
- France Roblot
  - CHRU Poitiers, INSERM U1070, CHRU Poitiers, Poitiers, France
- Isabelle Metzler
  - Association France Spondyloarthrites, Association France Spondyloarthrites, Tulle, France
- Hugues Feutze Nanguem
  - CREDIMI: Center for Research on International Market and Investment Law, University of Burgundy, Dijon, France
  - Pfizer, Paris, France
- Adel Mebarki
  - Kap Code, 146 Rue Montmartre, Paris, 75002, France, 33 625530241
- Paméla Voillot
  - Kap Code, 146 Rue Montmartre, Paris, 75002, France, 33 625530241
- Stéphane Schück
  - Kap Code, 146 Rue Montmartre, Paris, 75002, France, 33 625530241
2
Karapetiantz P, Audeh B, Redjdal A, Tiffet T, Bousquet C, Jaulent MC. Monitoring Adverse Drug Events in Web Forums: Evaluation of a Pipeline and Use Case Study. J Med Internet Res 2024; 26:e46176. [PMID: 38888956 PMCID: PMC11220433 DOI: 10.2196/46176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 10/20/2023] [Accepted: 03/12/2024] [Indexed: 06/20/2024]
Abstract
BACKGROUND To mitigate safety concerns, regulatory agencies must make informed decisions regarding drug usage and adverse drug events (ADEs). The primary pharmacovigilance data stem from spontaneous reports by health care professionals. However, underreporting poses a notable challenge within the current system. Explorations into alternative sources, including electronic patient records and social media, have been undertaken. Nevertheless, social media's potential remains largely untapped in real-world scenarios. OBJECTIVE The challenge faced by regulatory agencies in using social media is primarily attributed to the absence of suitable tools to support decision makers. An effective tool should enable access to information via a graphical user interface, presenting data in a user-friendly manner rather than in their raw form. This interface should offer various visualization options, empowering users to choose representations that best convey the data and facilitate informed decision-making. Thus, this study aims to assess the potential of integrating social media into pharmacovigilance and enhancing decision-making with this novel data source. To achieve this, our objective was to develop and assess a pipeline that processes data from the extraction of web forum posts to the generation of indicators and alerts within a visual and interactive environment. The goal was to create a user-friendly tool that enables regulatory authorities to make better-informed decisions effectively. METHODS To enhance pharmacovigilance efforts, we have devised a pipeline comprising 4 distinct modules, each independently editable, aimed at efficiently analyzing health-related French web forums. These modules were (1) web forums' posts extraction, (2) web forums' posts annotation, (3) statistics and signal detection algorithm, and (4) a graphical user interface (GUI). 
We showcase the efficacy of the GUI through an illustrative case study involving the introduction of the new formula of Levothyrox in France. This event led to a surge in reports to the French regulatory authority. RESULTS Between January 1, 2017, and February 28, 2021, a total of 2,081,296 posts were extracted from 23 French web forums. These posts contained 437,192 normalized drug-ADE couples, annotated with the Anatomical Therapeutic Chemical (ATC) Classification and Medical Dictionary for Regulatory Activities (MedDRA). The analysis of the Levothyrox new formula revealed a notable pattern. In August 2017, there was a sharp increase in posts related to this medication on social media platforms, which coincided with a substantial uptick in reports submitted by patients to the national regulatory authority during the same period. CONCLUSIONS We demonstrated that conducting quantitative analysis using the GUI is straightforward and requires no coding. The results aligned with prior research and also offered potential insights into drug-related matters. Our hypothesis received partial confirmation because the final users were not involved in the evaluation process. Further studies, concentrating on ergonomics and the impact on professionals within regulatory agencies, are imperative for future research endeavors. We emphasized the versatility of our approach and the seamless interoperability between different modules over the performance of individual modules. Specifically, the annotation module was integrated early in the development process and could undergo substantial enhancement by leveraging contemporary techniques rooted in the Transformers architecture. Our pipeline holds potential applications in health surveillance by regulatory agencies or pharmaceutical companies, aiding in the identification of safety concerns. Moreover, it could be used by research teams for retrospective analysis of events.
Affiliation(s)
- Pierre Karapetiantz
  - Inserm, Sorbonne Université, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France
- Bissan Audeh
  - Inserm, Sorbonne Université, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France
- Akram Redjdal
  - Inserm, Sorbonne Université, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France
- Théophile Tiffet
  - Service de santé publique et information médicale, CHU de Saint Etienne, 42000 Saint-Etienne, France
  - Institut National de la Santé et de la Recherche Médicale, Université Jean Monnet, SAnté INgéniérie BIOlogie St-Etienne, SAINBIOSE, 42270 Saint-Priest-en-Jarez, France
- Cédric Bousquet
  - Inserm, Sorbonne Université, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France
  - Service de santé publique et information médicale, CHU de Saint Etienne, 42000 Saint-Etienne, France
- Marie-Christine Jaulent
  - Inserm, Sorbonne Université, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France
3
Faviez C, Talmatkadi M, Foulquié P, Mebarki A, Schück S, Burgun A, Chen X. Assessment of the Early Detection of Anosmia and Ageusia Symptoms in COVID-19 on Twitter: Retrospective Study. JMIR INFODEMIOLOGY 2023; 3:e41863. [PMID: 37643302 PMCID: PMC10521907 DOI: 10.2196/41863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 06/29/2023] [Accepted: 08/01/2023] [Indexed: 08/31/2023]
Abstract
BACKGROUND During the unprecedented COVID-19 pandemic, social media has been extensively used to amplify the spread of information and to express personal health-related experiences regarding symptoms, including anosmia and ageusia, 2 symptoms that have been reported later than other symptoms. OBJECTIVE Our objective is to investigate to what extent Twitter users reported anosmia and ageusia symptoms in their tweets and if they connected them to COVID-19, to evaluate whether these symptoms could have been identified as COVID-19 symptoms earlier using Twitter rather than the official notice. METHODS We collected French tweets posted between January 1, 2020, and March 31, 2020, containing anosmia- or ageusia-related keywords. Symptoms were detected using fuzzy matching. The analysis consisted of 3 parts. First, we compared the coverage of anosmia and ageusia symptoms in Twitter and in traditional media to determine if the association between COVID-19 and anosmia or ageusia could have been identified earlier through Twitter. Second, we conducted a manual analysis of anosmia- and ageusia-related tweets to obtain quantitative and qualitative insights regarding their nature and to assess when the first associations between COVID-19 and these symptoms were established. We randomly annotated tweets from 2 periods: the early stage and the rapid spread stage of the epidemic. For each tweet, each symptom was annotated regarding 3 modalities: symptom (yes or no), associated with COVID-19 (yes, no, or unknown), and whether it was experienced by someone (yes, no, or unknown). Third, to evaluate if there was a global increase of tweets mentioning anosmia or ageusia in early 2020, corresponding to the beginning of the COVID-19 epidemic, we compared the tweets reporting experienced anosmia or ageusia between the first periods of 2019 and 2020. 
RESULTS In total, 832 tweets containing anosmia-related keywords and 12,544 tweets containing ageusia-related keywords were extracted over the analysis period in 2020. The comparison to traditional media showed a strong correlation without any lag, which suggests that Twitter was highly reactive but did not allow earlier detection. The annotation of tweets from 2020 showed that tweets correlating anosmia or ageusia with COVID-19 could be found a few days before the official announcement. However, no association could be found during the first stage of the pandemic. Information about the temporality of symptoms and the psychological impact of these symptoms could be found in the tweets. The comparison between early 2020 and early 2019 showed no difference regarding the volumes of tweets. CONCLUSIONS Based on our analysis of French tweets, associations between COVID-19 and anosmia or ageusia by web users could have been found on Twitter just a few days before the official announcement but not during the early stage of the pandemic. Patients share qualitative information on Twitter regarding anosmia or ageusia symptoms that could be of interest for future analyses.
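The Methods of this abstract state that symptoms were detected using fuzzy matching but do not name a tool or threshold. The following is only a minimal sketch of the general idea, assuming Python's standard-library `difflib`; the keyword list, the `detect_symptoms` helper, and the 0.8 similarity cutoff are hypothetical and are not taken from the study.

```python
import difflib
import re

# Hypothetical keyword list: the study's actual French keywords are not given in the abstract.
KEYWORDS = ["anosmie", "agueusie"]


def detect_symptoms(text, keywords=KEYWORDS, cutoff=0.8):
    """Return the set of keywords that approximately match any word in text."""
    tokens = re.findall(r"\w+", text.lower())
    found = set()
    for token in tokens:
        # get_close_matches keeps keywords whose similarity ratio to the token is >= cutoff
        for match in difflib.get_close_matches(token, keywords, n=1, cutoff=cutoff):
            found.add(match)
    return found


# A misspelled mention is still mapped to its keyword.
print(detect_symptoms("J'ai une anosmi depuis hier"))
```

A lower cutoff would catch more misspellings at the cost of more false matches, so in practice the threshold would need to be tuned against annotated tweets.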
Affiliation(s)
- Carole Faviez
  - Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR 1138, Paris, France
  - Health Data- and Model- Driven Knowledge Acquisition (HeKA), Inria Paris, Paris, France
- Anita Burgun
  - Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR 1138, Paris, France
  - Health Data- and Model- Driven Knowledge Acquisition (HeKA), Inria Paris, Paris, France
  - Department of Medical Informatics, Hôpital Necker-Enfants Malades, Assistance Publique - Hôpitaux de Paris (AP-HP), Paris, France
- Xiaoyi Chen
  - Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR 1138, Paris, France
  - Health Data- and Model- Driven Knowledge Acquisition (HeKA), Inria Paris, Paris, France
  - Data Science Platform, Imagine Institute, Université Paris Cité, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR 1163, Paris, France
4
Kaas-Hansen BS, Placido D, Rodríguez CL, Thorsen-Meyer HC, Gentile S, Nielsen AP, Brunak S, Jürgens G, Andersen SE. Language-agnostic pharmacovigilant text mining to elicit side effects from clinical notes and hospital medication records. Basic Clin Pharmacol Toxicol 2022; 131:282-293. [PMID: 35834334 PMCID: PMC9541191 DOI: 10.1111/bcpt.13773] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/19/2022] [Revised: 06/10/2022] [Accepted: 07/09/2022] [Indexed: 11/26/2022]
Abstract
We sought to craft a drug safety signalling pipeline associating latent information in clinical free text with exposures to single drugs and drug pairs. Data arose from 12 secondary and tertiary public hospitals in two Danish regions, comprising approximately half the Danish population. Notes were operationalised with a fastText embedding, based on which we trained 10,720 neural-network models (one for each distinct single-drug/drug-pair exposure) predicting the risk of exposure given an embedding vector. We included 2,905,251 admissions between May 2008 and June 2016, with 13,740,564 distinct drug prescriptions; the median number of prescriptions was 5 (IQR: 3-9) and in 1,184,340 (41%) admissions patients used ≥5 drugs concomitantly. 10,788,259 clinical notes were included, with 179,441,739 tokens retained after pruning. Of 345 single-drug signals reviewed, 28 (8.1%) represented possibly undescribed relationships; 186 (54%) signals were clinically meaningful. 16 (14%) of the 115 drug-pair signals were possible interactions and 2 (1.7%) were known. In conclusion, we built a language-agnostic pipeline for mining associations between free-text information and medication exposure without manual curation, predicting not the likely outcome of a range of exposures, but the likely exposures for outcomes of interest. Our approach may help overcome limitations of text mining methods relying on curated data in English and can help leverage non-English free text for pharmacovigilance.
Affiliation(s)
- Benjamin Skov Kaas-Hansen
  - Clinical Pharmacology Unit, Zealand University Hospital, Denmark
  - NNF Center for Protein Research, University of Copenhagen, Denmark
  - Section of Biostatistics, Department of Public Health, University of Copenhagen, Denmark
- Davide Placido
  - NNF Center for Protein Research, University of Copenhagen, Denmark
- Søren Brunak
  - NNF Center for Protein Research, University of Copenhagen, Denmark
- Gesche Jürgens
  - Clinical Pharmacology Unit, Zealand University Hospital, Denmark
5
Déguilhem A, Malaab J, Talmatkadi M, Renner S, Foulquié P, Fagherazzi G, Loussikian P, Marty T, Mebarki A, Texier N, Schuck S. Identifying Profiles and Symptoms of Patients With Long COVID in France: Data Mining Infodemiology Study Based on Social Media. JMIR INFODEMIOLOGY 2022; 2:e39849. [PMID: 36447795 PMCID: PMC9685517 DOI: 10.2196/39849] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/25/2022] [Revised: 09/19/2022] [Accepted: 10/01/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND Long COVID, a condition with persistent symptoms after COVID-19 infection, is the first illness arising from social media. In France, the French hashtag #ApresJ20 described symptoms persisting longer than 20 days after contracting COVID-19. Faced with a lack of recognition from medical and official entities, patients formed communities on social media and described their symptoms as long-lasting, fluctuating, and multisystemic. While many studies on long COVID relied on traditional research methods with lengthy processes, social media offers a foundation for large-scale studies with a fast-flowing outburst of data. OBJECTIVE We aimed to identify and analyze Long Haulers' main reported symptoms, symptom co-occurrences, topics of discussion, difficulties encountered, and patient profiles. METHODS Data were extracted based on a list of pertinent keywords from public sites (eg, Twitter) and health-related forums (eg, Doctissimo). Reported symptoms were identified via the MedDRA dictionary, displayed per the volume of posts mentioning them, and aggregated at the user level. Associations were assessed by computing co-occurrences in users' messages, as pairs of preferred terms. Discussion topics were analyzed using Biterm Topic Modeling; difficulties and unmet needs were explored manually. To identify patient profiles in relation to their symptoms, each preferred term's total was used to create user-level hierarchical clusters. RESULTS Between January 1, 2020, and August 10, 2021, overall, 15,364 messages were identified as originating from 6494 patients with long COVID or their caregivers. Our analyses revealed 3 major symptom co-occurrences: asthenia-dyspnea (102/289, 35.3%), asthenia-anxiety (65/289, 22.5%), and asthenia-headaches (50/289, 17.3%).
The main reported difficulties were symptom management (150/424, 35.4% of messages), psychological impact (64/424, 15.1%), significant pain (51/424, 12.0%), deterioration in general well-being (52/424, 12.3%), and impact on daily and professional life (40/424, 9.4% and 34/424, 8.0% of messages, respectively). We identified 3 profiles of patients in relation to their symptoms: profile A (n=406 patients) reported exclusively an asthenia symptom; profile B (n=129) expressed anxiety (n=129, 100%), asthenia (n=28, 21.7%), dyspnea (n=15, 11.6%), and ageusia (n=3, 2.3%); and profile C (n=141) described dyspnea (n=141, 100%) and asthenia (n=45, 31.9%). Approximately 49.1% of users (79/161) continued expressing symptoms more than 3 months after infection, and 20.5% (33/161) after 1 year. CONCLUSIONS Long COVID is a lingering condition that affects people worldwide, physically and psychologically. It impacts Long Haulers' quality of life, everyday tasks, and professional activities. Social media played an undeniable role in raising and delivering Long Haulers' voices and can potentially rapidly provide large volumes of valuable patient-reported information. Since long COVID was named by patients themselves via social media, it is imperative to continuously include their perspectives in related research. Our results can help design patient-centric instruments to be further used in clinical practice to better capture meaningful dimensions of long COVID.
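The Methods of this abstract describe assessing associations by computing co-occurrences of preferred terms in users' messages. A minimal sketch of that counting step is given here, assuming each user has already been reduced to a set of normalized preferred terms. The `cooccurrence_counts` helper, the user identifiers, and the sample data are hypothetical and for illustration only; the study's own implementation is not described in the abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(user_terms):
    """Count, for each unordered pair of terms, how many users reported both."""
    counts = Counter()
    for terms in user_terms.values():
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) share one key
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


# Hypothetical user-level data, not taken from the study
users = {
    "u1": {"asthenia", "dyspnea"},
    "u2": {"asthenia", "dyspnea", "anxiety"},
    "u3": {"asthenia"},
}
counts = cooccurrence_counts(users)
print(counts.most_common(1))
```

Dividing each pair count by the number of users reporting at least two terms would give proportions of the kind reported in the Results, although the abstract does not state how its denominator was defined.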
Affiliation(s)
- Guy Fagherazzi
  - Deep Digital Phenotyping Research Unit, Department of Precision Health, Luxembourg Institute of Health, Strassen, Luxembourg
6
Goadsby P, Ruiz de la Torre E, Constantin L, Amand C. Social Media Listening and Digital Profiling Study of People with Headache and Migraine: A Retrospective Infodemiology Study (Preprint). J Med Internet Res 2022; 25:e40461. [PMID: 37145844 DOI: 10.2196/40461] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/24/2022] [Revised: 11/15/2022] [Accepted: 02/14/2023] [Indexed: 02/16/2023]
Abstract
BACKGROUND There is an unmet need for a better understanding and management of headache, particularly migraine, beyond specialist centers, which may be facilitated using digital technology. OBJECTIVE The objective of this study was to identify where, when, and how people with headache and migraine describe their symptoms and the nonpharmaceutical and medicinal treatments used as indicated on social media. METHODS Social media sources, including Twitter, web-based forums, blogs, YouTube, and review sites, were searched using a predefined search string related to headache and migraine. The real-time data from social media posts were collected retrospectively for a 1-year period from January 1, 2018, to December 31, 2018 (Japan), or a 2-year period from January 1, 2017, to December 31, 2018 (Germany and France). The data were analyzed after collection, using content analysis and audience profiling. RESULTS A total of 3,509,828 social media posts related to headache and migraine were obtained from Japan in 1 year and 146,257 and 306,787 posts from Germany and France, respectively, in 2 years. Among social media sites, Twitter was the most used platform across these countries. Japanese sufferers used specific terminology, such as "tension headaches" or "cluster headaches" (36%), whereas French sufferers even mentioned specific migraine types, such as ocular (7%) and aura (2%). The most detailed posts on headache or migraine were from Germany. The French sufferers explicitly mentioned "headache or migraine attacks" in the "evening (41%) or morning (38%)," whereas Japanese sufferers mentioned "morning (48%) or night (27%)" and German sufferers mentioned "evening (22%) or night (41%)." The use of "generic terms" such as medicine, tablet, and pill was prevalent. The most discussed drugs were ibuprofen and naproxen combination (43%) in Japan; ibuprofen (29%) in Germany; and acetylsalicylic acid, paracetamol, and caffeine combination (75%) in France.
The top 3 nonpharmaceutical treatments are hydration, caffeinated beverages, and relaxation methods. Of the sufferers, 44% were between 18 and 24 years of age. CONCLUSIONS In this digital era, social media listening studies present an opportunity to provide unguided, self-reported, sufferers' perceptions in the real world. The generation of social media evidence requires appropriate methodology to translate data into scientific information and relevant medical insights. This social media listening study showed country-specific differences in headache and migraine symptoms experienced and in the times of the day and treatments used. Furthermore, this study highlighted the prevalence of social media usage by younger sufferers compared to that by older sufferers.
Affiliation(s)
- Peter Goadsby
  - NIHR King's Clinical Research Facility, King's College London, London, United Kingdom
7
Voillot P, Riche B, Portafax M, Foulquié P, Gedik A, Barbarot S, Misery L, Héas S, Mebarki A, Texier N, Schück S. Social Media Platforms Listening Study on Atopic Dermatitis: Quantitative and Qualitative Findings. J Med Internet Res 2022; 24:e31140. [PMID: 35089160 PMCID: PMC8838596 DOI: 10.2196/31140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/11/2021] [Revised: 10/04/2021] [Accepted: 11/30/2021] [Indexed: 12/11/2022]
Abstract
BACKGROUND Atopic dermatitis (AD) is a chronic, pruritic, inflammatory disease that occurs most frequently in children but also affects many adults. Social media have become key tools for finding and disseminating medical information. OBJECTIVE The aims of this study were to identify the main themes of discussion, the difficulties encountered by patients with respect to AD, the impact of the pathology on quality of life (QoL; physical, psychological, social, or financial), and to study the perception of patients regarding their treatment. METHODS A retrospective study was carried out by collecting French-language social media posts written by internet users mentioning their experience with AD, their QoL, and their treatments. Messages related to AD discomfort posted between July 1, 2010, and October 23, 2020, were extracted from French-speaking publicly available online forums. Automatic and manual extractions were implemented to create a general corpus and 2 subcorpuses depending on the level of control of the disease. RESULTS A total of 33,115 messages associated with AD were included in the analysis corpus after extraction and cleaning. These messages were posted by 15,857 separate web users, most of them women younger than 40 years. Tips to manage AD and everyday hygiene/treatments were among the most discussed topics for the controlled AD subcorpus, while baby-related topics and therapeutic failure were among the most discussed topics for the insufficiently controlled AD subcorpus. QoL was discussed in both subcorpuses, with a higher proportion in the controlled AD subcorpus. Treatments and their perception were also discussed by web users. CONCLUSIONS More than just emotional or peer support, patients with AD turn to online forums to discuss their health.
Our findings show the need for an intersection between social media and health care and the importance of developing new approaches such as the Atopic Dermatitis Control Tool, which is a patient-related disease severity assessment tool focused on patients with AD.
Affiliation(s)
- Laurent Misery
  - Centre Hospitalier Universitaire de Brest, Brest, France
8
Renner S, Marty T, Khadhar M, Foulquié P, Voillot P, Mebarki A, Montagni I, Texier N, Schück S. A New Method to Extract Health-Related Quality of Life Data From Social Media Testimonies: Algorithm Development and Validation. J Med Internet Res 2022; 24:e31528. [PMID: 35089152 PMCID: PMC8838601 DOI: 10.2196/31528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 10/05/2021] [Accepted: 10/29/2021] [Indexed: 11/13/2022]
Abstract
Background Monitoring social media has been shown to be a useful means to capture patients’ opinions and feelings about medical issues, ranging from diseases to treatments. Health-related quality of life (HRQoL) is a useful indicator of patients’ overall health, which can be captured online. Objective This study aimed to describe a social media listening algorithm able to detect the impact of diseases or treatments on specific dimensions of HRQoL based on posts written by patients in social media and forums. Methods Using a web crawler, 19 forums in France were harvested, and messages related to patients’ experience with disease or treatment were specifically collected. The SF-36 (Short Form Health Survey) and EQ-5D (Euro Quality of Life 5 Dimensions) HRQoL surveys were mixed and adapted for a tailored social media listening system. This was carried out to better capture the variety of expression on social media, resulting in 5 dimensions of the HRQoL, which are physical, psychological, activity-based, social, and financial. Models were trained using cross-validation and hyperparameter optimization. Oversampling was used to increase the infrequent dimensions: after annotation, SMOTE (synthetic minority oversampling technique) was used to balance the proportions of the dimensions among messages. Results The training set was composed of 1399 messages, randomly taken from a batch of 20,000 health-related messages coming from forums. The algorithm was able to detect a general impact on HRQoL (sensitivity of 0.83 and specificity of 0.74), a physical impact (0.67 and 0.76), a psychic impact (0.82 and 0.60), an activity-related impact (0.73 and 0.78), a relational impact (0.73 and 0.70), and a financial impact (0.79 and 0.74). Conclusions The development of an innovative method to extract health data from social media as a real-time assessment of patients’ HRQoL is useful for patient-centered medical care.
As a source of real-world data, social media provide a complementary point of view to understand patients’ concerns and unmet needs, as well as shedding light on how diseases and treatments can be a burden in their daily lives.
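The Results of this abstract report each HRQoL dimension as a sensitivity and specificity pair. As a reminder of how such figures are obtained from binary annotations and predictions, here is a minimal sketch using the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)). The `sensitivity_specificity` helper and the example labels are hypothetical and do not reproduce the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels coded as 0 or 1.

    Assumes y_true contains at least one positive and one negative label,
    otherwise a denominator would be zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # impact present, detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # impact present, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # impact absent, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # impact absent, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical annotations (1 = message mentions an impact on the dimension)
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```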
Affiliation(s)
- Ilaria Montagni
  - Bordeaux Population Health Research Center, UMR 1219, Bordeaux University, Inserm, Bordeaux, France
9
Schück S, Roustamal A, Gedik A, Voillot P, Foulquié P, Penfornis C, Job B. Assessing Patient Perceptions and Experiences of Paracetamol in France: Infodemiology Study Using Social Media Data Mining. J Med Internet Res 2021; 23:e25049. [PMID: 34255645 PMCID: PMC8314157 DOI: 10.2196/25049] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/26/2020] [Revised: 03/24/2021] [Accepted: 04/25/2021] [Indexed: 01/27/2023]
Abstract
BACKGROUND Individuals frequently turn to social media to discuss medical conditions and medication, sharing their experiences and information and asking questions among themselves. These online discussions can provide valuable insights into individual perceptions of medical treatment, and increasingly, studies are focusing on the potential use of this information to improve health care management. OBJECTIVE The objective of this infodemiology study was to identify social media posts mentioning paracetamol-containing products to develop a better understanding of patients' opinions and perceptions of the drug. METHODS Posts between January 2003 and March 2019 containing at least one mention of paracetamol were extracted from 18 French forums in May 2019 with the use of the Detec't (Kap Code) web crawler. Posts were then analyzed using the automated Detec't tool, which uses machine learning and text mining methods to inspect social media posts and extract relevant content. Posts were classified into groups: Paracetamol Only, Paracetamol and Opioids, Paracetamol and Others, and the Aggregate group. RESULTS Overall, 44,283 posts were analyzed from 20,883 different users. Post volume over the study period showed a peak in activity between 2009 and 2012, as well as a spike in 2017 in the Aggregate group. The number of posts tended to be higher during winter each year. Posts were made predominantly by women (14,897/20,883, 71.34%), with 12.00% (2507/20,883) made by men and 16.67% (3479/20,883) by individuals of unknown gender. The mean age of web users was 39 (SD 19) years. In the Aggregate group, pain was the most common medical concept discussed (22,257/37,863, 58.78%), and paracetamol risk was the most common discussion topic, addressed in 20.36% (8902/43,725) of posts.
Doliprane was the most common medication mentioned (14,058/44,283, 31.74%) within the Aggregate group, and tramadol was the most commonly mentioned drug in combination with paracetamol in the Aggregate group (1038/19,587, 5.30%). The most common unapproved indication mentioned within the Paracetamol Only group was fatigue (190/616, with 16.32% positive for an unapproved indication), with reference to dependence made by 1.61% (136/8470) of the web users, accounting for 1.33% (171/12,843) of the posts in the Paracetamol Only group. Dependence mentions in the Paracetamol and Opioids group were provided by 6.94% (248/3576) of web users, accounting for 5.44% (342/6281) of total posts. Reference to overdose was made by 245 web users across 291 posts within the Paracetamol Only group. The most common potential adverse event detected was nausea (306/12,843, 2.38%) within the Paracetamol Only group. CONCLUSIONS The use of social media mining with the Detec't tool provided valuable information on the perceptions and understanding of the web users, highlighting areas where providing more information for the general public on paracetamol, as well as other medications, may be of benefit.
|
10
|
Schäfer F, Faviez C, Voillot P, Foulquié P, Najm M, Jeanne JF, Fagherazzi G, Schück S, Le Nevé B. Mapping and Modeling of Discussions Related to Gastrointestinal Discomfort in French-Speaking Online Forums: Results of a 15-Year Retrospective Infodemiology Study. J Med Internet Res 2020; 22:e17247. [PMID: 33141087 PMCID: PMC7671840 DOI: 10.2196/17247] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/28/2019] [Revised: 04/30/2020] [Accepted: 06/25/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Gastrointestinal (GI) discomfort is prevalent and known to be associated with impaired quality of life. Real-world information on factors of GI discomfort and solutions used by people is, however, limited. Social media, including online forums, have been considered a new source of information to examine the health of populations in real-life settings. OBJECTIVE The aims of this retrospective infodemiology study are to identify discussion topics, characterize users, and identify perceived determinants of GI discomfort in web-based messages posted by users of French social media. METHODS Messages related to GI discomfort posted between January 2003 and August 2018 were extracted from 14 French-speaking general and specialized publicly available online forums. Extracted messages were cleaned and deidentified. Relevant medical concepts were determined on the basis of the Medical Dictionary for Regulatory Activities and vernacular terms. The identification of discussion topics was carried out by using a correlated topic model on the basis of the latent Dirichlet allocation. A nonsupervised clustering algorithm was applied to cluster forum users according to the reported symptoms of GI discomfort, discussion topics, and activity on online forums. Users' age and gender were determined by linear regression and application of a support vector machine, respectively, to characterize the identified clusters according to demographic parameters. Perceived factors of GI discomfort were classified by a combined method on the basis of syntactic analysis to identify messages with causality terms and a second topic modeling in a relevant segment of phrases. RESULTS A total of 198,866 messages associated with GI discomfort were included in the analysis corpus after extraction and cleaning. These messages were posted by 36,989 separate web users, most of them being women younger than 40 years. 
Everyday life, diet, digestion, abdominal pain, impact on the quality of life, and tips to manage stress were among the most discussed topics. Segmentation of users identified 5 clusters corresponding to chronic and acute GI concerns. The diet topic was associated with each cluster, and stress was strongly associated with abdominal pain. Psychological factors, food, and allergens were perceived as the main causes of GI discomfort by web users. CONCLUSIONS GI discomfort is actively discussed by web users. This study reveals a complex relationship between food, stress, and GI discomfort. Our approach has shown that identifying web-based discussion topics associated with GI discomfort and its perceived factors is feasible and can serve as a complementary source of real-world evidence for caregivers.
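To make the topic-modeling step in the abstract above more concrete, here is a minimal sketch of plain latent Dirichlet allocation fitted by collapsed Gibbs sampling. It is not the correlated topic model the study describes, and the toy messages, the function name `lda_gibbs`, and the hyperparameters `alpha` and `beta` are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Plain LDA via collapsed Gibbs sampling (illustrative, not the study's model)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w2i = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # topic counts per document
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w2i[w]] += 1
            topic_total[z] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z, v = assign[d][i], w2i[w]
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    top_words = [
        [vocab[v] for v in sorted(range(n_vocab), key=lambda v: -topic_word[k][v])[:3]]
        for k in range(n_topics)
    ]
    return doc_topic, top_words

# Hypothetical, already tokenized forum messages (not data from the study).
docs = [
    ["diet", "bloating", "bread", "diet"],
    ["stress", "pain", "work", "stress"],
    ["bread", "diet", "bloating"],
    ["pain", "stress", "sleep"],
]
doc_topic, top_words = lda_gibbs(docs, n_topics=2)
```

On a real corpus, the per-document topic counts in `doc_topic` could then feed a clustering step such as the user segmentation the abstract mentions.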
Affiliation(s)
- Florent Schäfer
- Innovation Science and Nutrition, Danone Nutricia Research, Palaiseau, France
- Guy Fagherazzi
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, Strassen, Luxembourg
- Center of Research in Epidemiology and Population Health, UMR 1018 Inserm, Institut Gustave Roussy, Paris-Sud Paris-Saclay University, Villejuif, France
- Boris Le Nevé
- Innovation Science and Nutrition, Danone Nutricia Research, Palaiseau, France
|
11
|
Li X, Lin X, Ren H, Guo J. Ontological Organization and Bioinformatic Analysis of Adverse Drug Reactions From Package Inserts: Development and Usability Study. J Med Internet Res 2020; 22:e20443. [PMID: 32706718 PMCID: PMC7400033 DOI: 10.2196/20443] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/19/2020] [Revised: 06/11/2020] [Accepted: 06/14/2020] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND Licensed drugs may cause unexpected adverse reactions in patients, resulting in morbidity, risk of mortality, therapy disruptions, and prolonged hospital stays. Officially approved drug package inserts list the adverse reactions identified from randomized controlled clinical trials with high evidence levels and worldwide postmarketing surveillance. Formal representation of the adverse drug reaction (ADR) enclosed in semistructured package inserts will enable deep recognition of side effects and rational drug use, substantially reduce morbidity, and decrease societal costs. OBJECTIVE This paper aims to present an ontological organization of traceable ADR information extracted from licensed package inserts. In addition, it will provide machine-understandable knowledge for bioinformatics analysis, semantic retrieval, and intelligent clinical applications. METHODS Based on the essential content of package inserts, a generic ADR ontology model is proposed from two dimensions (and nine subdimensions), covering the ADR information and medication instructions. This is followed by a customized natural language processing method programmed with Python to retrieve the relevant information enclosed in package inserts. After the biocuration and identification of retrieved data from the package insert, an ADR ontology is automatically built for further bioinformatic analysis. RESULTS We collected 165 package inserts of quinolone drugs from the National Medical Products Administration and other drug databases in China, and built a specialized ADR ontology containing 2879 classes and 15,711 semantic relations. For each quinolone drug, the reported ADR information and medication instructions have been logically represented and formally organized in an ADR ontology. To demonstrate its usage, the source data were further bioinformatically analyzed. For example, the number of drug-ADR triples and major ADRs associated with each active ingredient were recorded. 
The 10 ADRs most frequently observed among quinolones were identified and categorized based on the 18 categories defined in the proposal. The occurrence frequency, severity, and ADR mitigation method explicitly stated in package inserts were also analyzed, as well as the top 5 specific populations with contraindications for quinolone drugs. CONCLUSIONS Ontological representation and organization using officially approved information from drug package inserts enables the identification and bioinformatic analysis of adverse reactions caused by a specific drug with regard to predefined ADR ontology classes and semantic relations. The resulting ontology-based ADR knowledge source classifies drug-specific adverse reactions, and supports a better understanding of ADRs and safer prescription of medications.
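The drug-ADR triples mentioned in the abstract above can be pictured as simple (subject, predicate, object) tuples. The sketch below counts triples per drug and ranks ADRs by how many triples mention them. The drug placeholders, the `has_adr` predicate, and the ADR terms are invented for illustration and are not taken from the package inserts or the published ontology.

```python
from collections import Counter

# Hypothetical (drug, predicate, ADR) triples; placeholders, not study data.
triples = [
    ("drug_A", "has_adr", "nausea"),
    ("drug_A", "has_adr", "rash"),
    ("drug_B", "has_adr", "nausea"),
    ("drug_B", "has_adr", "dizziness"),
    ("drug_C", "has_adr", "nausea"),
    ("drug_C", "has_adr", "rash"),
]

# Number of drug-ADR triples recorded for each drug.
triples_per_drug = Counter(drug for drug, _, _ in triples)

# How many triples mention each ADR, most frequent first.
adr_counts = Counter(adr for _, _, adr in triples)
top_adrs = adr_counts.most_common(2)
```

A real ontology would hold these relations in an OWL or RDF store. The tuple list only shows the kind of counting the analysis relies on.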
Affiliation(s)
- Xiaoying Li
- Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Xin Lin
- Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Huiling Ren
- Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Jinjing Guo
- Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
|
12
|
Mavragani A. Infodemiology and Infoveillance: Scoping Review. J Med Internet Res 2020; 22:e16206. [PMID: 32310818 PMCID: PMC7189791 DOI: 10.2196/16206] [Citation(s) in RCA: 126] [Impact Index Per Article: 25.2] [Received: 09/10/2019] [Revised: 02/05/2020] [Accepted: 02/08/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Web-based sources are increasingly employed in the analysis, detection, and forecasting of diseases and epidemics, and in predicting human behavior toward several health topics. This use of the internet has come to be known as infodemiology, a concept introduced by Gunther Eysenbach. Infodemiology and infoveillance studies use web-based data and have become an integral part of health informatics research over the past decade. OBJECTIVE The aim of this paper is to provide a scoping review of the state-of-the-art in infodemiology along with the background and history of the concept, to identify sources and health categories and topics, to elaborate on the validity of the employed methods, and to discuss the gaps identified in current research. METHODS The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines were followed to extract the publications that fall under the umbrella of infodemiology and infoveillance from the JMIR, PubMed, and Scopus databases. A total of 338 documents were extracted for assessment. RESULTS Of the 338 studies, the vast majority (n=282, 83.4%) were published with JMIR Publications. The Journal of Medical Internet Research features almost half of the publications (n=168, 49.7%), and JMIR Public Health and Surveillance has more than one-fifth of the examined studies (n=74, 21.9%). The interest in the subject has been increasing every year, with 2018 featuring more than one-fourth of the total publications (n=89, 26.3%), and the publications in 2017 and 2018 combined accounted for more than half (n=171, 50.6%) of the total number of publications in the last decade. The most popular source was Twitter with 45.0% (n=152), followed by Google with 24.6% (n=83), websites and platforms with 13.9% (n=47), blogs and forums with 10.1% (n=34), Facebook with 8.9% (n=30), and other search engines with 5.6% (n=19). 
As for the subjects examined, conditions and diseases with 17.2% (n=58) and epidemics and outbreaks with 15.7% (n=53) were the most popular categories identified in this review, followed by health care (n=39, 11.5%), drugs (n=40, 10.4%), and smoking and alcohol (n=29, 8.6%). CONCLUSIONS The field of infodemiology is becoming increasingly popular, employing innovative methods and approaches for health assessment. The use of web-based sources, which provide us with information that would not be accessible otherwise and tackle the issues arising from the time-consuming traditional methods, shows that infodemiology plays an important role in health informatics research.
Affiliation(s)
- Amaryllis Mavragani
- Department of Computing Science and Mathematics, Faculty of Natural Sciences, University of Stirling, Stirling, United Kingdom
|
13
|
Harnessing social media data for pharmacovigilance: a review of current state of the art, challenges and future directions. International Journal of Data Science and Analytics 2019. [DOI: 10.1007/s41060-019-00175-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 01/07/2023]
|
14
|
Li F, Liu W, Yu H. Extraction of Information Related to Adverse Drug Events from Electronic Health Record Notes: Design of an End-to-End Model Based on Deep Learning. JMIR Med Inform 2018; 6:e12159. [PMID: 30478023 PMCID: PMC6288593 DOI: 10.2196/12159] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 09/12/2018] [Revised: 10/31/2018] [Accepted: 11/09/2018] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Pharmacovigilance and drug-safety surveillance are crucial for monitoring adverse drug events (ADEs), but the main ADE-reporting systems such as the Food and Drug Administration Adverse Event Reporting System face challenges such as underreporting. Therefore, as complementary surveillance, data on ADEs are extracted from electronic health record (EHR) notes via natural language processing (NLP). As NLP develops, many up-to-date machine-learning techniques are introduced in this field, such as deep learning and multi-task learning (MTL). However, only a few studies have focused on employing such techniques to extract ADEs. OBJECTIVE We aimed to design a deep learning model for extracting ADEs and related information such as medications and indications. Since extraction of ADE-related information includes two steps (named entity recognition and relation extraction), our second objective was to improve the deep learning model using multi-task learning between the two steps. METHODS We employed the dataset from the Medication, Indication and Adverse Drug Events (MADE) 1.0 challenge to train and test our models. This dataset consists of 1089 EHR notes of cancer patients and includes 9 entity types such as Medication, Indication, and ADE and 7 types of relations between these entities. To extract information from the dataset, we proposed a deep-learning model that uses a bidirectional long short-term memory (BiLSTM) conditional random field network to recognize entities and a BiLSTM-Attention network to extract relations. To further improve the deep-learning model, we employed three typical MTL methods, namely, hard parameter sharing, parameter regularization, and task relation learning, to build three MTL models, called HardMTL, RegMTL, and LearnMTL, respectively. RESULTS Since extraction of ADE-related information is a two-step task, the result of the second step (ie, relation extraction) was used to compare all models. 
We used microaveraged precision, recall, and F1 as evaluation metrics. Our deep learning model achieved state-of-the-art results (F1=65.9%), which is significantly higher than that (F1=61.7%) of the best system in the MADE 1.0 challenge. HardMTL further improved the F1 by 0.8 percentage points, to 66.7%, whereas RegMTL and LearnMTL failed to boost the performance. CONCLUSIONS Deep learning models can significantly improve the performance of ADE-related information extraction. MTL may be effective for named entity recognition and relation extraction, but it depends on the methods, data, and other factors. Our results can facilitate research on ADE detection, NLP, and machine learning.
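As a rough illustration of the microaveraged metrics named in the abstract above, the sketch below pools all predicted and gold relations across relation types before computing precision, recall, and F1. The tuple layout, the note identifiers, and the relation labels are assumptions made for this example, and the challenge's official scorer may differ in detail.

```python
def micro_prf(gold: set, predicted: set) -> tuple[float, float, float]:
    """Microaveraged precision, recall, and F1 over pooled relation tuples."""
    tp = len(gold & predicted)  # relations found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical (note id, relation type, argument 1, argument 2) tuples.
gold = {
    ("note1", "adverse", "rash", "drugA"),
    ("note1", "dosage", "drugA", "10mg"),
    ("note2", "reason", "drugB", "pain"),
}
predicted = {
    ("note1", "adverse", "rash", "drugA"),
    ("note1", "dosage", "drugA", "10mg"),
    ("note2", "adverse", "nausea", "drugB"),
}
precision, recall, f1 = micro_prf(gold, predicted)
```

Because every relation instance counts equally, frequent relation types dominate a microaveraged score.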
Affiliation(s)
- Fei Li
- Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, United States
- Center for Healthcare Organization and Implementation Research, Bedford Veterans Affairs Medical Center, Bedford, MA, United States
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA, United States
- Weisong Liu
- Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, United States
- Center for Healthcare Organization and Implementation Research, Bedford Veterans Affairs Medical Center, Bedford, MA, United States
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA, United States
- Hong Yu
- Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, United States
- Center for Healthcare Organization and Implementation Research, Bedford Veterans Affairs Medical Center, Bedford, MA, United States
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA, United States
- School of Computer Science, University of Massachusetts, Amherst, MA, United States
|
15
|
Kürzinger ML, Schück S, Texier N, Abdellaoui R, Faviez C, Pouget J, Zhang L, Tcherny-Lessenot S, Lin S, Juhaeri J. Web-Based Signal Detection Using Medical Forums Data in France: Comparative Analysis. J Med Internet Res 2018; 20:e10466. [PMID: 30459145 PMCID: PMC6280030 DOI: 10.2196/10466] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 03/23/2018] [Revised: 06/29/2018] [Accepted: 06/29/2018] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND While traditional signal detection methods in pharmacovigilance are based on spontaneous reports, the use of social media is emerging. The potential strength of Web-based data relies on their volume and real-time availability, allowing early detection of signals of disproportionate reporting (SDRs). OBJECTIVE This study aimed (1) to assess the consistency of SDRs detected from patients' medical forums in France compared with those detected from the traditional reporting systems and (2) to assess whether SDRs can be identified from the forums earlier than from the traditional reporting systems. METHODS Messages posted on patients' forums between 2005 and 2015 were used. We retained 8 disproportionality definitions. Comparison of SDRs from the forums with SDRs detected in VigiBase was done by describing the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, receiver operating characteristics curve, and the area under the curve (AUC). The time difference in months between the detection dates of SDRs from the forums and VigiBase was provided. RESULTS The comparison analysis showed that the sensitivity ranged from 29% to 50.6%, the specificity from 86.1% to 95.5%, the PPV from 51.2% to 75.4%, the NPV from 68.5% to 91.6%, and the accuracy from 68% to 87.7%. The AUC reached 0.85 when using the metric empirical Bayes geometric mean. Up to 38% (12/32) of the SDRs were detected earlier in the forums than in VigiBase. CONCLUSIONS The specificity, PPV, and NPV were high. The overall performance was good, showing that data from medical forums may be a valuable source for signal detection. In total, up to 38% (12/32) of the SDRs could have been detected earlier, thus, ensuring the increased safety of patients. 
Further enhancements are needed to investigate the reliability and validation of patients' medical forums worldwide, the extension of this analysis to all possible drugs or at least to a wider selection of drugs, as well as to further assess performance against established signals.
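The comparison metrics listed in the abstract above all derive from a 2x2 table that cross-classifies drug-event pairs by whether they are flagged in the forums and whether they are flagged in the reference source. The sketch below shows those standard definitions. The function name and the counts are made up for illustration and are not taken from the study.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard 2x2 metrics for forum signals against a reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # reference signals also flagged in forums
        "specificity": tn / (tn + fp),  # reference non-signals also unflagged in forums
        "ppv": tp / (tp + fp),          # forum signals confirmed by the reference
        "npv": tn / (tn + fn),          # forum non-signals confirmed by the reference
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Made-up counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=20, fn=60, tn=380)
```

With these invented counts, sensitivity is low and specificity high, the same qualitative pattern the abstract reports.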
Affiliation(s)
- Julie Pouget
- Information Technology and Solutions, Sanofi, Lyon, France
- Ling Zhang
- Global Pharmacovigilance, Sanofi, Bridgewater, NJ, United States
- Stephen Lin
- Global Pharmacovigilance, Sanofi, Bridgewater, NJ, United States
- Juhaeri Juhaeri
- Epidemiology and Benefit Risk Evaluation, Sanofi, Bridgewater, NJ, United States
|
16
|
Munkhdalai T, Liu F, Yu H. Clinical Relation Extraction Toward Drug Safety Surveillance Using Electronic Health Record Narratives: Classical Learning Versus Deep Learning. JMIR Public Health Surveill 2018; 4:e29. [PMID: 29695376 PMCID: PMC5943628 DOI: 10.2196/publichealth.9361] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 11/08/2017] [Revised: 02/03/2018] [Accepted: 02/05/2018] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. OBJECTIVE To unlock ADE-related information from EHR narratives, there is a need to extract relevant entities and identify relations among them. In this study, we focus on relation identification. This study aimed to evaluate natural language processing and machine learning approaches using the expert-annotated medical entities and relations in the context of drug safety surveillance, and investigate how different learning approaches perform under different configurations. METHODS We have manually annotated 791 EHR notes with 9 named entities (eg, medication, indication, severity, and ADEs) and 7 different types of relations (eg, medication-dosage, medication-ADE, and severity-ADE). Then, we explored 3 supervised machine learning systems for relation identification: (1) a support vector machines (SVM) system, (2) an end-to-end deep neural network system, and (3) a supervised descriptive rule induction baseline system. For the neural network system, we exploited the state-of-the-art recurrent neural network (RNN) and attention models. We report the performance by macro-averaged precision, recall, and F1-score across the relation types. RESULTS Our results show that the SVM model achieved the best average F1-score of 89.1% on test data, outperforming the long short-term memory (LSTM) model with attention (F1-score of 65.72%) as well as the rule induction baseline system (F1-score of 7.47%) by a large margin. The bidirectional LSTM model with attention achieved the best performance among different RNN models. 
With the inclusion of additional features in the LSTM model, its performance can be boosted to an average F1-score of 77.35%. CONCLUSIONS These results show that classical learning models (SVM) remain advantageous over deep learning models (RNN variants) for clinical relation identification, especially for long-distance intersentential relations. However, RNNs demonstrate great potential for significant improvement if more training data become available. Our work is an important step toward mining EHRs to improve the efficacy of drug safety surveillance. Most importantly, the annotated data used in this study will be made publicly available, which will further promote drug safety research in the community.
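The macro-averaged F1-score reported in the abstract above weights every relation type equally, however many instances each type has. The sketch below assumes macro F1 is the mean of the per-type F1 scores, which is one common definition. The abstract does not say which variant was used, and the per-type counts are invented.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 for a single relation type from its true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_f1(counts_by_type: dict[str, tuple[int, int, int]]) -> float:
    """Unweighted mean of per-type F1 scores (assumed definition of macro F1)."""
    scores = [f1_from_counts(*counts) for counts in counts_by_type.values()]
    return sum(scores) / len(scores)

# Hypothetical (tp, fp, fn) counts per relation type; not results from the study.
counts = {
    "medication-dosage": (90, 10, 10),
    "medication-ADE": (30, 20, 30),
}
score = macro_f1(counts)
```

Under this definition a rare, hard relation type pulls the average down as much as a frequent, easy one lifts it.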
Affiliation(s)
- Tsendsuren Munkhdalai
- Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States
- Feifan Liu
- Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States
- Hong Yu
- Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, United States
- The Bedford Veterans Affairs Medical Center, Bedford, MA, United States
|
17
|
Abdellaoui R, Foulquié P, Texier N, Faviez C, Burgun A, Schück S. Detection of Cases of Noncompliance to Drug Treatment in Patient Forum Posts: Topic Model Approach. J Med Internet Res 2018. [PMID: 29540337 PMCID: PMC5874436 DOI: 10.2196/jmir.9222] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 11/15/2022] Open
Abstract
Background Medication nonadherence is a major impediment to the management of many health conditions. A better understanding of the factors underlying noncompliance to treatment may help health professionals to address it. Patients use peer-to-peer virtual communities and social media to share their experiences regarding their treatments and diseases. Using topic models makes it possible to model themes present in a collection of posts, and thus to identify cases of noncompliance. Objective The aim of this study was to detect messages describing patients’ noncompliant behaviors associated with a drug of interest. Thus, the objective was the clustering of posts featuring a homogeneous vocabulary related to nonadherent attitudes. Methods We focused on escitalopram and aripiprazole used to treat depression and psychotic conditions, respectively. We implemented a probabilistic topic model to identify the topics that occurred in a corpus of messages mentioning these drugs, posted from 2004 to 2013 on three of the most popular French forums. Data were collected using a Web crawler designed by Kappa Santé as part of the Detec’t project to analyze social media for drug safety. Several topics were related to noncompliance to treatment. Results Starting from a corpus of 3650 posts related to an antidepressant drug (escitalopram) and 2164 posts related to an antipsychotic drug (aripiprazole), the use of latent Dirichlet allocation allowed us to model several themes, including interruptions of treatment and changes in dosage. The topic model approach detected cases of noncompliant behavior with a recall of 98.5% (272/276) and a precision of 32.6% (272/844). Conclusions Topic models enabled us to explore patients’ discussions on community websites and to identify posts related to noncompliant behaviors. After a manual review of the messages in the noncompliance topics, we found that noncompliance to treatment was present in 6.17% (276/4469) of the posts.
Affiliation(s)
- Redhouane Abdellaoui
- Unité de Mixte de Recherche 1138 Team 22, Institut National de la Santé et de la Recherche Médicale / Université Pierre et Marie Curie, Paris, France
- Anita Burgun
- Unité de Mixte de Recherche 1138 Team 22, Institut National de la Santé et de la Recherche Médicale / Université Pierre et Marie Curie, Paris, France
- Medical Informatics, Hôpital Européen Georges-Pompidou, Assistance Publique-Hôpitaux de Paris, Paris, France
|
18
|
Bousquet C, Dahamna B, Guillemin-Lanne S, Darmoni SJ, Faviez C, Huot C, Katsahian S, Leroux V, Pereira S, Richard C, Schück S, Souvignet J, Lillo-Le Louët A, Texier N. The Adverse Drug Reactions from Patient Reports in Social Media Project: Five Major Challenges to Overcome to Operationalize Analysis and Efficiently Support Pharmacovigilance Process. JMIR Res Protoc 2017; 6:e179. [PMID: 28935617 PMCID: PMC5629348 DOI: 10.2196/resprot.6463] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 11/14/2016] [Revised: 06/19/2017] [Accepted: 07/12/2017] [Indexed: 11/13/2022] Open
Abstract
Background Adverse drug reactions (ADRs) are an important cause of morbidity and mortality. The classical pharmacovigilance process is limited by underreporting, which justifies the current interest in new knowledge sources such as social media. The Adverse Drug Reactions from Patient Reports in Social Media (ADR-PRISM) project aims to extract ADRs reported by patients in these media. We identified 5 major challenges to overcome to operationalize the analysis of patient posts: (1) variable quality of information on social media, (2) guarantee of data privacy, (3) response to pharmacovigilance expert expectations, (4) identification of relevant information within Web pages, and (5) robust and evolutive architecture. Objective This article aims to describe the current state of advancement of the ADR-PRISM project by focusing on the solutions we have chosen to address these 5 major challenges. Methods In this article, we propose methods and describe the advancement of this project on several aspects: (1) a quality-driven approach for selecting relevant social media for the extraction of knowledge on potential ADRs, (2) an assessment of ethical issues and French regulation for the analysis of data on social media, (3) an analysis of pharmacovigilance expert requirements when reviewing patient posts on the Internet, (4) an extraction method based on natural language processing, pattern-based matching, and selection of relevant medical concepts in reference terminologies, and (5) specifications of a component-based architecture for the monitoring system. 
Results Considering the 5 major challenges, we (1) selected a set of 21 validated criteria for selecting social media to support the extraction of potential ADRs, (2) proposed solutions to guarantee data privacy of patients posting on the Internet, (3) took into account pharmacovigilance expert requirements with use case diagrams and scenarios, (4) built domain-specific knowledge resources embedding a lexicon, morphological rules, context rules, semantic rules, syntactic rules, and post-analysis processing, and (5) proposed a component-based architecture that allows storage of big data and accessibility to third-party applications through Web services. Conclusions We demonstrated the feasibility of implementing a component-based architecture that allows collection of patient posts on the Internet, near real-time processing of those posts including annotation, and storage in big data structures. In the next steps, we will evaluate the posts identified by the system in social media to clarify the interest and relevance of such an approach to improve conventional pharmacovigilance processes based on spontaneous reporting.
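The lexicon and context rules mentioned in the abstract above can be pictured with a very small sketch: a dictionary maps vernacular expressions to concept labels, and a single context rule marks a mention as negated when a negation word appears shortly before it. The lexicon entries, the negation rule, and the function name `annotate` are hypothetical and far simpler than the resources the project describes.

```python
import re

# Hypothetical lexicon mapping vernacular terms to concept labels.
LEXICON = {
    "headache": "Headache",
    "head ache": "Headache",
    "nausea": "Nausea",
    "threw up": "Vomiting",
}

# Context rule (assumption): a negation word followed by at most two
# other words immediately before the mention marks it as negated.
NEGATION = re.compile(r"\b(no|not|never|without)\b(?:\s+\w+){0,2}\s*$")

def annotate(post: str) -> list[tuple[str, bool]]:
    """Return (concept, negated) pairs in order of appearance in the post."""
    text = post.lower()
    found = []
    for term, concept in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            negated = bool(NEGATION.search(text[: match.start()]))
            found.append((match.start(), concept, negated))
    return [(concept, negated) for _, concept, negated in sorted(found)]

annotations = annotate("I had no nausea but a bad headache after the second pill")
```

A production pipeline would map mentions to a reference terminology and apply morphological and syntactic rules as well. This sketch only shows how lexicon lookup and one context rule combine.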
Affiliation(s)
- Cedric Bousquet
- Laboratoire d'Informatique Médicale et d'Ingénieurie des Connaissances en e-Santé, U1142, Institut National de la Santé et de la Recherche Médicale, Paris, France
- Service de Santé Publique et de l'Information Médicale, Centre Hospitalier Universitaire de Saint Etienne, Saint-Etienne, France
- Badisse Dahamna
- Department of Biomedical Informatics, Rouen University Hospital, Rouen, France
- Stefan J Darmoni
- Laboratoire d'Informatique Médicale et d'Ingénieurie des Connaissances en e-Santé, U1142, Institut National de la Santé et de la Recherche Médicale, Paris, France
- Department of Biomedical Informatics, Rouen University Hospital, Rouen, France
- Sandrine Katsahian
- Unité mixte de recherche 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Paris, France
- Julien Souvignet
- Laboratoire d'Informatique Médicale et d'Ingénieurie des Connaissances en e-Santé, U1142, Institut National de la Santé et de la Recherche Médicale, Paris, France
- Agnès Lillo-Le Louët
- Assistance Publique-Hôpitaux de Paris, Hôpital Européen Georges Pompidou, Centre Régional de Pharmacovigilance, Paris, France
|