1
Wu D, Shead H, Ren Y, Raynor P, Tao Y, Villanueva H, Hung P, Li X, Brookshire RG, Eichelberger K, Guille C, Litwin AH, Olatosi B. Uncovering the Complexity of Perinatal Polysubstance Use Disclosure Patterns on X: Mixed Methods Study. J Med Internet Res 2024; 26:e53171. [PMID: 39302713 PMCID: PMC11452753 DOI: 10.2196/53171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 05/06/2024] [Accepted: 06/11/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND According to the Morbidity and Mortality Weekly Report, polysubstance use among pregnant women is prevalent, with 38.2% of those who consume alcohol also engaging in the use of one or more additional substances. However, the underlying mechanisms, contexts, and experiences of polysubstance use are unclear. Organic information is abundant on social media such as X (formerly Twitter). Traditional quantitative and qualitative methods, as well as natural language processing techniques, can be jointly used to derive insights into public opinions, sentiments, and clinical and public health policy implications. OBJECTIVE Based on perinatal polysubstance use (PPU) data that we extracted on X from May 1, 2019, to October 31, 2021, we proposed two primary research questions: (1) What is the overall trend and sentiment of PPU discussions on X? (2) Are there any distinct patterns in the discussion trends of PPU-related tweets? If so, what are the implications for perinatal care and associated public health policies? METHODS We used X's application programming interface to extract >6 million raw tweets worldwide containing ≥2 prenatal health- and substance-related keywords provided by our clinical team. After removing all non-English-language tweets, non-US tweets, and US tweets without disclosed geolocations, we obtained 4848 PPU-related US tweets. We then evaluated them using a mixed methods approach. The quantitative analysis applied frequency, trend analysis, and several natural language processing techniques such as sentiment analysis to derive statistics to preview the corpus. To further understand semantics and clinical insights among these tweets, we conducted an in-depth thematic content analysis with a random sample of 500 PPU-related tweets with a satisfying κ score of 0.7748 for intercoder reliability. 
RESULTS Our quantitative analysis indicates the overall trends, bigram and trigram patterns, and negative sentiments were more dominant in PPU tweets (2490/4848, 51.36%) than in the non-PPU sample (1323/4848, 27.29%). Paired polysubstance use (4134/4848, 85.27%) was the most common, with the combination alcohol and drugs identified as the most mentioned. From the qualitative analysis, we identified 3 main themes: nonsubstance, single substance, and polysubstance, and 4 subthemes to contextualize the rationale of underlying PPU behaviors: lifestyle, perceptions of others' drug use, legal implications, and public health. CONCLUSIONS This study identified underexplored, emerging, and important topics related to perinatal PPU, with significant stigmas and legal ramifications discussed on X. Overall, public sentiments on PPU were mixed, encompassing negative (2490/4848, 51.36%), positive (1884/4848, 38.86%), and neutral (474/4848, 9.78%) sentiments. The leading substances in PPU were alcohol and drugs, and the normalization of PPU discussed on X is becoming more prevalent. Thus, this study provides valuable insights to further understand the complexity of PPU and its implications for public health practitioners and policy makers to provide proper access and support to individuals with PPU.
Affiliation(s)
- Dezhi Wu
  - Department of Integrated Information Technology, University of South Carolina, Columbia, SC, United States
- Hannah Shead
  - Department of Mathematics, Augusta University, Augusta, GA, United States
- Yang Ren
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Phyllis Raynor
  - College of Nursing, University of South Carolina, Columbia, SC, United States
- Youyou Tao
  - Department of Information Systems and Business Analytics, Loyola Marymount University, Los Angeles, CA, United States
- Harvey Villanueva
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Peiyin Hung
  - Arnold School of Public Health, University of South Carolina, Columbia, SC, United States
- Xiaoming Li
  - Arnold School of Public Health, University of South Carolina, Columbia, SC, United States
- Robert G Brookshire
  - Department of Integrated Information Technology, University of South Carolina, Columbia, SC, United States
- Kacey Eichelberger
  - School of Medicine Greenville, University of South Carolina, Greenville, SC, United States
  - Prisma Health, Greenville, SC, United States
- Constance Guille
  - College of Medicine, Medical University of South Carolina, Charleston, SC, United States
- Alain H Litwin
  - School of Medicine Greenville, University of South Carolina, Greenville, SC, United States
  - Prisma Health, Greenville, SC, United States
- Bankole Olatosi
  - Arnold School of Public Health, University of South Carolina, Columbia, SC, United States
2
Golder S, Klein A, O'Connor K, Wang Y, Gonzalez-Hernandez G. Social Media Posts on Statins: What Can We Learn About Patient Experiences and Perspectives? J Am Heart Assoc 2024; 13:e033992. [PMID: 38533982 PMCID: PMC11179751 DOI: 10.1161/jaha.124.033992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 02/02/2024] [Indexed: 03/28/2024]
Affiliation(s)
- Su Golder
  - University of York, York, United Kingdom
- Ari Klein
  - University of Pennsylvania, Philadelphia, PA, USA
- Yunwen Wang
  - Cedars Sinai Medical Center, West Hollywood, CA, USA
3
Helbich M, Zeng Y, Sarker A. Area-level Measures of the Social Environment: Operationalization, Pitfalls, and Ways Forward. Curr Top Behav Neurosci 2024. [PMID: 38453766 DOI: 10.1007/7854_2024_464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2024]
Abstract
People's mental health is intertwined with the social environment in which they reside. This chapter explores approaches for quantifying the area-level social environment, focusing specifically on socioeconomic deprivation and social fragmentation. We discuss census data and administrative units, egocentric and ecometric approaches, neighborhood audits, social media data, and street view-based assessments. We close the chapter by discussing possible paths forward from associations between social environments and health to establishing causality, including longitudinal research designs and time-series social environmental indices.
Affiliation(s)
- Marco Helbich
  - Department of Human Geography and Spatial Planning, Faculty of Geosciences, Utrecht University, Utrecht, The Netherlands
  - Health and Quality of Life in a Green and Sustainable Environment Research Group, Strategic Research and Innovation Program for the Development of MU - Plovdiv, Medical University of Plovdiv, Plovdiv, Bulgaria
  - Environmental Health Division, Research Institute at Medical University of Plovdiv, Medical University of Plovdiv, Plovdiv, Bulgaria
- Yi Zeng
  - Department of Human Geography and Spatial Planning, Faculty of Geosciences, Utrecht University, Utrecht, The Netherlands
- Abeed Sarker
  - Emory University School of Medicine, Atlanta, GA, USA
4
Sarsam SM, Alzahrani AI, Al-Samarraie H. Early-stage pregnancy recognition on microblogs: Machine learning and lexicon-based approaches. Heliyon 2023; 9:e20132. [PMID: 37809524 PMCID: PMC10559919 DOI: 10.1016/j.heliyon.2023.e20132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Revised: 09/02/2023] [Accepted: 09/12/2023] [Indexed: 10/10/2023]
Abstract
Pregnancy carries high medical and psychosocial risks that could lead pregnant women to experience serious health consequences. Providing protective measures for pregnant women is one of the critical tasks during the pregnancy period. This study proposes an emotion-based mechanism to detect the early stage of pregnancy using real-time data from Twitter. Pregnancy-related emotions (e.g., anger, fear, sadness, joy, and surprise) and polarity (positive and negative) were extracted from users' tweets using NRC Affect Intensity Lexicon and SentiStrength techniques. Then, pregnancy-related terms were extracted and mapped with pregnancy-related sentiments using part-of-speech tagging and association rules mining techniques. The results showed that pregnancy tweets contained high positivity, as well as significant amounts of joy, sadness, and fear. The classification results demonstrated the possibility of using users' sentiments for early-stage pregnancy recognition on microblogs. The proposed mechanism offers valuable insights to healthcare decision-makers, allowing them to develop a comprehensive understanding of users' health status based on social media posts.
Affiliation(s)
- Samer Muthana Sarsam
  - School of Strategy and Leadership, Coventry University, Coventry, United Kingdom
- Ahmed Ibrahim Alzahrani
  - Computer Science Department, Community College, King Saud University, Riyadh, 11437, Saudi Arabia
- Hosam Al-Samarraie
  - School of Design, University of Leeds, Leeds, United Kingdom
  - Centre for Instructional Technology and Multimedia, Universiti Sains Malaysia, Penang, Malaysia
5
Sarker A, Lakamana S, Guo Y, Ge Y, Leslie A, Okunromade O, Gonzalez-Polledo E, Perrone J, McKenzie-Brown AM. #ChronicPain: Automated Building of a Chronic Pain Cohort from Twitter Using Machine Learning. Health Data Science 2023; 3:0078. [PMID: 38333075 PMCID: PMC10852024 DOI: 10.34133/hds.0078] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/03/2023] [Accepted: 06/12/2023] [Indexed: 02/10/2024]
Abstract
Background Due to the high burden of chronic pain, and the detrimental public health consequences of its treatment with opioids, there is a high-priority need to identify effective alternative therapies. Social media is a potentially valuable resource for knowledge about self-reported therapies by chronic pain sufferers. Methods We attempted to (a) verify the presence of large-scale chronic pain-related chatter on Twitter, (b) develop natural language processing and machine learning methods for automatically detecting self-disclosures, (c) collect longitudinal data posted by them, and (d) semiautomatically analyze the types of chronic pain-related information reported by them. We collected data using chronic pain-related hashtags and keywords and manually annotated 4,998 posts to indicate if they were self-reports of chronic pain experiences. We trained and evaluated several state-of-the-art supervised text classification models and deployed the best-performing classifier. We collected all publicly available posts from detected cohort members and conducted manual and natural language processing-driven descriptive analyses. Results Interannotator agreement for the binary annotation was 0.82 (Cohen's kappa). The RoBERTa model performed best (F1 score: 0.84; 95% confidence interval: 0.80 to 0.89), and we used this model to classify all collected unlabeled posts. We discovered 22,795 self-reported chronic pain sufferers and collected over 3 million of their past posts. Further analyses revealed information about, but not limited to, alternative treatments, patient sentiments about treatments, side effects, and self-management strategies. Conclusion Our social media based approach will result in an automatically growing large cohort over time, and the data can be leveraged to identify effective opioid-alternative therapies for diverse chronic pain types.
Affiliation(s)
- Abeed Sarker
  - Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, USA
- Sahithi Lakamana
  - Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, USA
- Yuting Guo
  - Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, USA
- Yao Ge
  - Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, USA
- Abimbola Leslie
  - Department of Radiology, Robert Larner College of Medicine, University of Vermont, Burlington, VT, USA
- Omolola Okunromade
  - Department of Health Policy and Community Health, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA, USA
- Jeanmarie Perrone
  - Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
6
Golder S, McRobbie-Johnson ACE, Klein A, Polite FG, Gonzalez Hernandez G. Social media and COVID-19 vaccination hesitancy during pregnancy: a mixed methods analysis. BJOG 2023; 130:750-758. [PMID: 37078279 DOI: 10.1111/1471-0528.17481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2022] [Revised: 01/13/2023] [Accepted: 01/28/2023] [Indexed: 04/21/2023]
Abstract
OBJECTIVE To evaluate the reasons for COVID-19 vaccine hesitancy during pregnancy. DESIGN We used regular expressions to identify publicly available social media posts from pregnant people expressing at least one reason for their decision not to accept COVID-19 vaccine. SETTING Two social media platforms - WhatToExpect and Twitter. SAMPLE A total of 945 pregnant people in WhatToExpect (1017 posts) and 345 pregnant people in Twitter (435 tweets). METHODS Two annotators manually coded posts according to the Scientific Advisory Group for Emergencies (SAGE) working group's 3Cs vaccine hesitancy model (confidence, complacency and convenience barriers). Within each 3Cs we created subthemes that emerged from the data. MAIN OUTCOME MEASURES Subthemes were derived from the posters' own words. RESULTS Safety concerns were most common and largely linked to the perceived speed at which the vaccine was created and the lack of data about its safety in pregnancy. This led to a preference to wait until after the baby was born or to take other precautions instead. Complacency surrounded a belief that they are young and healthy or already had COVID-19. Misinformation led to false safety and efficacy allegations, or even conspiracy theories, and fed into creating confidence and complacency barriers. Convenience barriers (such as availability) were uncommon. CONCLUSION The information in this study can be used to highlight the questions, fears and hesitations pregnant people have about the COVID-19 vaccine. Highlighting these hesitations can help public health campaigns and improve communication between healthcare professionals and patients.
Affiliation(s)
- S Golder
  - Department of Health Sciences, University of York, York, UK
- A C E McRobbie-Johnson
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- A Klein
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- F G Polite
  - Department of Obstetrics & Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- G Gonzalez Hernandez
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, West Hollywood, California, USA
7
Patel N, Pokras S, Ferma J, Casey V, Manuguid F, Culver K, Bauer S. Treatment patterns and outcomes in patients with metastatic synovial sarcoma in France, Germany, Italy, Spain and the United Kingdom. Future Oncol 2023. [PMID: 37139794 DOI: 10.2217/fon-2022-1005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/05/2023]
Abstract
Aim: Describing the treatment patterns, outcomes by line of treatment (LOT), and healthcare resource utilization (HCRU) in patients with metastatic synovial sarcoma (mSS). Patients & methods: In this descriptive, non-interventional, retrospective cohort study, physicians from five European countries reported on patients with recent pharmacological treatment for mSS. Results: Among 296 patients with mSS, 86.1, 38.9 and 8.4% received 1 LOT (1L), 2 LOTs (2L) and 3+ LOTs (L3+), respectively. Common regimens were doxorubicin/ifosfamide-based (37.4%) for 1L and trabectedin-based for 2L (29.7%). For 1L, median time to next treatment was 13.1 and 6.0 months for living and deceased patients, respectively. Median OS was 22.0, 6.0 and 4.9 months in all patients, 2L and 3L, respectively. HCRU data showed median one inpatient hospital admission, 3 days in hospital and four outpatient visits yearly. Conclusion: This large-scale study documents high unmet needs in patients previously treated for mSS and the need for more effective therapies.
Affiliation(s)
- Nashita Patel
  - Global Value, Evidence & Outcomes, GSK, Brentford, London, TW8 9GS, UK
- Shibani Pokras
  - Value Evidence & Outcomes, GSK, Collegeville, PA 19426, USA
- Jane Ferma
  - Data Science & Advanced Analytics, IQVIA, London, N1 9JY, UK
- Vicky Casey
  - Data Science & Advanced Analytics, IQVIA, London, N1 9JY, UK
- Fil Manuguid
  - Data Science & Advanced Analytics, IQVIA, London, N1 9JY, UK
- Ken Culver
  - Global Medical Affairs, GSK, Collegeville, PA 19426, USA
- Sebastian Bauer
  - Department of Medical Oncology & Sarcoma Center, University Hospital, University of Duisburg-Essen, Essen, 45147, Germany
8
Klein AZ, Kunatharaju S, O'Connor K, Gonzalez-Hernandez G. Pregex: Rule-Based Detection and Extraction of Twitter Data in Pregnancy. J Med Internet Res 2023; 25:e40569. [PMID: 36757756 PMCID: PMC9951068 DOI: 10.2196/40569] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/27/2022] [Revised: 09/02/2022] [Accepted: 01/22/2023] [Indexed: 01/23/2023]
Affiliation(s)
- Ari Z Klein
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Shriya Kunatharaju
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Karen O'Connor
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
9
Weissenbacher D, O’Connor K, Rawal S, Zhang Y, Tsai RTH, Miller T, Xu D, Anderson C, Liu B, Han Q, Zhang J, Kulev I, Köprü B, Rodriguez-Esteban R, Ozkirimli E, Ayach A, Roller R, Piccolo S, Han P, Vydiswaran VGV, Tekumalla R, Banda JM, Bagherzadeh P, Bergler S, Silva JF, Almeida T, Martinez P, Rivera-Zavala R, Wang CK, Dai HJ, Alberto Robles Hernandez L, Gonzalez-Hernandez G. Automatic Extraction of Medication Mentions from Tweets-Overview of the BioCreative VII Shared Task 3 Competition. Database (Oxford) 2023; 2023:baac108. [PMID: 36734300 PMCID: PMC9896308 DOI: 10.1093/database/baac108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 10/28/2022] [Accepted: 12/13/2022] [Indexed: 02/04/2023]
Abstract
This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding those tweets in a user's timeline that mention specific health-related concepts such as medications requires addressing extreme imbalance. Task 3 called for detecting tweets in a user's timeline that mention a medication name and, for each detected mention, extracting its span. The organizers made available a corpus consisting of 182 049 tweets publicly posted by 212 Twitter users with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that are robust to class imbalance beyond the simple lexical match. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams for this challenge. The corpus is freely available at https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/. The methods and the results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.
Affiliation(s)
- Davy Weissenbacher
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Karen O’Connor
  - DBEI, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Siddharth Rawal
  - DBEI, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yu Zhang
  - Department of Computer Science and Information Engineering, National Central University, No. 300, Zhongda Rd, Zhongli District, Taoyuan 320, Taiwan
- Richard Tzong-Han Tsai
  - Department of Computer Science and Information Engineering, National Central University, No. 300, Zhongda Rd, Zhongli District, Taoyuan 320, Taiwan
  - IoX Center, National Taiwan University, Da’an District, Section 4, Roosevelt Rd, No. 1, Barry Lam Hall, Taipei 106, Taiwan
  - Research Center for Humanities and Social Sciences, Academia Sinica, No. 128, Section 2, Academia Rd, Nangang District, Taipei 115, Taiwan
- Timothy Miller
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Dongfang Xu
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Bo Liu
  - NVIDIA, Santa Clara, CA, USA
- Qing Han
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Jinfeng Zhang
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Igor Kulev
  - Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland
- Berkay Köprü
  - Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland
- Raul Rodriguez-Esteban
  - Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Switzerland
- Elif Ozkirimli
  - Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland
- Ammer Ayach
  - Speech and Language Technology Lab, DFKI, Berlin, Germany
- Roland Roller
  - Speech and Language Technology Lab, DFKI, Berlin, Germany
- Stephen Piccolo
  - Department of Biology, Brigham Young University, Provo, UT, USA
- Peijin Han
  - Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan, Ann Arbor, MI, USA
- V G Vinod Vydiswaran
  - Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, MI, USA
  - School of Information, University of Michigan, Ann Arbor, MI, USA
- Ramya Tekumalla
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- João F Silva
  - DETI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Portugal
- Tiago Almeida
  - DETI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Portugal
  - Department of Computation, University of A Coruña, Spain
- Paloma Martinez
  - Computer Science and Engineering Department, Universidad Carlos III de Madrid, Madrid, Spain
- Renzo Rivera-Zavala
  - Computer Science and Engineering Department, Universidad Carlos III de Madrid, Madrid, Spain
- Chen-Kai Wang
  - Big Data Laboratory, Chunghwa Telecom Laboratories, Taoyuan, Taiwan
  - Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Hong-Jie Dai
  - Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
10
Klein AZ, O'Connor K, Levine LD, Gonzalez-Hernandez G. Using Twitter Data for Cohort Studies of Drug Safety in Pregnancy: Proof-of-concept With β-Blockers. JMIR Form Res 2022; 6:e36771. [PMID: 35771614 PMCID: PMC9284350 DOI: 10.2196/36771] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/24/2022] [Revised: 04/27/2022] [Accepted: 06/06/2022] [Indexed: 01/26/2023]
Abstract
Background Despite the fact that medication is taken during more than 90% of pregnancies, the fetal risk for most medications is unknown, and the majority of medications have no data regarding safety in pregnancy. Objective Using β-blockers as a proof-of-concept, the primary objective of this study was to assess the utility of Twitter data for a cohort study design—in particular, whether we could identify (1) Twitter users who have posted tweets reporting that they took medication during pregnancy and (2) their associated pregnancy outcomes. Methods We searched for mentions of β-blockers in 2.75 billion tweets posted by 415,690 users who announced their pregnancy on Twitter. We manually reviewed the matching tweets to first determine if the user actually took the β-blocker mentioned in the tweet. Then, to help determine if the β-blocker was taken during pregnancy, we used the time stamp of the tweet reporting intake and drew upon an automated natural language processing (NLP) tool that estimates the date of the user’s prenatal time period. For users who posted tweets indicating that they took or may have taken the β-blocker during pregnancy, we drew upon additional NLP tools to help identify tweets that report their pregnancy outcomes. Adverse pregnancy outcomes included miscarriage, stillbirth, birth defects, preterm birth (<37 weeks gestation), low birth weight (<5 pounds and 8 ounces at delivery), and neonatal intensive care unit (NICU) admission. Normal pregnancy outcomes included gestational age ≥37 weeks and birth weight ≥5 pounds and 8 ounces. Results We retrieved 5114 tweets, posted by 2339 users, that mention a β-blocker, and manually identified 2332 (45.6%) tweets, posted by 1195 (51.1%) of the users, that self-report taking the β-blocker. We were able to estimate the date of the prenatal time period for 356 pregnancies among 334 (27.9%) of these 1195 users. 
Among these 356 pregnancies, we identified 257 (72.2%) during which the β-blocker was or may have been taken. We manually verified an adverse pregnancy outcome—preterm birth, NICU admission, low birth weight, birth defects, or miscarriage—for 38 (14.8%) of these 257 pregnancies. We manually verified a gestational age ≥37 weeks for 198 (90.4%) and a birth weight ≥5 pounds and 8 ounces for 50 (22.8%) of the 219 pregnancies for which we did not identify an adverse pregnancy outcome. Conclusions Our ability to detect pregnancy outcomes for Twitter users who posted tweets reporting that they took or may have taken a β-blocker during pregnancy suggests that Twitter can be a complementary resource for cohort studies of drug safety in pregnancy.
Affiliation(s)
- Ari Z Klein
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Karen O'Connor
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Lisa D Levine
  - Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
11
Sarker A, Al-Garadi MA, Ge Y, Nataraj N, Jones CM, Sumner SA. Signals of increasing co-use of stimulants and opioids from online drug forum data. Harm Reduct J 2022; 19:51. [PMID: 35614501 PMCID: PMC9131693 DOI: 10.1186/s12954-022-00628-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/12/2022] [Accepted: 05/10/2022] [Indexed: 11/21/2022]
Abstract
Background Despite recent rises in fatal overdoses involving multiple substances, there is a paucity of knowledge about stimulant co-use patterns among people who use opioids (PWUO) or people being treated with medications for opioid use disorder (PTMOUD). A better understanding of the timing and patterns in stimulant co-use among PWUO based on mentions of these substances on social media can help inform prevention programs, policy, and future research directions. This study examines stimulant co-mention trends among PWUO/PTMOUD on social media over multiple years. Methods We collected publicly available data from 14 forums on Reddit (subreddits) that focused on prescription and illicit opioids, and medications for opioid use disorder (MOUD). Collected data ranged from 2011 to 2020, and we also collected timelines comprising past posts from a sample of Reddit users (Redditors) on these forums. We applied natural language processing to generate lexical variants of all included prescription and illicit opioids and stimulants and detect mentions of them on the chosen subreddits. Finally, we analyzed and described trends and patterns in co-mentions. Results Posts collected for 13,812 Redditors showed that 12,306 (89.1%) mentioned at least 1 opioid, opioid-related medication, or stimulant. Analyses revealed that the number and proportion of Redditors mentioning both opioids and/or opioid-related medications and stimulants steadily increased over time. Relative rates of co-mentions by the same Redditor of heroin and methamphetamine, the substances most commonly co-mentioned, decreased in recent years, while co-mentions of both fentanyl and MOUD with methamphetamine increased. Conclusion Our analyses reflect increasing mentions of stimulants, particularly methamphetamine, among PWUO/PTMOUD, which closely resembles the growth in overdose deaths involving both opioids and stimulants. 
These findings are consistent with recent reports suggesting increasing stimulant use among people receiving treatment for opioid use disorder. These data offer insights on emerging trends in the overdose epidemic and underscore the importance of scaling efforts to address co-occurring opioid and stimulant use including harm reduction and comprehensive healthcare access spanning mental-health services and substance use disorder treatment. Supplementary Information The online version contains supplementary material available at 10.1186/s12954-022-00628-2.
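The co-mention analysis described in this abstract can be illustrated with a minimal sketch. The lexicon below is a tiny hypothetical stand-in (the study generated lexical variants automatically), and the function names and the `(user, year, text)` input shape are assumptions made for illustration, not the authors' pipeline.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon: category -> name variants (illustrative only).
LEXICON = {
    "opioid": ["heroin", "fentanyl", "fent", "oxycodone", "oxy"],
    "stimulant": ["methamphetamine", "meth", "cocaine", "adderall"],
}

# One case-insensitive, whole-word pattern per category.
PATTERNS = {
    category: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    for category, terms in LEXICON.items()
}


def categories_mentioned(text):
    """Return the set of substance categories mentioned in one post."""
    return {cat for cat, pattern in PATTERNS.items() if pattern.search(text)}


def co_mention_share_by_year(posts):
    """posts: iterable of (user, year, text) tuples.

    Returns {year: share of users active that year who mentioned
    both an opioid and a stimulant in that year's posts}.
    """
    seen = defaultdict(lambda: defaultdict(set))  # year -> user -> categories
    for user, year, text in posts:
        seen[year][user] |= categories_mentioned(text)
    return {
        year: sum(1 for cats in users.values()
                  if {"opioid", "stimulant"} <= cats) / len(users)
        for year, users in sorted(seen.items())
    }
```

Aggregating per user rather than per post mirrors the abstract's focus on co-mentions "by the same Redditor"; a per-post count would miss users who mention the two substances in separate posts.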
Affiliation(s)
- Abeed Sarker
- Department of Biomedical Informatics, School of Medicine, Emory University, 101 Woodruff Circle, Suite 4101, Atlanta, GA, 30322, USA
- Mohammed Ali Al-Garadi
- Department of Biomedical Informatics, School of Medicine, Emory University, 101 Woodruff Circle, Suite 4101, Atlanta, GA, 30322, USA
- Yao Ge
- Department of Biomedical Informatics, School of Medicine, Emory University, 101 Woodruff Circle, Suite 4101, Atlanta, GA, 30322, USA
- Nisha Nataraj
- National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, Atlanta, GA, 30341, USA
- Christopher M Jones
- National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, Atlanta, GA, 30341, USA
- Steven A Sumner
- National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, Atlanta, GA, 30341, USA
|
12
|
Pimenta JM, Painter JL, Gemzoe K, Levy RA, Powell M, Meizlik P, Powell G. Identifying Barriers to Enrollment in Patient Pregnancy Registries: Building Evidence Through Crowdsourcing. JMIR Form Res 2022; 6:e30573. [PMID: 35612888 PMCID: PMC9178445 DOI: 10.2196/30573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2021] [Revised: 03/04/2022] [Accepted: 03/15/2022] [Indexed: 11/15/2022] Open
Abstract
Background Enrollment in pregnancy registries is challenging despite substantial awareness-raising activities, generally resulting in low recruitment and limited safety data. Understanding patient and physician awareness of and attitudes toward pregnancy registries is needed to facilitate enrollment. Crowdsourcing, in which services, ideas, or content are obtained by soliciting contributions from a large group of people using web-based platforms, has shown promise for improving patient engagement and obtaining patient insights. Objective This study aimed to use web-based crowdsourcing platforms to evaluate Belimumab Pregnancy Registry (BPR) awareness among patients and physicians and to identify potential barriers to pregnancy registry enrollment with the BPR as a case study. Methods We conducted 2 surveys using separate web-based crowdsourcing platforms: Amazon Mechanical Turk (a 14-question patient survey) and Sermo RealTime (an 11-question rheumatologist survey). Eligible patients were women, aged 18-55 years; diagnosed with systemic lupus erythematosus (SLE); and pregnant, recently pregnant (within 2 years), or planning pregnancy. Eligible rheumatologists had prescribed belimumab and treated pregnant women. Responses were descriptively analyzed. Results Of 151 patient respondents over a 3-month period (n=88, 58.3% aged 26-35 years; n=149, 98.7% with mild or moderate SLE; and n=148, 98% from the United States), 51% (77/151) were currently or recently pregnant. Overall, 169 rheumatologists completed the survey within 48 hours, and 59.2% (100/169) were based in the United States. Belimumab exposure was reported by 41.7% (63/151) of patients, whereas 51.7% (75/145) of rheumatologists had prescribed belimumab to <5 patients, 25.5% (37/145) had prescribed to 5-10 patients, and 22.8% (33/145) had prescribed to >10 patients who were pregnant or trying to conceive.
Of the patients exposed to belimumab, 51% (32/63) were BPR-aware, and 45.5% (77/169) of the rheumatologists were BPR-aware. Overall, 60% (38/63) of patients reported belimumab discontinuation because of pregnancy or planned pregnancy. Among the 77 BPR-aware rheumatologists, 70 (91%) referred patients to the registry. Concerns among rheumatologists who did not prescribe belimumab during pregnancy included the unknown pregnancy safety profile (119/169, 70.4%), and 61.5% (104/169) reported their patients’ concerns about the unknown pregnancy safety profile. Belimumab exposure during or recently after pregnancy or while trying to conceive was reported in patients with mild (6/64, 9%), moderate (22/85, 26%), or severe (1/2, 50%) SLE. Rheumatologists more commonly recommended belimumab for moderate (84/169, 49.7%) and severe (123/169, 72.8%) SLE than for mild SLE (36/169, 21.3%) for patients who were trying to conceive or were recently or currently pregnant. Overall, 81.6% (138/169) of the rheumatologists suggested a belimumab washout period before pregnancy of 0-30 days (44/138, 31.9%), 30-60 days (64/138, 46.4%), or >60 days (30/138, 21.7%). Conclusions In this case, crowdsourcing efficiently obtained patient and rheumatologist input, with some patients with SLE continuing to use belimumab during or while planning a pregnancy. There was moderate awareness of the BPR among patients and physicians.
Affiliation(s)
- Jeffery L Painter
- Safety Innovation and Analytics, GlaxoSmithKline, Durham, NC, United States
- Kim Gemzoe
- GlaxoSmithKline, Stevenage, United Kingdom
- Marcy Powell
- Safety Innovation and Analytics, GlaxoSmithKline, Durham, NC, United States
- Gregory Powell
- Safety Innovation and Analytics, GlaxoSmithKline, Durham, NC, United States
|
13
|
Lucía Schmidt A, Rodriguez-Esteban R, Gottowik J, Leddin M. Applications of quantitative social media listening to patient-centric drug development. Drug Discov Today 2022; 27:1523-1530. [PMID: 35114364 DOI: 10.1016/j.drudis.2022.01.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/11/2021] [Revised: 08/13/2021] [Accepted: 01/26/2022] [Indexed: 11/27/2022]
Abstract
Social media listening has been increasingly acknowledged as a tool with applications in many stages of the drug development process. These applications were created to meet the need for patient-centric therapies that are fit-for-purpose and meaningful to patients. Such applications, however, require leveraging new quantitative approaches and analytical methods that draw from developments in artificial intelligence and real-world data (RWD) analysis. Here, we review the state-of-the-art in quantitative social media listening (QSML) methods applied to drug discovery from the perspective of the pharmaceutical industry.
Affiliation(s)
- Ana Lucía Schmidt
- Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070 Basel, Switzerland
- Raul Rodriguez-Esteban
- Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070 Basel, Switzerland
- Juergen Gottowik
- Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070 Basel, Switzerland
- Mathias Leddin
- Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070 Basel, Switzerland
|
14
|
Klein AZ, Magge A, Gonzalez-Hernandez G. ReportAGE: Automatically extracting the exact age of Twitter users based on self-reports in tweets. PLoS One 2022; 17:e0262087. [PMID: 35077484 PMCID: PMC8789116 DOI: 10.1371/journal.pone.0262087] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/19/2021] [Accepted: 12/17/2021] [Indexed: 11/18/2022] Open
Abstract
Advancing the utility of social media data for research applications requires methods for automatically detecting demographic information about social media study populations, including users' age. The objective of this study was to develop and evaluate a method that automatically identifies the exact age of users based on self-reports in their tweets. Our end-to-end automatic natural language processing (NLP) pipeline, ReportAGE, includes query patterns to retrieve tweets that potentially mention an age, a classifier to distinguish retrieved tweets that self-report the user's exact age ("age" tweets) and those that do not ("no age" tweets), and rule-based extraction to identify the age. To develop and evaluate ReportAGE, we manually annotated 11,000 tweets that matched the query patterns. Based on 1000 tweets that were annotated by all five annotators, inter-annotator agreement (Fleiss' kappa) was 0.80 for distinguishing "age" and "no age" tweets, and 0.95 for identifying the exact age among the "age" tweets on which the annotators agreed. A deep neural network classifier, based on a RoBERTa-Large pretrained transformer model, achieved the highest F1-score of 0.914 (precision = 0.905, recall = 0.923) for the "age" class. When the age extraction was evaluated using the classifier's predictions, it achieved an F1-score of 0.855 (precision = 0.805, recall = 0.914) for the "age" class. When it was evaluated directly on the held-out test set, it achieved an F1-score of 0.931 (precision = 0.873, recall = 0.998) for the "age" class. We deployed ReportAGE on a collection of more than 1.2 billion tweets, posted by 245,927 users, and predicted ages for 132,637 (54%) of them. Scaling the detection of exact age to this large number of users can advance the utility of social media data for research applications that do not align with the predefined age groupings of extant binary or multi-class classification approaches.
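The F1-scores quoted in this abstract are harmonic means of the reported precision and recall. A minimal sketch of that relation follows; `f1_score` is a helper written for this illustration, not part of ReportAGE. Because the published inputs are already rounded to three decimals, the recomputed values can be expected to agree with the reported ones only to within rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract for the age-extraction step.
on_classifier_predictions = f1_score(0.805, 0.914)  # reported F1: 0.855
on_held_out_test_set = f1_score(0.873, 0.998)       # reported F1: 0.931

print(f"{on_classifier_predictions:.3f} {on_held_out_test_set:.3f}")
```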
Affiliation(s)
- Ari Z. Klein
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Arjun Magge
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Graciela Gonzalez-Hernandez
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
15
|
Klein AZ, O'Connor K, Gonzalez-Hernandez G. Toward Using Twitter Data to Monitor COVID-19 Vaccine Safety in Pregnancy: Proof-of-Concept Study of Cohort Identification. JMIR Form Res 2022; 6:e33792. [PMID: 34870607 PMCID: PMC8734607 DOI: 10.2196/33792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/23/2021] [Revised: 11/15/2021] [Accepted: 11/22/2021] [Indexed: 01/19/2023] Open
Abstract
Background COVID-19 during pregnancy is associated with an increased risk of maternal death, intensive care unit admission, and preterm birth; however, many people who are pregnant refuse to receive COVID-19 vaccination because of a lack of safety data. Objective The objective of this preliminary study was to assess whether Twitter data could be used to identify a cohort for epidemiologic studies of COVID-19 vaccination in pregnancy. Specifically, we examined whether it is possible to identify users who have reported (1) that they received COVID-19 vaccination during pregnancy or the periconception period, and (2) their pregnancy outcomes. Methods We developed regular expressions to search for reports of COVID-19 vaccination in a large collection of tweets posted through the beginning of July 2021 by users who have announced their pregnancy on Twitter. To help determine if users were vaccinated during pregnancy, we drew upon a natural language processing (NLP) tool that estimates the timeframe of the prenatal period. For users who posted tweets with a timestamp indicating they were vaccinated during pregnancy, we drew upon additional NLP tools to help identify tweets that reported their pregnancy outcomes. Results We manually verified the content of tweets detected automatically, identifying 150 users who reported on Twitter that they received at least one dose of COVID-19 vaccination during pregnancy or the periconception period. We manually verified at least one reported outcome for 45 of the 60 (75%) completed pregnancies. Conclusions Given the limited availability of data on COVID-19 vaccine safety in pregnancy, Twitter can be a complementary resource for potentially increasing the acceptance of COVID-19 vaccination in pregnant populations. The results of this preliminary study justify the development of scalable methods to identify a larger cohort for epidemiologic studies.
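The regular-expression search step in the Methods can be sketched as follows. The pattern is a hypothetical illustration, since the abstract does not give the study's actual expressions, and `find_vaccination_reports` is a name invented here. In the study, matches of this kind were verified manually.

```python
import re

# Hypothetical pattern for first-person vaccination reports (illustrative only).
VACCINE_REPORT = re.compile(
    r"\b(?:i|we)\s+(?:just\s+)?(?:got|received|had)\s+(?:my|our|the)\s+"
    r"(?:(?:first|second|1st|2nd)\s+)?"                 # optional dose ordinal
    r"(?:(?:covid(?:-19)?|pfizer|moderna|j&j)\s+)?"     # optional vaccine name
    r"(?:vaccine|shot|dose|jab)\b",
    re.IGNORECASE,
)


def find_vaccination_reports(tweets):
    """Return the tweets whose text matches the first-person report pattern."""
    return [tweet for tweet in tweets if VACCINE_REPORT.search(tweet)]
```

A pattern this loose trades precision for recall: "I got my shot" matches even though the vaccine is unspecified, which is one reason a manual verification pass remains useful.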
Affiliation(s)
- Ari Z Klein
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Karen O'Connor
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Graciela Gonzalez-Hernandez
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
|
16
|
Koss J, Rheinlaender A, Truebel H, Bohnet-Joschko S. Social media mining in drug development-Fundamentals and use cases. Drug Discov Today 2021; 26:2871-2880. [PMID: 34481080 DOI: 10.1016/j.drudis.2021.08.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/15/2021] [Revised: 06/03/2021] [Accepted: 08/27/2021] [Indexed: 11/18/2022]
Abstract
The incorporation of patients' perspectives into drug discovery and development has become critically important from the viewpoint of accounting for modern-day business dynamics. There is a trend among patients to narrate their disease experiences on social media. The insights gained by analyzing the data pertaining to such social-media posts could be leveraged to support patient-centered drug development. Manual analysis of these data is nearly impossible, but artificial intelligence enables automated and cost-effective processing, also referred to as social media mining (SMM). This paper discusses the fundamental SMM methods along with several relevant drug-development use cases.
Affiliation(s)
- Hubert Truebel
- Witten/Herdecke University, Witten, Germany; AiCuris AG, Wuppertal, Germany
|
17
|
Guntuku SC, Gaulton JS, Seltzer EK, Asch DA, Srinivas SK, Ungar LH, Mancheno C, Klinger EV, Merchant RM. Studying social media language changes associated with pregnancy status, trimester, and parity from medical records. Womens Health (Lond) 2021; 16:1745506520949392. [PMID: 33028170 PMCID: PMC7549071 DOI: 10.1177/1745506520949392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
We sought to evaluate whether there was variability in language used on social media across different time points of pregnancy (before, during, and after pregnancy, as well as by trimester and parity). Consenting patients shared access to their individual Facebook posts and electronic medical records. Random forest models trained on Facebook posts could differentiate the first trimester of pregnancy from 3 months before pregnancy (F1 score = .63) and from a random 3-month time period (F1 score = .64). Posts during pregnancy were more likely to include themes about family (β = .22), food craving (β = .14), and date/times (β = .13), while posts 3 months prior to pregnancy included themes about social life (β = .30), sleep (β = .31), and curse words (β = .27), and posts 3 months post-pregnancy included themes of gratitude (β = .17), health appointments (β = .21), and religiosity (β = .18). Users who were pregnant for the first time were more likely to post about lack of sleep (β = .15), activities of daily living (β = .09), and communication (β = .08) compared with those who were pregnant after having a child, who posted about others’ birthdays (β = .16) and life events (β = .12). A better understanding about social media timelines can provide insight into lifestyle choices that are specific to pregnancy.
Affiliation(s)
- Sharath Chandra Guntuku
- Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA, USA; Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Jessica S Gaulton
- Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA, USA; Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Emily K Seltzer
- Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA, USA; Penn Medicine Center for Health Care Innovation, University of Pennsylvania, Philadelphia, PA, USA
- David A Asch
- Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA, USA; Penn Medicine Center for Health Care Innovation, University of Pennsylvania, Philadelphia, PA, USA
- Sindhu K Srinivas
- Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Lyle H Ungar
- Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA, USA; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA; Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA
- Christina Mancheno
- Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA, USA
- Elissa V Klinger
- Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA, USA; Penn Medicine Center for Health Care Innovation, University of Pennsylvania, Philadelphia, PA, USA
- Raina M Merchant
- Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA, USA; Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Medicine Center for Health Care Innovation, University of Pennsylvania, Philadelphia, PA, USA
|
18
|
Yang YC, Al-Garadi MA, Love JS, Perrone J, Sarker A. Automatic gender detection in Twitter profiles for health-related cohort studies. JAMIA Open 2021; 4:ooab042. [PMID: 34169232 PMCID: PMC8220305 DOI: 10.1093/jamiaopen/ooab042] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/21/2021] [Revised: 04/27/2021] [Accepted: 05/04/2021] [Indexed: 11/17/2022] Open
Abstract
Objective Biomedical research involving social media data is gradually moving from population-level to targeted, cohort-level data analysis. Though crucial for biomedical studies, social media users' demographic information (eg, gender) is often not explicitly known from profiles. Here, we present an automatic gender classification system for social media and we illustrate how gender information can be incorporated into a social media-based health-related study. Materials and Methods We used a large Twitter dataset composed of public, gender-labeled users (Dataset-1) for training and evaluating the gender detection pipeline. We experimented with machine learning algorithms including support vector machines (SVMs) and deep-learning models, and public packages including M3. We considered users’ information including profile and tweets for classification. We also developed a meta-classifier ensemble that strategically uses the predicted scores from the classifiers. We then applied the best-performing pipeline to Twitter users who have self-reported nonmedical use of prescription medications (Dataset-2) to assess the system’s utility. Results and Discussion We collected 67,181 and 176,683 users for Dataset-1 and Dataset-2, respectively. A meta-classifier involving SVM and M3 performed the best (Dataset-1 accuracy: 94.4% [95% confidence interval: 94.0–94.8%]; Dataset-2: 94.4% [95% confidence interval: 92.0–96.6%]). Including automatically classified information in the analyses of Dataset-2 revealed gender-specific trends: the proportions of females closely resemble data from the National Survey of Drug Use and Health 2018 (tranquilizers: 0.50 vs 0.50; stimulants: 0.50 vs 0.45), and the overdose Emergency Room Visit due to Opioids by Nationwide Emergency Department Sample (pain relievers: 0.38 vs 0.37).
Conclusion Our publicly available, automated gender detection pipeline may aid cohort-specific social media data analyses (https://bitbucket.org/sarkerlab/gender-detection-for-public).
Affiliation(s)
- Yuan-Chi Yang
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA
- Mohammed Ali Al-Garadi
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA
- Jennifer S Love
- Department of Emergency Medicine, School of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Jeanmarie Perrone
- Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Abeed Sarker
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA; Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA
|
19
|
Pang RD, Dormanesh A, Hoang Y, Chu M, Allem JP. Twitter Posts About Cannabis Use During Pregnancy and Postpartum: A Content Analysis. Subst Use Misuse 2021; 56:1074-1077. [PMID: 33821757 DOI: 10.1080/10826084.2021.1906277] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 10/21/2022]
Abstract
The prevalence of cannabis use has increased among U.S. pregnant women. Given this increase, and rapidly changing cannabis policies, it may be important to harness digital data sources to help capture trends and perceptions of cannabis use during pregnancy and postpartum. The objective of this study was to examine cannabis and pregnancy-related posts on Twitter over a 12-month period. Methods: Twitter posts from December 1, 2019 to December 1, 2020 that contained pregnancy and cannabis-related keywords were collected in this study (n = 17,238). A sample of 1,000 posts, proportionally sampled by week and cannabis/pregnancy-related terms, was selected for coding. Posts were classified by one or more of the following themes: 1) Safety during pregnancy, i.e., mentions the safety of cannabis use during pregnancy, 2) Safety postpartum, i.e., mentions the safety of cannabis use postpartum, and 3) Use for pregnancy-related symptoms, i.e., mentions use of cannabis to help with morning sickness, nausea, vomiting, headaches, pain, stress, and fatigue. Results: Safety during pregnancy occurred in 36.00% of the posts, and 2.30% of posts asked about safety during postpartum. Use of cannabis for pregnancy-related symptoms occurred in 2.70% of posts. Discussion: Findings show that conversations about the risks and benefits of cannabis use during pregnancy and postpartum take place on Twitter. These findings suggest that health practitioners should discuss the risks of cannabis use (including CBD) during pregnancy and breastfeeding with their patients. Health communication planners may need to find ways to communicate risks with the public to prevent the spread of misinformation.
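Proportional sampling of the kind described in the Methods can be sketched as stratified random sampling in which each stratum (here, a week) contributes in proportion to its size. The function name, the `key` callback, and the fixed seed are assumptions for illustration; the abstract does not describe the authors' exact procedure, which also stratified by keyword terms.

```python
import random
from collections import defaultdict


def proportional_sample(items, key, sample_size, seed=0):
    """Draw about `sample_size` items so each stratum keeps its share.

    `items` is a list, `key(item)` returns the stratum (e.g. the week).
    Per-stratum quotas are rounded, so the total can differ slightly
    from `sample_size` when the shares are not whole numbers.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    total = len(items)
    sample = []
    for _, members in sorted(strata.items()):
        quota = min(len(members), round(sample_size * len(members) / total))
        sample.extend(rng.sample(members, quota))
    return sample
```

For example, with 60, 30, and 10 posts in three weeks and `sample_size=10`, the quotas come out as 6, 3, and 1.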
Affiliation(s)
- Raina D Pang
- Department of Preventive Medicine, Keck School of Medicine of USC, Los Angeles, California, USA
- Allison Dormanesh
- Department of Preventive Medicine, Keck School of Medicine of USC, Los Angeles, California, USA
- Yannie Hoang
- Department of Preventive Medicine, Keck School of Medicine of USC, Los Angeles, California, USA
- Maya Chu
- Department of Preventive Medicine, Keck School of Medicine of USC, Los Angeles, California, USA
- Jon-Patrick Allem
- Department of Preventive Medicine, Keck School of Medicine of USC, Los Angeles, California, USA
|
20
|
Klein AZ, Gonzalez-Hernandez G. An annotated data set for identifying women reporting adverse pregnancy outcomes on Twitter. Data Brief 2020; 32:106249. [PMID: 32944604 PMCID: PMC7481818 DOI: 10.1016/j.dib.2020.106249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2020] [Accepted: 08/25/2020] [Indexed: 10/29/2022] Open
Abstract
Despite the prevalence in the United States of miscarriage [1], stillbirth [2], and infant mortality associated with preterm birth and low birthweight [3], their causes remain largely unknown [4], [5], [6]. To advance the use of social media data as a complementary resource for epidemiology of adverse pregnancy outcomes, we present a data set of 6487 tweets that mention miscarriage, stillbirth, preterm birth or premature labor, low birthweight, neonatal intensive care, or fetal/infant loss in general. These tweets are a subset of 22,912 tweets retrieved by applying hand-written regular expressions to a database containing more than 400 million public tweets posted by more than 100,000 women who have announced their pregnancy on Twitter [7]. Two professional annotators labeled the 6487 tweets in a binary fashion, distinguishing those potentially reporting that the user has personally experienced the outcome ("outcome" tweets) from those that merely mention the outcome ("non-outcome" tweets). Inter-annotator agreement was κ = 0.90 (Cohen's kappa). The tweets annotated as "outcome" include 1318 women reporting miscarriage, 94 stillbirth, 591 preterm birth or premature labor, 171 low birthweight, 453 neonatal intensive care, and 356 fetal/infant loss in general. These "outcome" tweets can be used to explore patient experiences and perceptions of adverse pregnancy outcomes, and can direct researchers to the users' broader timelines (tweets posted by a user over time) for observational studies. Our past work demonstrates the analysis of timelines for selecting a study population [8] and conducting a case-control study [9] of users reporting that their child has a birth defect. For larger-scale studies, the full annotated corpus can be used to train supervised machine learning algorithms to automatically identify additional users reporting adverse pregnancy outcomes on Twitter.
We used the annotated corpus to train feature-engineered and deep learning-based classifiers presented in "A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes" [10].
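The inter-annotator agreement statistic reported in this abstract, Cohen's kappa, compares observed agreement with the agreement expected by chance from each annotator's label frequencies. A minimal sketch for two annotators follows; the function is written for illustration and any labels passed to it here are made up, not drawn from the data set.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the annotators' marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        freq_a[label] * freq_b[label] for label in freq_a.keys() | freq_b.keys()
    ) / (n * n)

    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)
```

With binary "outcome" versus "non-outcome" labels, a kappa near 0.90 indicates agreement well beyond what the label frequencies alone would produce.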
Affiliation(s)
- Ari Z. Klein
- University of Pennsylvania, Philadelphia, PA, USA
|
21
|
Klein AZ, Cai H, Weissenbacher D, Levine LD, Gonzalez-Hernandez G. A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes. J Biomed Inform 2020; 112S:100076. [PMID: 34417007 PMCID: PMC11524020 DOI: 10.1016/j.yjbinx.2020.100076] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/22/2020] [Revised: 06/30/2020] [Accepted: 07/27/2020] [Indexed: 10/23/2022]
Abstract
BACKGROUND In the United States, 17% of pregnancies end in fetal loss: miscarriage or stillbirth. Preterm birth affects 10% of live births in the United States and is the leading cause of neonatal death globally. Preterm births with low birthweight are the second leading cause of infant mortality in the United States. Despite their prevalence, the causes of miscarriage, stillbirth, and preterm birth are largely unknown. OBJECTIVE The primary objectives of this study are to (1) assess whether women report miscarriage, stillbirth, and preterm birth, among others, on Twitter, and (2) develop natural language processing (NLP) methods to automatically identify users from which to select cases for large-scale observational studies. METHODS We handcrafted regular expressions to retrieve tweets that mention an adverse pregnancy outcome, from a database containing more than 400 million publicly available tweets posted by more than 100,000 users who have announced their pregnancy on Twitter. Two annotators independently annotated 8109 (one random tweet per user) of the 22,912 retrieved tweets, distinguishing those reporting that the user has personally experienced the outcome ("outcome" tweets) from those that merely mention the outcome ("non-outcome" tweets). Inter-annotator agreement was κ = 0.90 (Cohen's kappa). We used the annotated tweets to train and evaluate feature-engineered and deep learning-based classifiers. We further annotated 7512 (of the 8109) tweets to develop a generalizable, rule-based module designed to filter out reported speech (that is, posts containing what was said by others) prior to automatic classification. We performed an extrinsic evaluation assessing whether the reported speech filter could improve the detection of women reporting adverse pregnancy outcomes on Twitter.
RESULTS The tweets annotated as "outcome" include 1632 women reporting miscarriage, 119 stillbirth, 749 preterm birth or premature labor, 217 low birthweight, 558 NICU admission, and 458 fetal/infant loss in general. A deep neural network, BERT-based classifier achieved the highest overall F1-score (0.88) for automatically detecting "outcome" tweets (precision = 0.87, recall = 0.89), with an F1-score of at least 0.82 and a precision of at least 0.84 for each of the adverse pregnancy outcomes. Our reported speech filter significantly (P < 0.05) improved the accuracy of Logistic Regression (from 78.0% to 80.8%) and majority voting-based ensemble (from 81.1% to 82.9%) classifiers. Although the filter did not improve the F1-score of the BERT-based classifier, it did improve precision at the cost of recall, a trade-off that may be acceptable for automated case selection of more prevalent outcomes. Without the filter, reported speech is one of the main sources of errors for the BERT-based classifier. CONCLUSION This study demonstrates that (1) women do report their adverse pregnancy outcomes on Twitter, (2) our NLP pipeline can automatically identify users from which to select cases for large-scale observational studies, and (3) our reported speech filter would reduce the cost of annotating health-related social media data and can significantly improve the overall performance of feature-based classifiers.
Affiliation(s)
- Ari Z Klein
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Haitao Cai
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Davy Weissenbacher
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Lisa D Levine
- Maternal and Child Health Research Center, Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Graciela Gonzalez-Hernandez
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
22
|
Equity, Inclusivity, and Innovative Digital Technologies to Improve Adolescent and Young Adult Health. J Adolesc Health 2020; 67:S4-S6. [PMID: 32718514 DOI: 10.1016/j.jadohealth.2020.05.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/15/2020] [Accepted: 05/15/2020] [Indexed: 11/22/2022]
|
23
|
Davoudi A, Klein AZ, Sarker A, Gonzalez-Hernandez G. Towards Automatic Bot Detection in Twitter for Health-related Tasks. AMIA Jt Summits Transl Sci Proc 2020; 2020:136-141. [PMID: 32477632 PMCID: PMC7233076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
With the increasing use of social media data for health-related research, the credibility of the information from this source has been questioned as the posts may not originate from personal accounts. While automatic bot detection approaches have been proposed, none have been evaluated on users posting health-related information. In this paper, we extend an existing bot detection system and customize it for health-related research. Using a dataset of Twitter users, we first show that the system, which was designed for political bot detection, underperforms when applied to health-related Twitter users. We then incorporate additional features and a statistical machine learning classifier to improve bot detection performance significantly. Our approach obtains an F1-score of 0.7 for the "bot" class, an improvement of 0.339. Our approach is customizable and generalizable for bot detection in other health-related social media cohorts.
Affiliation(s)
- Anahita Davoudi: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Ari Z Klein: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Abeed Sarker: Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322
- Graciela Gonzalez-Hernandez: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
|
24
|
Klein AZ, Gebreyesus A, Gonzalez-Hernandez G. Automatically Identifying Comparator Groups on Twitter for Digital Epidemiology of Pregnancy Outcomes. AMIA Jt Summits Transl Sci Proc 2020; 2020:317-325. [PMID: 32477651 PMCID: PMC7233041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Despite the prevalence of adverse pregnancy outcomes such as miscarriage, stillbirth, birth defects, and preterm birth, their causes are largely unknown. We seek to advance the use of social media for observational studies of pregnancy outcomes by developing a natural language processing pipeline for automatically identifying users from which to select comparator groups on Twitter. We annotated 2361 tweets by users who have announced their pregnancy on Twitter, which were used to train and evaluate supervised machine learning algorithms as a basis for automatically detecting women who have reported that their pregnancy had reached term and their baby was born at a normal weight. Upon further processing the tweet-level predictions of a majority voting-based ensemble classifier, the pipeline achieved a user-level F1-score of 0.933 (precision = 0.947, recall = 0.920). Our pipeline will be deployed to identify large comparator groups for studying pregnancy outcomes on Twitter.
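The majority voting-based ensemble mentioned above can be sketched as follows. This is a minimal illustration under our own assumptions: the labels (`"term"`, `"other"`), the three-classifier setup, and the function name `majority_vote` are hypothetical, and the abstract does not describe how tweet-level predictions were further processed into user-level decisions.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Return the label chosen by most classifiers for one tweet.

    With an odd number of binary classifiers there are no ties; otherwise
    Counter.most_common keeps the first-seen label among equal counts.
    """
    return Counter(labels).most_common(1)[0][0]


# Hypothetical predictions from three classifiers for two tweets.
tweet_votes: List[List[str]] = [
    ["term", "term", "other"],
    ["other", "other", "term"],
]
ensemble = [majority_vote(votes) for votes in tweet_votes]
print(ensemble)
```

Using an odd number of base classifiers is a simple way to avoid tie-breaking rules in the binary case.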
Affiliation(s)
- Ari Z Klein: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Abeselom Gebreyesus: Department of Sociology, Anthropology, and Health Administration and Policy, University of Maryland, Baltimore County, Baltimore, MD, USA
- Graciela Gonzalez-Hernandez: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
25
|
Pharmacoepidemiologic Evaluation of Birth Defects from Health-Related Postings in Social Media During Pregnancy. Drug Saf 2020; 42:389-400. [PMID: 30284214 PMCID: PMC6426821 DOI: 10.1007/s40264-018-0731-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 11/29/2022]
Abstract
Introduction Adverse effects of medications taken during pregnancy are traditionally studied through post-marketing pregnancy registries, which have limitations. Social media data may be an alternative data source for pregnancy surveillance studies. Objective The objective of this study was to assess the feasibility of using social media data as an alternative source for pregnancy surveillance for regulatory decision making. Methods We created an automated method to identify Twitter accounts of pregnant women. We identified 196 pregnant women with a mention of a birth defect in relation to their baby and 196 without a mention of a birth defect in relation to their baby. We extracted information on pregnancy and maternal demographics, medication intake and timing, and birth defects. Results Although often incomplete, we extracted data for the majority of the pregnancies. Among women who reported birth defects, 35% reported taking one or more medications during pregnancy compared with 17% of controls. After accounting for age, race, and place of residence, a higher medication intake was observed in women who reported birth defects. The rate of birth defects in the pregnancy cohort (0.44%) was lower than the rate in the general population (3%). Conclusions Twitter data capture information on medication intake and birth defects; however, the information obtained cannot replace pregnancy registries at this time. Development of improved methods to automatically extract and annotate social media data may increase their value to support regulatory decision making regarding pregnancy outcomes in women using medications during their pregnancies.
|
26
|
Automatic Breast Cancer Cohort Detection from Social Media for Studying Factors Affecting Patient-Centered Outcomes. Artif Intell Med 2020. [DOI: 10.1007/978-3-030-59137-3_10] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
|
27
|
Weissenbacher D, Sarker A, Klein A, O’Connor K, Magge A, Gonzalez-Hernandez G. Deep neural networks ensemble for detecting medication mentions in tweets. J Am Med Inform Assoc 2019; 26:1618-1626. [PMID: 31562510 PMCID: PMC6857507 DOI: 10.1093/jamia/ocz156] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/28/2019] [Revised: 07/26/2019] [Accepted: 08/13/2019] [Indexed: 11/12/2022] Open
Abstract
OBJECTIVE Twitter posts are now recognized as an important source of patient-generated data, providing unique insights into population health. A fundamental step toward incorporating Twitter data in pharmacoepidemiologic research is to automatically recognize medication mentions in tweets. Given that lexical searches for medication names suffer from low recall due to misspellings or ambiguity with common words, we propose a more advanced method to recognize them. MATERIALS AND METHODS We present Kusuri, an Ensemble Learning classifier able to identify tweets mentioning drug products and dietary supplements. Kusuri (薬, "medication" in Japanese) is composed of 2 modules: first, 4 different classifiers (lexicon based, spelling variant based, pattern based, and a weakly trained neural network) are applied in parallel to discover tweets potentially containing medication names; second, an ensemble of deep neural networks encoding morphological, semantic, and long-range dependencies of important words in the tweets makes the final decision. RESULTS On a class-balanced (50-50) corpus of 15 005 tweets, Kusuri demonstrated performance close to that of human annotators with an F1 score of 93.7%, the best score achieved thus far on this corpus. On a corpus made of all tweets posted by 112 Twitter users (98 959 tweets, with only 0.26% mentioning medications), Kusuri obtained an F1 score of 78.8%. To the best of our knowledge, Kusuri is the first system to achieve this score on such an extremely imbalanced dataset. CONCLUSIONS The system identifies tweets mentioning drug names with performance high enough to ensure its usefulness, and is ready to be integrated in pharmacovigilance, toxicovigilance, or more generally, public health pipelines that depend on medication name mentions.
Affiliation(s)
- Davy Weissenbacher: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Abeed Sarker: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ari Klein: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Karen O’Connor: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Arjun Magge: Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, Tempe, Arizona, USA
- Graciela Gonzalez-Hernandez: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
28
|
Sarker A, Gonzalez-Hernandez G, Ruan Y, Perrone J. Machine Learning and Natural Language Processing for Geolocation-Centric Monitoring and Characterization of Opioid-Related Social Media Chatter. JAMA Netw Open 2019; 2:e1914672. [PMID: 31693125 PMCID: PMC6865282 DOI: 10.1001/jamanetworkopen.2019.14672] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Indexed: 12/31/2022] Open
Abstract
IMPORTANCE Automatic curation of consumer-generated, opioid-related social media big data may enable real-time monitoring of the opioid epidemic in the United States. OBJECTIVE To develop and validate an automatic text-processing pipeline for geospatial and temporal analysis of opioid-mentioning social media chatter. DESIGN, SETTING, AND PARTICIPANTS This cross-sectional, population-based study was conducted from December 1, 2017, to August 31, 2019, and used more than 3 years of publicly available social media posts on Twitter, dated from January 1, 2012, to October 31, 2015, that were geolocated in Pennsylvania. Opioid-mentioning tweets were extracted using prescription and illicit opioid names, including street names and misspellings. Social media posts (tweets) (n = 9006) were manually categorized into 4 classes, and training and evaluation of several machine learning algorithms were performed. Temporal and geospatial patterns were analyzed with the best-performing classifier on unlabeled data. MAIN OUTCOMES AND MEASURES Pearson and Spearman correlations of county- and substate-level abuse-indicating tweet rates with opioid overdose death rates from the Centers for Disease Control and Prevention WONDER database and with 4 metrics from the National Survey on Drug Use and Health for 3 years were calculated. Classifier performances were measured through microaveraged F1 scores (harmonic mean of precision and recall) or accuracies and 95% CIs. RESULTS A total of 9006 social media posts were annotated, of which 1748 (19.4%) were related to abuse, 2001 (22.2%) were related to information, 4830 (53.6%) were unrelated, and 427 (4.7%) were not in the English language. Yearly rates of abuse-indicating social media posts showed statistically significant correlation with county-level opioid-related overdose death rates (n = 75) for 3 years (Pearson r = 0.451, P < .001; Spearman r = 0.331, P = .004).
Abuse-indicating tweet rates showed consistent correlations with 4 NSDUH metrics (n = 13) associated with nonmedical prescription opioid use (Pearson r = 0.683, P = .01; Spearman r = 0.346, P = .25), illicit drug use (Pearson r = 0.850, P < .001; Spearman r = 0.341, P = .25), illicit drug dependence (Pearson r = 0.937, P < .001; Spearman r = 0.495, P = .09), and illicit drug dependence or abuse (Pearson r = 0.935, P < .001; Spearman r = 0.401, P = .17) over the same 3-year period, although the tests lacked power to demonstrate statistical significance. A classification approach involving an ensemble of classifiers produced the best performance in accuracy or microaveraged F1 score (0.726; 95% CI, 0.708-0.743). CONCLUSIONS AND RELEVANCE The correlations obtained in this study suggest that a social media-based approach reliant on supervised machine learning may be suitable for geolocation-centric monitoring of the US opioid epidemic in near real time.
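The abstract reports both Pearson and Spearman coefficients, which can differ noticeably for the same data. The sketch below, written with the standard library only, shows why: Spearman is Pearson applied to ranks, so it responds to monotonic order rather than to magnitudes. The numbers in `tweet_rates` and `death_rates` are invented for illustration and are not data from the study.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented example: the order is perfectly monotonic, but one region has an
# extreme tweet rate, so the linear (Pearson) association is weaker.
tweet_rates = [1.0, 2.0, 3.0, 4.0, 50.0]
death_rates = [2.0, 3.0, 5.0, 6.0, 7.0]
print(pearson(tweet_rates, death_rates), spearman(tweet_rates, death_rates))
```

In practice a statistics library would also supply the P values; this sketch computes only the coefficients.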
Affiliation(s)
- Abeed Sarker: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia; Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia
- Graciela Gonzalez-Hernandez: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Yucheng Ruan: School of Engineering and Applied Science, University of Pennsylvania, Philadelphia
- Jeanmarie Perrone: Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia
|
29
|
Klein AZ, Sarker A, Weissenbacher D, Gonzalez-Hernandez G. Towards scaling Twitter for digital epidemiology of birth defects. NPJ Digit Med 2019; 2:96. [PMID: 31583284 PMCID: PMC6773753 DOI: 10.1038/s41746-019-0170-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/22/2019] [Accepted: 08/12/2019] [Indexed: 11/13/2022] Open
Abstract
Social media has recently been used to identify and study a small cohort of Twitter users whose pregnancies with birth defect outcomes (the leading cause of infant mortality) could be observed via their publicly available tweets. In this study, we exploit social media on a larger scale by developing natural language processing (NLP) methods to automatically detect, among thousands of users, a cohort of mothers reporting that their child has a birth defect. We used 22,999 annotated tweets to train and evaluate supervised machine learning algorithms (feature-engineered and deep learning-based classifiers) that automatically distinguish tweets referring to the user's pregnancy outcome from tweets that merely mention birth defects. Because 90% of the tweets merely mention birth defects, we experimented with under-sampling and over-sampling approaches to address this class imbalance. An SVM classifier achieved the best performance for the two positive classes: an F1-score of 0.65 for the "defect" class and 0.51 for the "possible defect" class. We deployed the classifier on 20,457 unlabeled tweets that mention birth defects, which helped identify 542 additional users for potential inclusion in our cohort. Contributions of this study include (1) NLP methods for automatically detecting tweets by users reporting their birth defect outcomes, (2) findings that an SVM classifier can outperform a deep neural network-based classifier for highly imbalanced social media data, (3) evidence that automatic classification can be used to identify additional users for potential inclusion in our cohort, and (4) a publicly available corpus for training and evaluating supervised machine learning algorithms.
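The abstract does not say which under-sampling scheme was used, so the following is only a sketch of the simplest variant, random under-sampling, in which every class is reduced to the size of the smallest one. The function name `undersample`, the toy texts, and the two labels are our own assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (tweet text, label)


def undersample(examples: List[Example], seed: int = 0) -> List[Example]:
    """Randomly drop examples so every class matches the smallest class."""
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    balanced: List[Example] = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced


# Toy corpus mirroring the 90/10 imbalance described in the abstract.
corpus = [(f"mention {i}", "mention") for i in range(9)] + [("my child has ...", "defect")]
balanced = undersample(corpus)
print(len(balanced))
```

Under-sampling should be applied to the training split only, so that evaluation still reflects the natural class distribution.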
Affiliation(s)
- Ari Z. Klein: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Abeed Sarker: Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA, USA
- Davy Weissenbacher: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Graciela Gonzalez-Hernandez: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
30
|
Rezaallah B, Lewis DJ, Pierce C, Zeilhofer HF, Berg BI. Social Media Surveillance of Multiple Sclerosis Medications Used During Pregnancy and Breastfeeding: Content Analysis. J Med Internet Res 2019; 21:e13003. [PMID: 31392963 PMCID: PMC6702799 DOI: 10.2196/13003] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/05/2018] [Revised: 06/02/2019] [Accepted: 06/29/2019] [Indexed: 12/19/2022] Open
Abstract
Background Multiple sclerosis (MS) is a chronic neurological disease occurring mostly in women of childbearing age. Pregnant women with MS are usually excluded from clinical trials; as users of the internet, however, they are actively engaged in threads and forums on social media. Social media provides the potential to explore real-world patient experiences and concerns about the use of medicinal products during pregnancy and breastfeeding. Objective This study aimed to analyze the content of posts concerning pregnancy and use of medicines in online forums; thus, the study aimed to gain a thorough understanding of patients’ experiences with MS medication. Methods Using the names of medicinal products as search terms, we collected posts from 21 publicly available pregnancy forums, which were accessed between March 2015 and March 2018. After the identification of relevant posts, we analyzed the content of each post using a content analysis technique and categorized the main topics that users discussed most frequently. Results We identified 6 main topics in 70 social media posts. These topics were as follows: (1) expressing personal experiences with MS medication use during the reproductive period (55/70, 80%), (2) seeking and sharing advice about the use of medicines (52/70, 74%), (3) progression of MS during and after pregnancy (35/70, 50%), (4) discussing concerns about MS medications during the reproductive period (35/70, 50%), (5) querying the possibility of breastfeeding while taking MS medications (30/70, 42%), and (6) commenting on communications with physicians (26/70, 37%). Conclusions Overall, many pregnant women or women considering pregnancy shared profound uncertainties and specific concerns about taking medicines during the reproductive period. There is a significant need to provide advice and guidance to MS patients concerning the use of medicines in pregnancy and postpartum as well as during breastfeeding. 
Advice must be tailored to the circumstances of each patient and, of course, to the individual medicine. Information must be provided by a trusted source with relevant expertise and made publicly available.
Affiliation(s)
- Bita Rezaallah: Department of Clinical Research, University of Basel, Basel, Switzerland; Patient Safety, Novartis Pharma AG, Basel, Switzerland
- David John Lewis: Patient Safety, Novartis Pharma AG, Basel, Switzerland; School of Health and Human Sciences, University of Hertfordshire, Hatfield, United Kingdom
- Hans-Florian Zeilhofer: Department of Cranio-Maxillofacial Surgery, University Hospital of Basel, Basel, Switzerland; Hightech Research Center of Cranio-Maxillofacial Surgery, University of Basel, Basel, Switzerland
- Britt-Isabelle Berg: Department of Cranio-Maxillofacial Surgery, University Hospital of Basel, Basel, Switzerland; Hightech Research Center of Cranio-Maxillofacial Surgery, University of Basel, Basel, Switzerland
|
31
|
Nikfarjam A, Ransohoff JD, Callahan A, Polony V, Shah NH. Profiling off-label prescriptions in cancer treatment using social health networks. JAMIA Open 2019; 2:301-305. [PMID: 31709388 PMCID: PMC6824514 DOI: 10.1093/jamiaopen/ooz025] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/10/2018] [Revised: 05/10/2019] [Accepted: 06/20/2019] [Indexed: 11/12/2022] Open
Abstract
Objectives To investigate using patient posts in social media as a resource to profile off-label prescriptions of cancer drugs. Methods We analyzed patient posts from the Inspire health forums (www.inspire.com) and extracted mentions of cancer drugs from the 14 most active cancer-type specific support groups. To quantify drug-disease associations, we calculated information component scores from the frequency of posts in each cancer-specific group with mentions of a given drug. We evaluated the results against three sources: manual review, Wolters-Kluwer Medi-span, and Truven MarketScan insurance claims. Results We identified 279 frequently discussed and therefore highly associated drug-disease pairs from Inspire posts. Of these, 96 are FDA approved, 9 are known off-label uses, and 174 do not have records of known usage (potentially novel off-label uses). We achieved a mean average precision of 74.9% in identifying drug-disease pairs with a true indication association from patient posts and found consistent evidence in medical claims records. We achieved a recall of 69.2% in identifying known off-label drug uses (based on Wolters-Kluwer Medi-span) from patient posts.
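The abstract does not give the exact formula behind its information component scores. One common formulation from disproportionality analysis is the base-2 logarithm of the observed-to-expected ratio of co-occurrence counts, often with a small shrinkage constant; the sketch below assumes that formulation. The function name, the `shrink` default of 0.5, and all counts are illustrative assumptions, not values from the study.

```python
import math


def information_component(
    n_joint: int, n_drug: int, n_group: int, n_total: int, shrink: float = 0.5
) -> float:
    """log2 of observed vs expected co-mentions of a drug in a cancer group.

    expected = posts mentioning the drug * posts in the group / all posts,
    i.e. the count expected if drug and group were independent.
    `shrink` (an assumed constant) damps scores based on very small counts.
    """
    expected = n_drug * n_group / n_total
    return math.log2((n_joint + shrink) / (expected + shrink))


# Invented counts: 1000 posts overall, 200 in one support group,
# 50 mentioning the drug, 40 of those inside the group.
ic_associated = information_component(40, 50, 200, 1000)
# Same margins, but only the 10 co-mentions expected under independence.
ic_independent = information_component(10, 50, 200, 1000)
print(round(ic_associated, 2), ic_independent)
```

A score near zero means the drug is mentioned in the group about as often as chance predicts, while a clearly positive score marks a candidate drug-disease pair for review.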
Affiliation(s)
- Azadeh Nikfarjam: Stanford Center for Biomedical Informatics Research, Stanford, California, USA
- Julia D Ransohoff: Stanford School of Medicine, Department of Internal Medicine, Stanford, California, USA
- Alison Callahan: Stanford Center for Biomedical Informatics Research, Stanford, California, USA
- Vladimir Polony: Stanford Center for Biomedical Informatics Research, Stanford, California, USA
- Nigam H Shah: Stanford Center for Biomedical Informatics Research, Stanford, California, USA
|
32
|
Klein AZ, Sarker A, O'Connor K, Gonzalez-Hernandez G. An Analysis of a Twitter Corpus for Training a Medication Intake Classifier. AMIA Jt Summits Transl Sci Proc 2019; 2019:102-106. [PMID: 31258961 PMCID: PMC6568126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
While social media has evolved into a useful resource for studying medication-related information, observational studies of medications have continued to rely on other sources of data. Towards advancing the use of social media data for medication-related observational studies, we analyze an annotated corpus of 27,941 tweets designed for training machine learning algorithms to automatically detect users' medication intake. In particular, we assess how a baseline classifier trained on the general corpus (that is, on various types of medication) performs for specific types. For most types, the classifier performs significantly better than it does overall; however, for nervous system medications, it performs significantly worse. These results suggest that, while the general corpus may have utility for observational studies focusing on most types of medication, studying nervous system medications may benefit from training a classifier exclusively for this type. We will explore this data-level approach in future work.
Affiliation(s)
- Ari Z Klein: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Abeed Sarker: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Karen O'Connor: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Graciela Gonzalez-Hernandez: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
|
33
|
Klein AZ, Sarker A, Cai H, Weissenbacher D, Gonzalez-Hernandez G. Social media mining for birth defects research: A rule-based, bootstrapping approach to collecting data for rare health-related events on Twitter. J Biomed Inform 2018; 87:68-78. [PMID: 30292855 PMCID: PMC6295660 DOI: 10.1016/j.jbi.2018.10.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 04/12/2018] [Revised: 09/26/2018] [Accepted: 10/03/2018] [Indexed: 10/28/2022]
Abstract
BACKGROUND Although birth defects are the leading cause of infant mortality in the United States, methods for observing human pregnancies with birth defect outcomes are limited. OBJECTIVE The primary objectives of this study were (i) to assess whether rare health-related events (in this case, birth defects) are reported on social media, (ii) to design and deploy a natural language processing (NLP) approach for collecting such sparse data from social media, and (iii) to utilize the collected data to discover a cohort of women whose pregnancies with birth defect outcomes could be observed on social media for epidemiological analysis. METHODS To assess whether birth defects are mentioned on social media, we mined 432 million tweets posted by 112,647 users who were automatically identified via their public announcements of pregnancy on Twitter. To retrieve tweets that mention birth defects, we developed a rule-based, bootstrapping approach, which relies on a lexicon, lexical variants generated from the lexicon entries, regular expressions, post-processing, and manual analysis guided by distributional properties. To identify users whose pregnancies with birth defect outcomes could be observed for epidemiological analysis, inclusion criteria were (i) tweets indicating that the user's child has a birth defect, and (ii) accessibility to the user's tweets during pregnancy. We conducted a semi-automatic evaluation to estimate the recall of the tweet-collection approach, and performed a preliminary assessment of the prevalence of selected birth defects among the pregnancy cohort derived from Twitter. RESULTS We manually annotated 16,822 retrieved tweets, distinguishing tweets indicating that the user's child has a birth defect (true positives) from tweets that merely mention birth defects (false positives). Inter-annotator agreement was substantial: κ = 0.79 (Cohen's kappa).
Analyzing the timelines of the 646 users whose tweets were true positives resulted in the discovery of 195 users that met the inclusion criteria. Congenital heart defects are the most common type of birth defect reported on Twitter, consistent with findings in the general population. Based on an evaluation of 4169 tweets retrieved using alternative text mining methods, the recall of the tweet-collection approach was 0.95. CONCLUSIONS Our contributions include (i) evidence that rare health-related events are indeed reported on Twitter, (ii) a generalizable, systematic NLP approach for collecting sparse tweets, (iii) a semi-automatic method to identify undetected tweets (false negatives), and (iv) a collection of publicly available tweets by pregnant users with birth defect outcomes, which could be used for future epidemiological analysis. In future work, the annotated tweets could be used to train machine learning algorithms to automatically identify users reporting birth defect outcomes, enabling the large-scale use of social media mining as a complementary method for such epidemiological research.
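The inter-annotator agreement reported in this abstract is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below shows the calculation on a small invented pair of annotation lists; the labels `"TP"` and `"FP"` follow the abstract's terminology, but the data and the function name are illustrative only.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(annotator_a: Sequence[str], annotator_b: Sequence[str]) -> float:
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(annotator_a)
    observed = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n
    counts_a, counts_b = Counter(annotator_a), Counter(annotator_b)
    labels = set(annotator_a) | set(annotator_b)
    # Chance agreement: product of each annotator's marginal label rates.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented annotations for ten tweets; the annotators disagree on the first.
a = ["TP", "TP", "FP", "FP", "TP", "FP", "FP", "FP", "TP", "FP"]
b = ["FP", "TP", "FP", "FP", "TP", "FP", "FP", "FP", "TP", "FP"]
kappa = cohens_kappa(a, b)
print(round(kappa, 2))
```

Here raw agreement is 0.90 but chance agreement is 0.54, so kappa is lower than the raw figure, which is why kappa is preferred when one class dominates.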
Affiliation(s)
- Ari Z Klein: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
- Abeed Sarker: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
- Haitao Cai: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
- Davy Weissenbacher: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
- Graciela Gonzalez-Hernandez: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
|