Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Klein AZ, Sarker A, Cai H, Weissenbacher D, Gonzalez-Hernandez G. Social media mining for birth defects research: A rule-based, bootstrapping approach to collecting data for rare health-related events on Twitter. J Biomed Inform 2018;87:68-78. [PMID: 30292855 PMCID: PMC6295660 DOI: 10.1016/j.jbi.2018.10.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 04/12/2018] [Revised: 09/26/2018] [Accepted: 10/03/2018] [Indexed: 10/28/2022]

Cited by Other Article(s)

Tanaka H, Shimaoka M. Challenges associated with delayed definitive diagnosis among Japanese patients with specific intractable diseases: A cross-sectional study. Intractable Rare Dis Res 2023;12:213-221. [PMID: 38024587 PMCID: PMC10680161 DOI: 10.5582/irdr.2023.01068] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/18/2023] [Revised: 09/18/2023] [Accepted: 10/15/2023] [Indexed: 12/01/2023]

Pathak R, Catalan-Matamoros D. Can Twitter posts serve as early indicators for potential safety signals? A retrospective analysis. INTERNATIONAL JOURNAL OF RISK & SAFETY IN MEDICINE 2023;34:41-61. [PMID: 35491804 DOI: 10.3233/jrs-210024] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/18/2023]

Thakur N. MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022 Monkeypox Outbreak, Findings from Analysis of Tweets, and Open Research Questions. Infect Dis Rep 2022;14:855-883. [PMID: 36412745 PMCID: PMC9680479 DOI: 10.3390/idr14060087] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/23/2022] [Revised: 10/13/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022]

Abstract

The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for the investigation of different research questions. Furthermore, the virus outbreaks of the past, such as COVID-19, Ebola, Zika virus, and flu, just to name a few, were associated with various works related to the analysis of the multimodal components of Tweets to infer the different characteristics of conversations on Twitter related to these respective outbreaks. The ongoing outbreak of the monkeypox virus, declared a Global Public Health Emergency (GPHE) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, which is resulting in the generation of tremendous amounts of Big Data. There has been no prior work in this field thus far that has focused on mining such conversations to develop a Twitter dataset. Furthermore, no prior work has focused on performing a comprehensive analysis of Tweets about this ongoing outbreak. To address these challenges, this work makes three scientific contributions to this field. First, it presents an open-access dataset of 556,427 Tweets about monkeypox that have been posted on Twitter since the first detected case of this outbreak. A comparative study is also presented that compares this dataset with 36 prior works in this field that focused on the development of Twitter datasets to further uphold the novelty, relevance, and usefulness of this dataset. Second, the paper reports the results of a comprehensive analysis of the Tweets of this dataset. 
This analysis presents several novel findings; for instance, out of all the 34 languages supported by Twitter, English has been the most used language to post Tweets about monkeypox, about 40,000 Tweets related to monkeypox were posted on the day WHO declared monkeypox as a GPHE, a total of 5470 distinct hashtags have been used on Twitter about this outbreak out of which #monkeypox is the most used hashtag, and Twitter for iPhone has been the leading source of Tweets about the outbreak. The sentiment analysis of the Tweets was also performed, and the results show that despite a lot of discussions, debate, opinions, information, and misinformation, on Twitter on various topics in this regard, such as monkeypox and the LGBTQI+ community, monkeypox and COVID-19, vaccines for monkeypox, etc., "neutral" sentiment was present in most of the Tweets. It was followed by "negative" and "positive" sentiments, respectively. Finally, to support research and development in this field, the paper presents a list of 50 open research questions related to the outbreak in the areas of Big Data, Data Mining, Natural Language Processing, and Machine Learning that may be investigated based on this dataset.

Frantz LM, Wall LB, Goldfarb CA. Media Depiction of Birth Differences of the Upper Extremity: Accuracy of Shared Diagnoses. J Pediatr Orthop 2022;42:e753-e755. [PMID: 35576061 DOI: 10.1097/bpo.0000000000002185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]

Klein AZ, O'Connor K, Levine LD, Gonzalez-Hernandez G. Using Twitter Data for Cohort Studies of Drug Safety in Pregnancy: Proof-of-concept With β-Blockers. JMIR Form Res 2022;6:e36771. [PMID: 35771614 PMCID: PMC9284350 DOI: 10.2196/36771] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/24/2022] [Revised: 04/27/2022] [Accepted: 06/06/2022] [Indexed: 01/26/2023]

Abstract

Background

Despite the fact that medication is taken during more than 90% of pregnancies, the fetal risk for most medications is unknown, and the majority of medications have no data regarding safety in pregnancy.

Objective

Using β-blockers as a proof-of-concept, the primary objective of this study was to assess the utility of Twitter data for a cohort study design—in particular, whether we could identify (1) Twitter users who have posted tweets reporting that they took medication during pregnancy and (2) their associated pregnancy outcomes.

Methods

We searched for mentions of β-blockers in 2.75 billion tweets posted by 415,690 users who announced their pregnancy on Twitter. We manually reviewed the matching tweets to first determine if the user actually took the β-blocker mentioned in the tweet. Then, to help determine if the β-blocker was taken during pregnancy, we used the time stamp of the tweet reporting intake and drew upon an automated natural language processing (NLP) tool that estimates the date of the user’s prenatal time period. For users who posted tweets indicating that they took or may have taken the β-blocker during pregnancy, we drew upon additional NLP tools to help identify tweets that report their pregnancy outcomes. Adverse pregnancy outcomes included miscarriage, stillbirth, birth defects, preterm birth (<37 weeks gestation), low birth weight (<5 pounds and 8 ounces at delivery), and neonatal intensive care unit (NICU) admission. Normal pregnancy outcomes included gestational age ≥37 weeks and birth weight ≥5 pounds and 8 ounces.
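The two numeric thresholds in the Methods paragraph above can be written down as a small rule set. The following is only an illustrative sketch: the function name, the ounce-based input, and the example values are assumptions made here, not part of the study's tooling.

```python
# Illustrative only: names and input units are assumptions, not the authors' code.
LOW_BIRTH_WEIGHT_OZ = 5 * 16 + 8   # 5 pounds 8 ounces = 88 ounces
TERM_GESTATION_WEEKS = 37          # preterm birth is defined as < 37 weeks


def outcome_flags(gestational_weeks, birth_weight_oz):
    """Apply the two numeric thresholds stated in the abstract."""
    return {
        "preterm_birth": gestational_weeks < TERM_GESTATION_WEEKS,
        "low_birth_weight": birth_weight_oz < LOW_BIRTH_WEIGHT_OZ,
    }


# Hypothetical example values.
print(outcome_flags(36, 90))   # preterm by gestational age, weight at or above threshold
print(outcome_flags(39, 88))   # meets both "normal outcome" thresholds
```

The other adverse outcomes listed (miscarriage, stillbirth, birth defects, NICU admission) are categorical reports rather than thresholds, so they are left out of this sketch.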

Results

We retrieved 5114 tweets, posted by 2339 users, that mention a β-blocker, and manually identified 2332 (45.6%) tweets, posted by 1195 (51.1%) of the users, that self-report taking the β-blocker. We were able to estimate the date of the prenatal time period for 356 pregnancies among 334 (27.9%) of these 1195 users. Among these 356 pregnancies, we identified 257 (72.2%) during which the β-blocker was or may have been taken. We manually verified an adverse pregnancy outcome—preterm birth, NICU admission, low birth weight, birth defects, or miscarriage—for 38 (14.8%) of these 257 pregnancies. We manually verified a gestational age ≥37 weeks for 198 (90.4%) and a birth weight ≥5 pounds and 8 ounces for 50 (22.8%) of the 219 pregnancies for which we did not identify an adverse pregnancy outcome.
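The percentages in the Results paragraph above follow directly from the reported counts. As a quick consistency check, the short sketch below recomputes each one; the labels and the helper name `pct` are chosen here for illustration, and only the counts come from the abstract.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# (label, numerator, denominator), with counts taken from the Results paragraph.
funnel = [
    ("tweets self-reporting intake", 2332, 5114),
    ("users self-reporting intake", 1195, 2339),
    ("users with an estimated prenatal period", 334, 1195),
    ("pregnancies with possible exposure", 257, 356),
    ("exposed pregnancies with an adverse outcome", 38, 257),
    ("gestational age >= 37 weeks", 198, 219),
    ("birth weight >= 5 lb 8 oz", 50, 219),
]

for label, part, whole in funnel:
    print(f"{label}: {part}/{whole} = {pct(part, whole)}%")

# The final denominator is the 257 exposed pregnancies minus the 38 adverse outcomes.
remaining = 257 - 38
print("pregnancies without an identified adverse outcome:", remaining)
```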

Conclusions

Our ability to detect pregnancy outcomes for Twitter users who posted tweets reporting that they took or may have taken a β-blocker during pregnancy suggests that Twitter can be a complementary resource for cohort studies of drug safety in pregnancy.

Ge Y, Guo Y, Yang YC, Al-Garadi MA, Sarker A. A comparison of few-shot and traditional named entity recognition models for medical text. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS 2022;2022:84-89. [PMID: 37641590 PMCID: PMC10462421 DOI: 10.1109/ichi54592.2022.00024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]

Abstract

Many research problems involving medical texts have limited amounts of annotated data available (e.g., expressions of rare diseases). Traditional supervised machine learning algorithms, particularly those based on deep neural networks, require large volumes of annotated data, and they underperform when only small amounts of labeled data are available. Few-shot learning (FSL) is a category of machine learning models that are designed with the intent of solving problems that have small annotated datasets available. However, there is no current study that compares the performances of FSL models with traditional models (e.g., conditional random fields) for medical text at different training set sizes. In this paper, we attempted to fill this gap in research by comparing multiple FSL models with traditional models for the task of named entity recognition (NER) from medical texts. Using five health-related annotated NER datasets, we benchmarked three traditional NER models based on BERT (BERT-Linear Classifier (BLC), BERT-CRF (BC), and SANER) and three FSL NER models (StructShot & NNShot, Few-Shot Slot Tagging (FS-ST), and ProtoNER). Our benchmarking results show that almost all models, whether traditional or FSL, achieve significantly lower performances compared to the state-of-the-art with small amounts of training data. For the NER experiments we executed, the F1-scores were very low with small training sets, typically below 30%. FSL models that were reported to perform well on non-medical texts significantly underperformed, compared to their reported best, on medical texts. Our experiments also suggest that FSL methods tend to perform worse on data sets from noisy sources of medical texts, such as social media (which includes misspellings and colloquial expressions), compared to less noisy sources such as medical literature.
Our experiments demonstrate that the current state-of-the-art FSL systems are not yet suitable for effective NER in medical natural language processing tasks, and further research needs to be carried out to improve their performances. Creation of specialized, standardized datasets replicating real-world scenarios may help to move this category of methods forward.

Klein AZ, O'Connor K, Gonzalez-Hernandez G. Toward Using Twitter Data to Monitor COVID-19 Vaccine Safety in Pregnancy: Proof-of-Concept Study of Cohort Identification. JMIR Form Res 2022;6:e33792. [PMID: 34870607 PMCID: PMC8734607 DOI: 10.2196/33792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/23/2021] [Revised: 11/15/2021] [Accepted: 11/22/2021] [Indexed: 01/19/2023]

Abstract

Background

COVID-19 during pregnancy is associated with an increased risk of maternal death, intensive care unit admission, and preterm birth; however, many people who are pregnant refuse to receive COVID-19 vaccination because of a lack of safety data.

Objective

The objective of this preliminary study was to assess whether Twitter data could be used to identify a cohort for epidemiologic studies of COVID-19 vaccination in pregnancy. Specifically, we examined whether it is possible to identify users who have reported (1) that they received COVID-19 vaccination during pregnancy or the periconception period, and (2) their pregnancy outcomes.

Methods

We developed regular expressions to search for reports of COVID-19 vaccination in a large collection of tweets posted through the beginning of July 2021 by users who have announced their pregnancy on Twitter. To help determine if users were vaccinated during pregnancy, we drew upon a natural language processing (NLP) tool that estimates the timeframe of the prenatal period. For users who posted tweets with a timestamp indicating they were vaccinated during pregnancy, we drew upon additional NLP tools to help identify tweets that reported their pregnancy outcomes.
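The abstract does not give the regular expressions themselves. Purely to illustrate the kind of first-person pattern matching the Methods paragraph describes, here is a minimal sketch; the pattern, the variable names, and the sample tweets are all invented for this example and should not be read as the authors' expressions.

```python
import re

# Hypothetical pattern, NOT the expressions used in the study: it looks for a
# first-person report of receiving a vaccine dose.
VACCINE_REPORT = re.compile(
    r"\b(?:i|we)\s+(?:got|received|had)\s+(?:my|our|the)\s+"
    r"(?:(?:first|second|1st|2nd)\s+)?"
    r"(?:covid(?:-19)?\s+)?(?:vaccine|shot|dose|jab)\b",
    re.IGNORECASE,
)

# Invented sample tweets for demonstration.
tweets = [
    "I got my second covid vaccine today at 28 weeks pregnant",
    "Reading about vaccine safety in pregnancy",
    "We received our first dose yesterday",
]

matches = [t for t in tweets if VACCINE_REPORT.search(t)]
for t in matches:
    print(t)
```

A pattern like this only retrieves candidates. As the Results paragraph notes, the matched tweets were still verified manually.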

Results

We manually verified the content of tweets detected automatically, identifying 150 users who reported on Twitter that they received at least one dose of COVID-19 vaccination during pregnancy or the periconception period. We manually verified at least one reported outcome for 45 of the 60 (75%) completed pregnancies.

Conclusions

Given the limited availability of data on COVID-19 vaccine safety in pregnancy, Twitter can be a complementary resource for potentially increasing the acceptance of COVID-19 vaccination in pregnant populations. The results of this preliminary study justify the development of scalable methods to identify a larger cohort for epidemiologic studies.

Helman SM, Herrup EA, Christopher AB, Al-Zaiti SS. The role of machine learning applications in diagnosing and assessing critical and non-critical CHD: a scoping review. Cardiol Young 2021;31:1770-1780. [PMID: 34725005 PMCID: PMC8805679 DOI: 10.1017/s1047951121004212] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]

Koss J, Rheinlaender A, Truebel H, Bohnet-Joschko S. Social media mining in drug development-Fundamentals and use cases. Drug Discov Today 2021;26:2871-2880. [PMID: 34481080 DOI: 10.1016/j.drudis.2021.08.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/15/2021] [Revised: 06/03/2021] [Accepted: 08/27/2021] [Indexed: 11/18/2022]

Durand-Moreau Q, Mackenzie G, Adisesh A, Straube S, Chan XHS, Zelyas N, Greenhalgh T. Twitter Analytics to Inform Provisional Guidance for COVID-19 Challenges in the Meatpacking Industry. Ann Work Expo Health 2021;65:373-376. [PMID: 33492381 PMCID: PMC7929462 DOI: 10.1093/annweh/wxaa123] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/08/2020] [Revised: 11/02/2020] [Accepted: 11/13/2020] [Indexed: 12/04/2022]

Davidson L, Boland MR. Towards deep phenotyping pregnancy: a systematic review on artificial intelligence and machine learning methods to improve pregnancy outcomes. Brief Bioinform 2021;22:6065792. [PMID: 33406530 PMCID: PMC8424395 DOI: 10.1093/bib/bbaa369] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 05/22/2020] [Revised: 10/13/2020] [Accepted: 11/18/2020] [Indexed: 12/16/2022]

Les médecins du travail doivent être plus visibles sur Twitter [Occupational physicians must be more visible on Twitter]. ARCH MAL PROF ENVIRO 2020. [DOI: 10.1016/j.admp.2020.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]

Klein AZ, Gonzalez-Hernandez G. An annotated data set for identifying women reporting adverse pregnancy outcomes on Twitter. Data Brief 2020;32:106249. [PMID: 32944604 PMCID: PMC7481818 DOI: 10.1016/j.dib.2020.106249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2020] [Accepted: 08/25/2020] [Indexed: 10/29/2022]

Abstract

Despite the prevalence in the United States of miscarriage [1], stillbirth [2], and infant mortality associated with preterm birth and low birthweight [3], their causes remain largely unknown [4], [5], [6]. To advance the use of social media data as a complementary resource for epidemiology of adverse pregnancy outcomes, we present a data set of 6487 tweets that mention miscarriage, stillbirth, preterm birth or premature labor, low birthweight, neonatal intensive care, or fetal/infant loss in general. These tweets are a subset of 22,912 tweets retrieved by applying hand-written regular expressions to a database containing more than 400 million public tweets posted by more than 100,000 women who have announced their pregnancy on Twitter [7]. Two professional annotators labeled the 6487 tweets in a binary fashion, distinguishing those potentially reporting that the user has personally experienced the outcome ("outcome" tweets) from those that merely mention the outcome ("non-outcome" tweets). Inter-annotator agreement was κ = 0.90 (Cohen's kappa). The tweets annotated as "outcome" include 1318 women reporting miscarriage, 94 stillbirth, 591 preterm birth or premature labor, 171 low birthweight, 453 neonatal intensive care, and 356 fetal/infant loss in general. These "outcome" tweets can be used to explore patient experiences and perceptions of adverse pregnancy outcomes, and can direct researchers to the users' broader timelines (tweets posted by a user over time) for observational studies. Our past work demonstrates the analysis of timelines for selecting a study population [8] and conducting a case-control study [9] of users reporting that their child has a birth defect. For larger-scale studies, the full annotated corpus can be used to train supervised machine learning algorithms to automatically identify additional users reporting adverse pregnancy outcomes on Twitter.
We used the annotated corpus to train feature-engineered and deep learning-based classifiers presented in "A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes" [10].

Yamaguchi A, Queralt-Rosinach N. A proof-of-concept study of extracting patient histories for rare/intractable diseases from social media. Genomics Inform 2020;18:e17. [PMID: 32634871 PMCID: PMC7362943 DOI: 10.5808/gi.2020.18.2.e17] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/18/2020] [Accepted: 06/18/2020] [Indexed: 12/22/2022]

Davoudi A, Klein AZ, Sarker A, Gonzalez-Hernandez G. Towards Automatic Bot Detection in Twitter for Health-related Tasks. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2020;2020:136-141. [PMID: 32477632 PMCID: PMC7233076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]

Klein AZ, Gebreyesus A, Gonzalez-Hernandez G. Automatically Identifying Comparator Groups on Twitter for Digital Epidemiology of Pregnancy Outcomes. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2020;2020:317-325. [PMID: 32477651 PMCID: PMC7233041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]

Tavoschi L, Quattrone F, D’Andrea E, Ducange P, Vabanesi M, Marcelloni F, Lopalco PL. Twitter as a sentinel tool to monitor public opinion on vaccination: an opinion mining analysis from September 2016 to August 2017 in Italy. Hum Vaccin Immunother 2020;16:1062-1069. [PMID: 32118519 PMCID: PMC7227677 DOI: 10.1080/21645515.2020.1714311] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 09/29/2019] [Revised: 12/18/2019] [Accepted: 01/06/2020] [Indexed: 11/29/2022]

Wexler A, Davoudi A, Weissenbacher D, Choi R, O’Connor K, Cummings H, Gonzalez-Hernandez G. Pregnancy and health in the age of the Internet: A content analysis of online "birth club" forums. PLoS One 2020;15:e0230947. [PMID: 32287266 PMCID: PMC7156049 DOI: 10.1371/journal.pone.0230947] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 01/02/2020] [Accepted: 03/12/2020] [Indexed: 12/17/2022]

Abstract

BACKGROUND

Although studies report that more than 90% of pregnant women utilize digital sources to supplement their maternal healthcare, little is known about the kinds of information that women seek from their peers during pregnancy. To date, most research has used self-report measures to elucidate how and why women turn to digital sources during pregnancy. However, given that these measures may differ from actual utilization of online health information, it is important to analyze the online content pregnant women generate.

OBJECTIVE

To apply machine learning methods to analyze online pregnancy forums, to better understand how women seek information from a community of online peers during pregnancy.

METHODS

Data from seven WhatToExpect.com "birth club" forums (September 2018; January-June 2018) were scraped. Forum posts were collected for a one-year period, which included three trimesters and three months postpartum. Only initial posts from each thread were analyzed (n = 262,238). Automatic natural language processing (NLP) methods captured 50 discussed topics, which were annotated by two independent coders and grouped categorically.

RESULTS

The largest topic categories were maternal health (45%), baby-related topics (29%), and people/relationships (10%). While pain was a popular topic all throughout pregnancy, individual topics that were dominant by trimester included miscarriage (first trimester), labor (third trimester), and baby sleeping routine (postpartum period).

CONCLUSION

More than just emotional or peer support, pregnant women turn to online forums to discuss their health. Dominant topics, such as labor and miscarriage, suggest unmet informational needs in these domains. With misinformation becoming a growing public health concern, more attention must be directed toward peer-exchange outlets.

Pharmacoepidemiologic Evaluation of Birth Defects from Health-Related Postings in Social Media During Pregnancy. Drug Saf 2020;42:389-400. [PMID: 30284214 PMCID: PMC6426821 DOI: 10.1007/s40264-018-0731-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 11/29/2022]

Klein AZ, Cai H, Weissenbacher D, Levine LD, Gonzalez-Hernandez G. A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes. J Biomed Inform 2020;112S:100076. [PMID: 34417007 DOI: 10.1016/j.yjbinx.2020.100076] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/22/2020] [Revised: 06/30/2020] [Accepted: 07/27/2020] [Indexed: 10/23/2022]

Abstract

BACKGROUND

In the United States, 17% of pregnancies end in fetal loss: miscarriage or stillbirth. Preterm birth affects 10% of live births in the United States and is the leading cause of neonatal death globally. Preterm births with low birthweight are the second leading cause of infant mortality in the United States. Despite their prevalence, the causes of miscarriage, stillbirth, and preterm birth are largely unknown.

OBJECTIVE

The primary objectives of this study are to (1) assess whether women report miscarriage, stillbirth, and preterm birth, among others, on Twitter, and (2) develop natural language processing (NLP) methods to automatically identify users from which to select cases for large-scale observational studies.

METHODS

We handcrafted regular expressions to retrieve tweets that mention an adverse pregnancy outcome, from a database containing more than 400 million publicly available tweets posted by more than 100,000 users who have announced their pregnancy on Twitter. Two annotators independently annotated 8109 (one random tweet per user) of the 22,912 retrieved tweets, distinguishing those reporting that the user has personally experienced the outcome ("outcome" tweets) from those that merely mention the outcome ("non-outcome" tweets). Inter-annotator agreement was κ = 0.90 (Cohen's kappa). We used the annotated tweets to train and evaluate feature-engineered and deep learning-based classifiers. We further annotated 7512 (of the 8109) tweets to develop a generalizable, rule-based module designed to filter out reported speech (that is, posts containing what was said by others) prior to automatic classification. We performed an extrinsic evaluation assessing whether the reported speech filter could improve the detection of women reporting adverse pregnancy outcomes on Twitter.
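For readers unfamiliar with the agreement statistic quoted above, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance from their label frequencies. The sketch below is a generic implementation with toy labels invented for this example; it does not reproduce the study's annotations or its reported value.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Proportion of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical): the annotators disagree on one of ten tweets.
annotator_1 = ["outcome"] * 5 + ["non-outcome"] * 5
annotator_2 = ["outcome"] * 4 + ["non-outcome"] * 6
print(round(cohen_kappa(annotator_1, annotator_2), 2))
```

In this toy case the observed agreement is 0.9 and the chance agreement is 0.5, which gives a kappa of about 0.8.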

RESULTS

The tweets annotated as "outcome" include 1632 women reporting miscarriage, 119 stillbirth, 749 preterm birth or premature labor, 217 low birthweight, 558 NICU admission, and 458 fetal/infant loss in general. A deep neural network, BERT-based classifier achieved the highest overall F₁-score (0.88) for automatically detecting "outcome" tweets (precision = 0.87, recall = 0.89), with an F₁-score of at least 0.82 and a precision of at least 0.84 for each of the adverse pregnancy outcomes. Our reported speech filter significantly (P < 0.05) improved the accuracy of Logistic Regression (from 78.0% to 80.8%) and majority voting-based ensemble (from 81.1% to 82.9%) classifiers. Although the filter did not improve the F₁-score of the BERT-based classifier, it did improve precision, a trade-off of recall that may be acceptable for automated case selection of more prevalent outcomes. Without the filter, reported speech is one of the main sources of errors for the BERT-based classifier.
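The reported F₁-score is the harmonic mean of the reported precision and recall. The short check below recomputes it from the two rounded figures in the paragraph above; the function name is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported (both already rounded to two decimals).
print(round(f1_score(0.87, 0.89), 2))  # 0.88
```

Because the inputs are themselves rounded, this is a consistency check on the reported figures rather than a reproduction of the evaluation.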

CONCLUSION

This study demonstrates that (1) women do report their adverse pregnancy outcomes on Twitter, (2) our NLP pipeline can automatically identify users from which to select cases for large-scale observational studies, and (3) our reported speech filter would reduce the cost of annotating health-related social media data and can significantly improve the overall performance of feature-based classifiers.

Klein AZ, Sarker A, Weissenbacher D, Gonzalez-Hernandez G. Towards scaling Twitter for digital epidemiology of birth defects. NPJ Digit Med 2019;2:96. [PMID: 31583284 PMCID: PMC6773753 DOI: 10.1038/s41746-019-0170-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/22/2019] [Accepted: 08/12/2019] [Indexed: 11/13/2022]

Grabar N, Grouin C. A Year of Papers Using Biomedical Texts: Findings from the Section on Natural Language Processing of the IMIA Yearbook. Yearb Med Inform 2019;28:218-222. [PMID: 31419835 PMCID: PMC6697498 DOI: 10.1055/s-0039-1677937] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/09/2023]

Natsiavas P, Malousi A, Bousquet C, Jaulent MC, Koutkias V. Computational Advances in Drug Safety: Systematic and Mapping Review of Knowledge Engineering Based Approaches. Front Pharmacol 2019;10:415. [PMID: 31156424 PMCID: PMC6533857 DOI: 10.3389/fphar.2019.00415] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/27/2018] [Accepted: 04/02/2019] [Indexed: 12/12/2022]

Abstract

Drug Safety (DS) is a domain with significant public health and social impact. Knowledge Engineering (KE) is the Computer Science discipline elaborating on methods and tools for developing “knowledge-intensive” systems, depending on a conceptual “knowledge” schema and some kind of “reasoning” process. The present systematic and mapping review aims to investigate KE-based approaches employed for DS and highlight the introduced added value as well as trends and possible gaps in the domain. Journal articles published between 2006 and 2017 were retrieved from PubMed/MEDLINE and Web of Science® (873 in total) and filtered based on a comprehensive set of inclusion/exclusion criteria. The 80 finally selected articles were reviewed on full-text, while the mapping process relied on a set of concrete criteria (concerning specific KE and DS core activities, special DS topics, employed data sources, reference ontologies/terminologies, and computational methods, etc.). The analysis results are publicly available as online interactive analytics graphs. The review clearly depicted increased use of KE approaches for DS. The collected data illustrate the use of KE for various DS aspects, such as Adverse Drug Event (ADE) information collection, detection, and assessment. Moreover, the quantified analysis of using KE for the respective DS core activities highlighted room for intensifying research on KE for ADE monitoring, prevention and reporting. Finally, the assessed use of the various data sources for DS special topics demonstrated extensive use of dominant data sources for DS surveillance, i.e., Spontaneous Reporting Systems, but also increasing interest in the use of emerging data sources, e.g., observational healthcare databases, biochemical/genetic databases, and social media. 
Various exemplar applications were identified with promising results, e.g., improvement in Adverse Drug Reaction (ADR) prediction, detection of drug interactions, and novel ADE profiles related with specific mechanisms of action, etc. Nevertheless, since the reviewed studies mostly concerned proof-of-concept implementations, more intense research is required to increase the maturity level that is necessary for KE approaches to reach routine DS practice. In conclusion, we argue that efficiently addressing DS data analytics and management challenges requires the introduction of high-throughput KE-based methods for effective knowledge discovery and management, resulting ultimately, in the establishment of a continuous learning DS system.
